datatrieve

active
ARFF
Publicly available Visibility: public Uploaded 06-10-2014 by Joaquin Vanschoren

0 likes, downloaded by 9 people, 10 total downloads, 0 issues, 0 downvotes

Author:
Source: Unknown - Date unknown
Please cite:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
This is a PROMISE Software Engineering Repository data set made publicly
available in order to encourage repeatable, verifiable, refutable, and/or
improvable predictive models of software engineering.
If you publish material based on PROMISE data sets then, please
follow the acknowledgment guidelines posted on the PROMISE repository
web page http://promise.site.uottawa.ca/SERepository .
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1. Title/Topic: The transition of the DATATRIEVE product from version 6.0 to
version 6.1
2. Sources:
-- Creators: DATATRIEVE(TM) project carried out at Digital Engineering Italy
-- Donor: Guenther Ruhe
-- Date: January 15, 2005
3. Past usage:
Sandro Morasca and Gunther Ruhe,
"A hybrid approach to analyze empirical software engineering data
and its application to predict module fault-proneness in maintenance",
Journal of Systems and Software, Volume 53, Issue 3 (September 2000),
Pages 225-237, ISSN 0164-1212.
4. Relevant information:
The DATATRIEVE product was undergoing both adaptive (DATATRIEVE was being transferred
from platform OpenVMS/VAX to platform OpenVMS/Alpha) and corrective maintenance
(failures reported from customers were being fixed) at the Gallarate (Italy)
site of Digital Engineering.
The DATATRIEVE product was originally developed in the BLISS language. BLISS is an
expression language. It is block-structured, with exception handling facilities, coroutines,
and a macro system. It was one of the first non-assembly languages for operating system
implementation. Some parts were later added or rewritten in the C language. Therefore, the
overall structure of DATATRIEVE is composed of C functions and BLISS subroutines.
The empirical study of this data set covers only the BLISS part, which is by far the larger one.
In what follows, we will use the term "module" to refer to a BLISS module, i.e., a set of
declarations and subroutines usually belonging to one file. More than 100 BLISS modules
have been studied. It was important to the DATATRIEVE team to better understand how the
characteristics of the modules and transition process were correlated with the code quality.
The objective of the data analysis was to study whether it was possible to classify modules as
non-faulty or faulty, based on a set of measures collected on the project.
5. Number of records: 130
6. Number of attributes: 9
8 condition attributes
1 decision attribute
7. Attribute Information:
1. LOC6_0: number of lines of code of module m in version 6.0.
2. LOC6_1: number of lines of code of module m in version 6.1.
3. AddedLOC: number of lines of code that were added to module m in version 6.1, i.e., they
were not present in module m in version 6.0.
4. DeletedLOC: number of lines of code that were deleted from module m in version 6.0, i.e.,
they were no longer present in module m in version 6.1.
5. DifferentBlocks: number of different blocks in module m between versions 6.0 and 6.1.
6. ModificationRate: rate of modification of module m, i.e.,
(AddedLOC + DeletedLOC) / (LOC6_0 + AddedLOC).
7. ModuleKnowledge: subjective variable that expresses the project team's knowledge of
module m (low or high).
8. ReusedLOC: number of lines of code of module m in version 6.0 reused in module m in
version 6.1.
9. Faulty6_1: its value is 0 for all those modules in which no faults were found;
its value is 1 for all other modules.
8. Missing attributes: none
9. Class Distribution:
0: 119 = 91.54%
1: 11 = 8.46%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
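
The ModificationRate definition (attribute 6) and the class distribution (item 9) can be checked with a few lines of Python. This is a minimal sketch, not part of the original data set: the function name `modification_rate` and the example module sizes are hypothetical, and only the counts 119 and 11 come from the description above.

```python
def modification_rate(loc6_0, added_loc, deleted_loc):
    """ModificationRate = (AddedLOC + DeletedLOC) / (LOC6_0 + AddedLOC)."""
    denominator = loc6_0 + added_loc
    if denominator == 0:
        # No lines in version 6.0 and none added: the rate is undefined.
        raise ValueError("LOC6_0 + AddedLOC must be positive")
    return (added_loc + deleted_loc) / denominator


# Hypothetical module: 200 lines in 6.0, 30 lines added and 10 deleted in 6.1.
example_rate = modification_rate(loc6_0=200, added_loc=30, deleted_loc=10)

# Class counts taken from item 9 of the description.
class_counts = {0: 119, 1: 11}
total = sum(class_counts.values())
class_share = {label: round(100 * n / total, 2) for label, n in class_counts.items()}

# Error rate of a trivial predictor that always answers "non-faulty" (class 0).
majority_error = 1 - max(class_counts.values()) / total
```

With only 11 faulty modules out of 130, a predictor that always answers "non-faulty" already has an error rate of about 8.5%, which is worth keeping in mind when reading the error rates reported further down.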

Feature | Type | Unique values | Missing values |
---|---|---|---|
Faulty6_1 (target) | nominal | 2 | 0 |
LOC6_0 | numeric | 125 | 0 |
LOC6_1 | numeric | 123 | 0 |
Added_LoC | numeric | 103 | 0 |
Del_LoC | numeric | 98 | 0 |
Diff_Block | numeric | 58 | 0 |
Mod_Rate | numeric | 47 | 0 |
Mod_Know | numeric | 2 | 0 |
ReusedLoC | numeric | 122 | 0 |
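
Since the data set is distributed as ARFF, the following sketch shows one way to read such a file with the Python standard library only. It is an illustration under stated assumptions: the attribute names follow the feature list above, but the header text, the attribute order, and the two data rows in `SAMPLE_ARFF` are hypothetical and are not taken from the real file. The parser handles only the simple dense, unquoted case.

```python
# Hypothetical excerpt: the declarations and rows are made up for illustration.
SAMPLE_ARFF = """\
% comment lines start with a percent sign
@relation datatrieve
@attribute LOC6_0 numeric
@attribute LOC6_1 numeric
@attribute Added_LoC numeric
@attribute Del_LoC numeric
@attribute Diff_Block numeric
@attribute Mod_Rate numeric
@attribute Mod_Know numeric
@attribute ReusedLoC numeric
@attribute Faulty6_1 {0,1}
@data
100,110,20,10,5,0.25,0,90,0
300,340,60,20,12,0.2222,1,280,1
"""


def parse_arff(text):
    """Return (attributes, rows) for a dense ARFF text without quoted values."""
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            names = [name for name, _ in attributes]
            rows.append(dict(zip(names, line.split(","))))  # values stay strings
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_arff(SAMPLE_ARFF)
```

For real work, an established ARFF reader is a safer choice than this hand-rolled parser, which ignores quoting, sparse rows, and missing-value markers.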

Quality | Value |
---|---|
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | 0.52 |
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.08 |
First quartile of standard deviation of attributes of the numeric type. | 15.33 |
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.13 |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | -0.01 |
Second quartile (Median) of kurtosis among attributes of the numeric type. | 3.67 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | 0.52 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump | 0.65 |
Second quartile (Median) of means among attributes of the numeric type. | 114.05 |
Second quartile (Median) of mutual information between the nominal attributes and the target attribute. | |
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump | 0.09 |
Second quartile (Median) of skewness among attributes of the numeric type. | 1.71 |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | -0.01 |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump | -0.01 |
Minimal mutual information between the nominal attributes and the target attribute. | |
Second quartile (Median) of standard deviation of attributes of the numeric type. | 114.79 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | 0.64 |
Maximum mutual information between the nominal attributes and the target attribute. | |
The minimal number of distinct values among attributes of the nominal type. | 2 |
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | 0.13 |
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation. | |
The maximum number of distinct values among attributes of the nominal type. | 2 |
Third quartile of kurtosis among attributes of the numeric type. | 6.18 |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | 0.25 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | 0.59 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.59 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | 0.64 |
Third quartile of mutual information between the nominal attributes and the target attribute. | |
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.08 |
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | 0.13 |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | 0.13 |
Third quartile of skewness among attributes of the numeric type. | 1.94 |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.13 |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | 0.25 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | 0.59 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes | 0.59 |
Third quartile of standard deviation of attributes of the numeric type. | 815.84 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.59 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | 0.64 |
First quartile of kurtosis among attributes of the numeric type. | 1.57 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | 0.52 |
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.08 |
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | 0.13 |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | 0.13 |
Average mutual information between the nominal attributes and the target attribute. | |
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.13 |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | 0.25 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001 | 0.59 |
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation. | |
First quartile of mutual information between the nominal attributes and the target attribute. | |
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | -0.01 |
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.59 |
Standard deviation of the number of distinct values among attributes of the nominal type. | 0 |
Average number of distinct values among the attributes of the nominal type. | 2 |
First quartile of skewness among attributes of the numeric type. | 1.36 |
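
The landmarker qualities above report error rate and kappa, and two of the descriptions refer to ClassEntropy. The sketch below shows how these quantities can be computed for a binary problem. It assumes the standard Cohen's kappa definition and Shannon entropy in bits; the function names are hypothetical, and the confusion matrix describes a trivial "always non-faulty" predictor built from the 119/11 class counts, not any of the Weka runs listed above.

```python
import math


def error_rate(tp, fp, fn, tn):
    """Fraction of misclassified instances."""
    return (fp + fn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement beyond what the marginals alone would give."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Chance agreement computed from predicted and actual class marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    if expected == 1:
        return 0.0  # degenerate case: a single class on both sides
    return (observed - expected) / (1 - expected)


def class_entropy(counts):
    """Shannon entropy of the class distribution, in bits."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


# Trivial predictor that labels every module non-faulty (class 1 is "positive"):
# no true or false positives, 11 missed faulty modules, 119 correct negatives.
baseline_error = error_rate(tp=0, fp=0, fn=11, tn=119)
baseline_kappa = cohen_kappa(tp=0, fp=0, fn=11, tn=119)
entropy_bits = class_entropy([119, 11])
```

Under these assumptions the trivial predictor reaches an error rate of about 0.085 with a kappa of zero, so the kappa column is arguably the more informative one on a data set this imbalanced.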