cars

active ARFF Publicly available Visibility: public Uploaded 28-09-2014 by Joaquin Vanschoren
Author:
Source: Unknown - Date unknown
Please cite:

The Committee on Statistical Graphics of the American Statistical Association (ASA) invites you to participate in its Second (1983) Exposition of Statistical Graphics Technology. The purposes of the Exposition are (1) to provide a forum in which users and providers of statistical graphics technology can exchange information and ideas and (2) to expose those members of the ASA community who are less familiar with statistical graphics to its capabilities and potential benefits to them. The Exposition will be held in conjunction with the Annual Meetings in Toronto, August 15-18, 1983 and is tentatively scheduled for the afternoon of Wednesday, August 17.

Seven providers of statistical graphics technology participated in the 1982 Exposition. By all accounts, the Exposition was well received by the ASA community and was a worthwhile experience for the participants. We hope to have those seven involved again this year, along with as many new participants as we can muster. The 1982 Exposition was summarized in a paper to appear in the Proceedings of the Statistical Computing Section. A copy of that paper is enclosed for your information.

The basic format of the 1983 Exposition will be similar to that of 1982. However, based upon comments received and experience gained, there are some changes. The basic structure, intended to be simpler and more flexible than last year, is as follows:

A fixed data set is to be analyzed. This data set is a version of the CRCARS data set of Donoho, David and Ramos, Ernesto (1982), "PRIMDATA: Data Sets for Use With PRIM-H" (DRAFT). Because of the Committee's limited (zero) budget for the Exposition, we are forced to provide the data in hardcopy form only (enclosed). (Sorry!) There are 406 observations on the following 8 variables: MPG (miles per gallon), # cylinders, engine displacement (cu. inches), horsepower, vehicle weight (lbs.), time to accelerate from 0 to 60 mph (sec.), model year (modulo 100), and origin of car (1. American, 2. European, 3. Japanese). These data appear on seven pages. Also provided are the car labels (types) in the same order as the 8 variables on seven separate pages. Missing data values are marked by series of question marks.

You are asked to analyze these data using your statistical graphics software. Your objective should be to achieve graphical displays which will be meaningful to the viewers and highlight relevant aspects of the data. If you can best achieve this using simple graphical formats, fine. If you choose to illustrate some of the more sophisticated capabilities of your software and can do so without losing relevancy to the data, that is fine, too.

This year, there will be no Committee commentary on the individual presentations, so you are not competing with other presenters. The role of each presenter is to do his/her best job of presenting their statistical graphics technology to the viewers. Each participant will be provided with a 6' (long) by 4' (tall) posterboard on which to display the results of their analyses. This is the same format as last year. You are encouraged to remain by your presentation during the Exposition to answer viewers' questions. Three copies of your presentation must be submitted to me by July 1. Movie or slide show presentations cannot be accommodated (sorry). The Committee will prepare its own poster presentation which will orient the viewers to the data and the purposes of the Exposition.

The ASA has asked us to remind all participants that the Exposition is intended for educational and scientific purposes and is not a marketing activity. Even though last year's participants did an excellent job of maintaining that distinction, a cautionary note at this point is appropriate.

Those of us who were involved with the 1982 Exposition found it worthwhile and fun to do. We would very much like to have you participate this year. For planning purposes, please RSVP (to me, in writing please) by April 15 as to whether you plan to accept the Committee's invitation. If you have any questions about the Exposition, please call me on (301/763-5350). If you have specific questions about the data, or the analysis, please call Karen Kafadar on (301/921-3651). If you cannot participate but know of another person or group in your organization who can, please pass this invitation along to them.

Sincerely,
LAWRENCE H. COX
Statistical Research Division
Bureau of the Census
Room 3524-3
Washington, DC 20233

Information about the dataset
CLASSTYPE: nominal
CLASSINDEX: last
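The letter states that missing values are marked by runs of question marks. As a minimal sketch of how such fields might be handled when reading the data, the helper below maps any token made up only of question marks to `None` and converts numeric tokens to floats. The function name `parse_field` and the sample record are hypothetical and are not taken from the data set itself.

```python
import re


def parse_field(token):
    """Return None for a run of question marks, a float for numbers, else the text."""
    token = token.strip()
    if re.fullmatch(r"\?+", token):
        return None  # missing value marker
    try:
        return float(token)
    except ValueError:
        return token  # non-numeric text such as a car label


# Hypothetical record for illustration only (not copied from the data).
row = ["????", "8", "307.0", "???", "3504", "12.0", "70", "1"]
parsed = [parse_field(t) for t in row]
missing = sum(value is None for value in parsed)
print(parsed, missing)
```

Treating every run of question marks as missing, regardless of its length, avoids depending on how many marks the hardcopy used for each field.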

9 features

origin (target): nominal, 3 unique values, 0 missing
name (ignore): nominal, 312 unique values, 0 missing
mpg: numeric, 129 unique values, 8 missing
cylinders: nominal, 5 unique values, 0 missing
displacement: numeric, 83 unique values, 0 missing
horsepower: numeric, 93 unique values, 6 missing
weight: numeric, 356 unique values, 0 missing
acceleration: numeric, 96 unique values, 0 missing
model.year: numeric, 13 unique values, 0 missing
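
The feature list above can be cross-checked against the summary counts the page reports (6 numeric attributes, 3 nominal attributes, 14 missing values). The sketch below transcribes the list into Python tuples and recomputes those totals. It also computes the mean and sample standard deviation of the distinct-value counts of the nominal attributes with the ignored `name` column left out; that exclusion is an assumption, made because it yields the reported 4 and 1.41.

```python
from statistics import mean, stdev

# (name, type, unique values, missing values), transcribed from the feature list.
features = [
    ("origin", "nominal", 3, 0),
    ("name", "nominal", 312, 0),
    ("mpg", "numeric", 129, 8),
    ("cylinders", "nominal", 5, 0),
    ("displacement", "numeric", 83, 0),
    ("horsepower", "numeric", 93, 6),
    ("weight", "numeric", 356, 0),
    ("acceleration", "numeric", 96, 0),
    ("model.year", "numeric", 13, 0),
]

numeric = [f for f in features if f[1] == "numeric"]
nominal = [f for f in features if f[1] == "nominal"]
total_missing = sum(f[3] for f in features)

# Assumption: the ignored "name" column is excluded from the nominal statistics.
nominal_distinct = [f[2] for f in nominal if f[0] != "name"]
mean_distinct = mean(nominal_distinct)
std_distinct = round(stdev(nominal_distinct), 2)

print(len(numeric), len(nominal), total_missing, mean_distinct, std_distinct)
```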

107 properties

406
Number of instances (rows) of the dataset.
9
Number of attributes (columns) of the dataset.
3
Number of distinct values of the target attribute (if it is nominal).
14
Number of missing values in the dataset.
14
Number of instances with at least one value missing.
6
Number of numeric attributes.
3
Number of nominal attributes.
3.11
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
0
Number of binary attributes.
0.39
First quartile of mutual information between the nominal attributes and the target attribute.
0.28
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.55
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.63
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.87
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
4
Average number of distinct values among the attributes of the nominal type.
0.18
First quartile of skewness among attributes of the numeric type.
0.51
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.86
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
1.41
Standard deviation of the number of distinct values among attributes of the nominal type.
0.2
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
0.49
Mean skewness among attributes of the numeric type.
3.51
First quartile of standard deviation of attributes of the numeric type.
0.87
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.24
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.76
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
0.62
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
167.51
Mean standard deviation of attributes of the numeric type.
1.59
Second quartile (Median) of entropy among attributes.
0.28
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.55
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.28
Error rate achieved by the landmarker weka.classifiers.lazy.IBk
62.56
Percentage of instances belonging to the most frequent class.
1.59
Minimal entropy among attributes.
-0.66
Second quartile (Median) of kurtosis among attributes of the numeric type.
0.51
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
1.33
Entropy of the target attribute values.
0.48
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
254
Number of instances belonging to the most frequent class.
-1.2
Minimum kurtosis among attributes of the numeric type.
90.5
Second quartile (Median) of means among attributes of the numeric type.
0.87
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.85
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
1.59
Maximum entropy among attributes.
15.52
Minimum of means among attributes of the numeric type.
0.39
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
0.28
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.34
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
0.54
Maximum kurtosis among attributes of the numeric type.
0.39
Minimal mutual information between the nominal attributes and the target attribute.
0.48
Second quartile (Median) of skewness among attributes of the numeric type.
0.51
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.43
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
2979.41
Maximum of means among attributes of the numeric type.
3
The minimal number of distinct values among attributes of the nominal type.
0
Percentage of binary attributes.
23.29
Second quartile (Median) of standard deviation of attributes of the numeric type.
0.85
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0.02
Number of attributes divided by the number of instances.
0.39
Maximum mutual information between the nominal attributes and the target attribute.
0.02
Minimum skewness among attributes of the numeric type.
3.45
Percentage of instances having missing values.
1.59
Third quartile of entropy among attributes.
0.19
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
3.43
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
5
The maximum number of distinct values among attributes of the nominal type.
2.8
Minimum standard deviation of attributes of the numeric type.
0.38
Percentage of missing values.
0.42
Third quartile of kurtosis among attributes of the numeric type.
0.62
Average class difference between consecutive instances.
0.63
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0.87
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
1.03
Maximum skewness among attributes of the numeric type.
17.98
Percentage of instances belonging to the least frequent class.
66.67
Percentage of numeric attributes.
890.94
Third quartile of means among attributes of the numeric type.
0.86
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.85
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.2
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
847
Maximum standard deviation of attributes of the numeric type.
73
Number of instances belonging to the least frequent class.
33.33
Percentage of nominal attributes.
0.39
Third quartile of mutual information between the nominal attributes and the target attribute.
0.24
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.19
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.62
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
1.59
Average entropy of the attributes.
0.86
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
1.59
First quartile of entropy among attributes.
0.78
Third quartile of skewness among attributes of the numeric type.
0.55
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.63
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.87
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
-0.4
Mean kurtosis among attributes of the numeric type.
0.33
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
-0.92
First quartile of kurtosis among attributes of the numeric type.
290.44
Third quartile of standard deviation of attributes of the numeric type.
0.86
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.85
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.2
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
565.71
Mean of means among attributes of the numeric type.
0.39
Average mutual information between the nominal attributes and the target attribute.
0.44
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
21.52
First quartile of means among attributes of the numeric type.
0.87
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.24
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.19
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.62
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
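
Several of the properties above follow directly from a handful of counts, so they can be recomputed as a sanity check. The sketch below uses only numbers reported on this page; the size of the third class (79) is derived as 406 - 254 - 73, given that the target has three distinct values. The two ratio properties are recomputed from the rounded mean attribute entropy (1.59) and mean mutual information (0.39), so they only approximately reproduce the reported 3.11 and 3.43, which were presumably computed from unrounded values.

```python
from math import log2

n_instances = 406
n_attributes = 9
majority, minority = 254, 73
middle = n_instances - majority - minority  # third class, derived: 79
missing_values = 14
rows_with_missing = 14


def pct(part, whole):
    """Percentage rounded to two decimals, as displayed on the page."""
    return round(100 * part / whole, 2)


majority_pct = pct(majority, n_instances)
minority_pct = pct(minority, n_instances)
rows_missing_pct = pct(rows_with_missing, n_instances)
values_missing_pct = pct(missing_values, n_instances * n_attributes)
dimensionality = round(n_attributes / n_instances, 2)

# Shannon entropy (in bits) of the target attribute.
class_counts = [majority, minority, middle]
class_entropy = -sum((c / n_instances) * log2(c / n_instances) for c in class_counts)

# Ratios as defined in the property descriptions, from rounded inputs.
mean_attribute_entropy = 1.59
mean_mutual_information = 0.39
noise_to_signal = (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information
equivalent_attributes = class_entropy / mean_mutual_information

print(majority_pct, minority_pct, rows_missing_pct, values_missing_pct)
print(round(class_entropy, 2), round(noise_to_signal, 2), round(equivalent_attributes, 2))
```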

5 tasks

51 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: origin
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: origin
0 runs - estimation_procedure: Interleaved Test then Train - target_feature: origin
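
The first task above uses 10-fold cross-validation over the 406 instances. As a rough illustration of what that estimation procedure involves, the sketch below partitions shuffled row indices into ten folds and yields a train/test split for each. It is a simplified, unstratified version under an arbitrary seed; the splits used for the listed tasks may be produced differently, and the helper name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, test) index lists for a simple, unstratified k-fold split."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(406))
fold_sizes = sorted(len(test) for _, test in splits)
print(fold_sizes)
```

With 406 instances, six folds hold 41 test rows and four hold 40, so every row is used for testing exactly once.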