Data

cars

active
ARFF
Publicly available Visibility: public Uploaded 28-09-2014 by Joaquin Vanschoren

0 likes downloaded by 3 people , 3 total downloads 0 issues 0 downvotes


Author:
Source: Unknown - Date unknown
Please cite:
The Committee on Statistical Graphics of the American Statistical
Association (ASA) invites you to participate in its Second (1983)
Exposition of Statistical Graphics Technology. The purposes of the
Exposition are (1) to provide a forum in which users and providers of
statistical graphics technology can exchange information and ideas and
(2) to expose those members of the ASA community who are less familiar
with statistical graphics to its capabilities and potential benefits
to them. The Exposition will be held in conjunction with the Annual
Meetings in Toronto, August 15-18, 1983 and is tentatively scheduled
for the afternoon of Wednesday, August 17.
Seven providers of statistical graphics technology participated in the
1982 Exposition. By all accounts, the Exposition was well received by
the ASA community and was a worthwhile experience for the
participants. We hope to have those seven involved again this year,
along with as many new participants as we can muster. The 1982
Exposition was summarized in a paper to appear in the Proceedings of
the Statistical Computing Section. A copy of that paper is enclosed
for your information.
The basic format of the 1983 Exposition will be similar to that of
1982. However, based upon comments received and experience gained,
there are some changes. The basic structure, intended to be simpler
and more flexible than last year, is as follows:
A fixed data set is to be analyzed. This data set is a version of the
CRCARS data set of
Donoho, David and Ramos, Ernesto (1982), ``PRIMDATA:
Data Sets for Use With PRIM-H'' (DRAFT).
Because of the Committee's limited (zero) budget for the Exposition,
we are forced to provide the data in hardcopy form only (enclosed).
(Sorry!)
There are 406 observations on the following 8 variables: MPG (miles
per gallon), # cylinders, engine displacement (cu. inches), horsepower,
vehicle weight (lbs.), time to accelerate from 0 to 60 mph (sec.),
model year (modulo 100), and origin of car (1. American, 2. European,
3. Japanese). These data appear on seven pages. Also provided are the
car labels (types) in the same order as the 8 variables on seven
separate pages. Missing data values are marked by series of question
marks.
You are asked to analyze these data using your statistical graphics
software. Your objective should be to achieve graphical displays which
will be meaningful to the viewers and highlight relevant aspects of
the data. If you can best achieve this using simple graphical formats,
fine. If you choose to illustrate some of the more sophisticated
capabilities of your software and can do so without losing relevancy
to the data, that is fine, too. This year, there will be no Committee
commentary on the individual presentations, so you are not competing
with other presenters. The role of each presenter is to do his/her
best job of presenting their statistical graphics technology to the
viewers.
Each participant will be provided with a 6'(long) by 4'(tall)
posterboard on which to display the results of their analyses. This is
the same format as last year. You are encouraged to remain by your
presentation during the Exposition to answer viewers' questions. Three
copies of your presentation must be submitted to me by July 1. Movie
or slide show presentations cannot be accommodated (sorry). The
Committee will prepare its own poster presentation which will orient
the viewers to the data and the purposes of the Exposition.
The ASA has asked us to remind all participants that the Exposition is
intended for educational and scientific purposes and is not a
marketing activity. Even though last year's participants did an
excellent job of maintaining that distinction, a cautionary note at
this point is appropriate.
Those of us who were involved with the 1982 Exposition found it
worthwhile and fun to do. We would very much like to have you
participate this year. For planning purposes, please RSVP (to me, in
writing please) by April 15 as to whether you plan to accept the
Committee's invitation.
If you have any questions about the Exposition, please call me on
(301/763-5350). If you have specific questions about the data, or the
analysis, please call Karen Kafadar on (301/921-3651). If you cannot
participate but know of another person or group in your organization
who can, please pass this invitation along to them.
Sincerely,
LAWRENCE H. COX
Statistical Research Division
Bureau of the Census
Room 3524-3
Washington, DC 20233
Information about the dataset
CLASSTYPE: nominal
CLASSINDEX: last

Feature | Type | Unique values | Missing values |
---|---|---|---|
origin (target) | nominal | 3 | 0 |
name (ignore) | nominal | 312 | 0 |
mpg | numeric | 129 | 8 |
cylinders | nominal | 5 | 0 |
displacement | numeric | 83 | 0 |
horsepower | numeric | 93 | 6 |
weight | numeric | 356 | 0 |
acceleration | numeric | 96 | 0 |
model.year | numeric | 13 | 0 |
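The feature list above reports missing values for mpg and horsepower, and the original letter says missing values are marked by question marks. The following is a minimal sketch of how such rows might be read and the missing entries counted, assuming comma-separated data rows with a single `?` as the missing marker (the ARFF convention). The sample rows, the column order, and the helper names `parse_rows` and `count_missing` are hypothetical and chosen for illustration only; the `name` column is left out to keep the sketch short.

```python
import csv
import io

# Hypothetical sample rows; the values are made up for illustration and
# are NOT taken from the real cars data.
SAMPLE = """\
20.0,6,200.0,100.0,3000,15.0,75,1
?,4,120.0,90.0,2500,16.0,76,2
30.0,4,100.0,?,2000,18.0,77,3
"""

# Assumed column order for this sketch (the 'name' column is omitted).
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower",
           "weight", "acceleration", "model.year", "origin"]


def parse_rows(text):
    """Parse comma-separated rows, mapping '?' to None and the rest to float."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        values = [None if f.strip() == "?" else float(f) for f in fields]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def count_missing(rows):
    """Count the None entries in each column."""
    return {c: sum(1 for r in rows if r[c] is None) for c in COLUMNS}


rows = parse_rows(SAMPLE)
missing = count_missing(rows)
print(missing)
```

Run against the real file, a count like this would be expected to match the "Missing values" column above, provided the file uses the same marker.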

3

The minimal number of distinct values among attributes of the nominal type.

23.29

Second quartile (Median) of standard deviation of attributes of the numeric type.

0.85

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

0.39

Maximum mutual information between the nominal attributes and the target attribute.

0.19

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

3.43

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

5

The maximum number of distinct values among attributes of the nominal type.

0.42

Third quartile of kurtosis among attributes of the numeric type.

0.63

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

0.87

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

0.86

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

0.85

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

0.39

Third quartile of mutual information between the nominal attributes and the target attribute.

0.24

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

0.19

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

0.62

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

0.86

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

0.78

Third quartile of skewness among attributes of the numeric type.

0.55

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

0.63

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

0.87

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

-0.92

First quartile of kurtosis among attributes of the numeric type.

290.44

Third quartile of standard deviation of attributes of the numeric type.

0.86

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

0.85

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

0.87

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

0.24

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

0.19

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

0.62

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

0.39

Average mutual information between the nominal attributes and the target attribute.

0.39

First quartile of mutual information between the nominal attributes and the target attribute.

0.55

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

0.63

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

0.87

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

3.11

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

4

Average number of distinct values among the attributes of the nominal type.

0.18

First quartile of skewness among attributes of the numeric type.

0.51

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

0.86

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

1.41

Standard deviation of the number of distinct values among attributes of the nominal type.

3.51

First quartile of standard deviation of attributes of the numeric type.

0.87

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

0.24

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

0.55

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

-0.66

Second quartile (Median) of kurtosis among attributes of the numeric type.

0.51

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

90.5

Second quartile (Median) of means among attributes of the numeric type.

0.87

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

0.85

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

0.39

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

0.34

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

0.39

Minimal mutual information between the nominal attributes and the target attribute.

0.48

Second quartile (Median) of skewness among attributes of the numeric type.

0.51

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

0.43

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
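
Two of the meta-features listed above are defined by explicit formulas: the equivalent number of attributes (ClassEntropy divided by MeanMutualInformation) and the noise-to-signal estimate ((MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation). The sketch below works through both formulas on a tiny made-up nominal data set. It is not the implementation used to produce the values on this page; the function names and the toy data are assumptions for illustration. Both quantities are ratios of entropies, so the choice of logarithm base does not affect the result.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned nominal sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mean(numbers):
    return sum(numbers) / len(numbers)


def equivalent_number_of_attributes(attributes, target):
    """ClassEntropy / MeanMutualInformation."""
    mean_mi = mean([mutual_information(a, target) for a in attributes])
    return entropy(target) / mean_mi


def noise_to_signal_ratio(attributes, target):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    mean_mi = mean([mutual_information(a, target) for a in attributes])
    mean_attr_entropy = mean([entropy(a) for a in attributes])
    return (mean_attr_entropy - mean_mi) / mean_mi


# Toy data (made up): the first attribute determines the class exactly,
# the second is independent of it.
target = ["a", "a", "b", "b"]
attributes = [
    ["x", "x", "y", "y"],  # mutual information with target: 1 bit
    ["p", "q", "p", "q"],  # mutual information with target: 0 bits
]

ena = equivalent_number_of_attributes(attributes, target)
nsr = noise_to_signal_ratio(attributes, target)
print(ena, nsr)
```

For this toy data the class entropy is 1 bit, the mean mutual information is 0.5 bits, and the mean attribute entropy is 1 bit, so the two ratios come out as 2 and 1. Both functions divide by the mean mutual information, so they are undefined when no nominal attribute carries any information about the class.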