Data

auto93

active
ARFF
Visibility: public. Uploaded 03-10-2014 by Joaquin Vanschoren.


Author:
Source: Unknown - Date unknown
Please cite:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Attributes 2, 4, and 6 (Model, Minimum Price, and Maximum Price) deleted.
Midrange Price treated as the class attribute.
As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction
using instance-based learning with encoding length selection. In Progress
in Connectionist-Based Information Systems. Singapore: Springer-Verlag.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NAME: 1993 New Car Data
TYPE: Sample
SIZE: 93 observations, 26 variables
DESCRIPTIVE ABSTRACT:
Specifications are given for 93 new car models for the 1993 year.
Several measures are given to evaluate price, mpg ratings, engine size,
body size, and features.
SOURCES:
_Consumer Reports: The 1993 Cars - Annual Auto Issue_ (April 1993),
Yonkers, NY: Consumers Union.
_PACE New Car & Truck 1993 Buying Guide_ (1993), Milwaukee, WI: Pace
Publications Inc.
VARIABLE DESCRIPTIONS:
Line 1
Columns
1 - 14 Manufacturer
15 - 29 Model
30 - 36 Type
Small, Sporty, Compact, Midsize, Large, Van - as defined in the
_Consumer Reports_ article
38 - 41 Minimum Price (in $1,000) - Price for basic version of this model
43 - 46 Midrange Price (in $1,000) - Average of Min and Max prices
48 - 51 Maximum Price (in $1,000) - Price for a premium version
53 - 54 City MPG (miles per gallon by EPA rating)
56 - 57 Highway MPG
59 - 59 Air Bags standard
0 = none, 1 = driver only, 2 = driver & passenger
61 - 61 Drive train type
0 = rear wheel drive
1 = front wheel drive
2 = all wheel drive
63 - 63 Number of cylinders
65 - 67 Engine size (liters)
69 - 71 Horsepower (maximum)
73 - 76 RPM (revs per minute at maximum horsepower)
Line 2
Columns
1 - 4 Engine revolutions per mile (in highest gear)
6 - 6 Manual transmission available
0 = No, 1 = Yes
8 - 11 Fuel tank capacity (gallons)
13 - 13 Passenger capacity (persons)
15 - 17 Length (inches)
19 - 21 Wheelbase (inches)
23 - 24 Width (inches)
26 - 27 U-turn space (feet)
29 - 32 Rear seat room (inches)
34 - 35 Luggage capacity (cu. ft.)
37 - 40 Weight (pounds)
42 - 42 Domestic?
0 = non-U.S. manufacturer, 1 = U.S. manufacturer
Values are aligned and delimited by blanks.
Missing values are denoted with *.
There are two data lines for each case.
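The column layout above maps directly onto string slicing. The following is a minimal sketch of a reader for the original two-line fixed-width records, assuming the column positions listed above are 1-based and inclusive. The field names, function names, and float conversion are illustrative choices, not part of the original file.

```python
# Hypothetical field names; column positions are copied from the layout above
# (1-based, inclusive).
LINE1_FIELDS = [
    ("manufacturer", 1, 14), ("model", 15, 29), ("type", 30, 36),
    ("min_price", 38, 41), ("midrange_price", 43, 46), ("max_price", 48, 51),
    ("city_mpg", 53, 54), ("highway_mpg", 56, 57), ("air_bags", 59, 59),
    ("drive_train", 61, 61), ("cylinders", 63, 63), ("engine_size", 65, 67),
    ("horsepower", 69, 71), ("rpm", 73, 76),
]
LINE2_FIELDS = [
    ("revs_per_mile", 1, 4), ("manual_available", 6, 6),
    ("fuel_tank", 8, 11), ("passengers", 13, 13), ("length", 15, 17),
    ("wheelbase", 19, 21), ("width", 23, 24), ("u_turn_space", 26, 27),
    ("rear_seat_room", 29, 32), ("luggage", 34, 35), ("weight", 37, 40),
    ("domestic", 42, 42),
]
TEXT_FIELDS = {"manufacturer", "model", "type"}


def parse_field(raw, name):
    """Strip padding; map '*' (or an empty slice) to None for numeric fields."""
    value = raw.strip()
    if name in TEXT_FIELDS:
        return value
    if value in ("*", ""):
        return None
    return float(value)


def parse_case(line1, line2):
    """Combine the two data lines of one case into a single dict."""
    record = {}
    for spec, line in ((LINE1_FIELDS, line1), (LINE2_FIELDS, line2)):
        for name, start, end in spec:
            # Convert 1-based inclusive columns to a Python slice.
            record[name] = parse_field(line[start - 1:end], name)
    return record


def parse_cases(lines):
    """Pair consecutive non-blank lines and parse each pair as one case."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    return [parse_case(rows[i], rows[i + 1]) for i in range(0, len(rows) - 1, 2)]
```

Passing an open file handle to `parse_cases` would yield one dict per car, with `None` wherever the file has `*`.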
SPECIAL NOTES:
The only missing values are for CYLINDERS in the rotary engine Mazda
RX-7, REAR SEAT room for the two-seaters (Corvette and RX-7), and
LUGGAGE capacity for the vans and two-seaters.
WEIGHT is taken from the _Consumer Reports_ data and includes a full
fuel tank, automatic transmission (if available), and air conditioning.
STORY BEHIND THE DATA:
Cars were selected at random from among 1993 passenger car models that
were listed in both the _Consumer Reports_ issue and the _PACE Buying
Guide_. Pickup trucks and Sport/Utility vehicles were eliminated due
to incomplete information in the _Consumer Reports_ source. Duplicate
models (e.g., Dodge Shadow and Plymouth Sundance) were listed at most
once.
A similar dataset for 1989 model cars appeared as one of the sample
datasets shipped with the _Student Edition of Execustat_ (PWS-KENT
1990).
Further description can be found in the "Datasets and Stories" article
"1993 New Car Data" in the _Journal of Statistics Education_ (Lock 1993).
Send the message
send jse/v1n1/datasets.lock
to the address archive@jse.stat.ncsu.edu
PEDAGOGICAL NOTES:
This is a multi-purpose dataset that can be used at many points in an
introductory course. It includes many good numeric variables and
several options for dividing the cars up into groups. Students tend to
be familiar with most of the variables (and specific car models). They
can anticipate and pose explanations for many of the relationships to
be found in the data, although some surprises may be encountered. One
can easily find examples of pairs of variables that demonstrate strong
or weak, positive or negative associations. PRICE and MPG variables
tend to be popular choices as "dependent" variables. Basic graphs will
often reveal unusual data values (like the price for a Mercedes-Benz).
REFERENCES:
Lock, R. H. (1993), "1993 New Car Data," _Journal of Statistics
Education_, 1, No. 1.
_Student Edition of Execustat_ (1990), Boston, MA: PWS-KENT
Publishing Co.
SUBMITTED BY:
Robin H. Lock
Mathematics Department
St. Lawrence University
Canton, NY 13617
(315) 379-5960
rlock@stlawu.bitnet

Feature | Type | Unique values | Missing values |
---|---|---|---|
class (target) | numeric | 81 | 0 |
Manufacturer | nominal | 31 | 0 |
Type | nominal | 6 | 0 |
City_MPG | numeric | 21 | 0 |
Highway_MPG | numeric | 22 | 0 |
Air_Bags_standard | nominal | 3 | 0 |
Drive_train_type | nominal | 3 | 0 |
Number_of_cylinders | numeric | 5 | 1 |
Engine_size | numeric | 26 | 0 |
Horsepower | numeric | 57 | 0 |
RPM | numeric | 24 | 0 |
Engine_revolutions_per_mile | numeric | 78 | 0 |
Manual_transmission_available | nominal | 2 | 0 |
Fuel_tank_capacity | numeric | 38 | 0 |
Passenger_capacity | numeric | 6 | 0 |
Length | numeric | 51 | 0 |
Wheelbase | numeric | 27 | 0 |
Width | numeric | 16 | 0 |
U-turn_space | numeric | 14 | 0 |
Rear_seat_room | numeric | 24 | 2 |
Luggage_capacity | numeric | 16 | 11 |
Weight | numeric | 81 | 0 |
Domestic | nominal | 2 | 0 |

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Third quartile of mutual information between the nominal attributes and the target attribute.

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

Third quartile of skewness among attributes of the numeric type: 0.91

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

First quartile of kurtosis among attributes of the numeric type: -0.33

Third quartile of standard deviation of attributes of the numeric type: 33.49

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

Average mutual information between the nominal attributes and the target attribute.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

First quartile of mutual information between the nominal attributes and the target attribute.

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Standard deviation of the number of distinct values among attributes of the nominal type: 11.44

Average number of distinct values among the attributes of the nominal type: 7.83

First quartile of skewness among attributes of the numeric type: -0.01

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

First quartile of standard deviation of attributes of the numeric type: 2.99

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Second quartile (Median) of kurtosis among attributes of the numeric type: 0.38

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

Second quartile (Median) of means among attributes of the numeric type: 29.09

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

Minimal mutual information between the nominal attributes and the target attribute.

Second quartile (Median) of skewness among attributes of the numeric type: 0.23

Maximum mutual information between the nominal attributes and the target attribute.

The minimal number of distinct values among attributes of the nominal type: 2

Second quartile (Median) of standard deviation of attributes of the numeric type: 5.33

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
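The two ratio definitions on this page (the irrelevant-information estimate and the number of attributes needed to describe the class) can be illustrated on a toy nominal dataset. This is a minimal sketch, assuming entropies measured in bits and a nominal target. The function names and the toy columns are hypothetical, and the page reports no value for either quantity on this dataset.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two aligned nominal sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def ratio_meta_features(columns, target):
    """Compute the two ratios exactly as they are defined in the text."""
    mean_attribute_entropy = sum(entropy(c) for c in columns) / len(columns)
    mean_mutual_information = (
        sum(mutual_information(c, target) for c in columns) / len(columns)
    )
    return {
        # (MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation
        "noise_to_signal": (mean_attribute_entropy - mean_mutual_information)
        / mean_mutual_information,
        # ClassEntropy / MeanMutualInformation
        "equivalent_attributes": entropy(target) / mean_mutual_information,
    }


# Toy data (hypothetical): one attribute copies the class, one is unrelated.
toy_target = ["a", "a", "b", "b"]
toy_columns = [["a", "a", "b", "b"], ["x", "y", "x", "y"]]
toy_result = ratio_meta_features(toy_columns, toy_target)
print(toy_result)
```

Both ratios are undefined when the mean mutual information is zero, so a real implementation would need to guard against that case.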

The maximum number of distinct values among attributes of the nominal type: 31

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

Third quartile of kurtosis among attributes of the numeric type: 1.02
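
The nominal-attribute summary values listed on this page (minimum 2, maximum 31, average 7.83, standard deviation 11.44) can be cross-checked against the per-feature unique-value counts in the feature list. The sketch below copies those counts by hand and assumes the standard deviation is the sample (n - 1) form, which is the variant that reproduces the reported figure.

```python
from statistics import mean, stdev

# Unique-value counts of the six nominal features, copied from the feature list.
nominal_distinct = {
    "Manufacturer": 31,
    "Type": 6,
    "Air_Bags_standard": 3,
    "Drive_train_type": 3,
    "Manual_transmission_available": 2,
    "Domestic": 2,
}

counts = list(nominal_distinct.values())
summary = {
    "min": min(counts),
    "max": max(counts),
    "mean": round(mean(counts), 2),
    # statistics.stdev uses the sample (n - 1) denominator.
    "stdev": round(stdev(counts), 2),
}
print(summary)
```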