Data

house_8L

active
ARFF
Publicly available Visibility: public Uploaded 23-04-2014 by Jan van Rijn

1 likes downloaded by 3 people , 3 total downloads 0 issues 0 downvotes

1 likes downloaded by 3 people , 3 total downloads 0 issues 0 downvotes

Issue | #Downvotes for this reason | By |
---|

Loading wiki

Help us complete this description
Edit

Author:
Source: Unknown -
Please cite:
This database was designed on the basis of data provided by US Census
Bureau [http://www.census.gov] (under Lookup Access
[http://www.census.gov/cdrom/lookup]: Summary Tape File 1). The data
were collected as part of the 1990 US census. These are mostly counts
cumulated at different survey levels. For the purpose of this data set
a level State-Place was used. Data from all states was obtained. Most
of the counts were changed into appropriate proportions. There are 4
different data sets obtained from this database: House(8H) House(8L)
House(16H) House(16L) These are all concerned with predicting the
median price of the house in the region based on demographic
composition and a state of housing market in the region. A number in
the name signifies the number of attributes of the data set. A
following letter denotes a very rough approximation to the difficulty
of the task. For Low task difficulty, more correlated attributes were
chosen as signified by univariate smooth fit of that input on the
target. Tasks with High difficulty have had their attributes chosen to
make the modelling more difficult due to higher variance or lower
correlation of the inputs to the target.
Original source: DELVE repository of data.
Source: collection of regression datasets by Luis Torgo (ltorgo@ncc.up.pt) at
http://www.ncc.up.pt/~ltorgo/Regression/DataSets.html
Characteristics: 22784 cases, 9 continuous attributes.

price (target) | numeric | 2045 unique values 0 missing | |

P3 | numeric | 5818 unique values 0 missing | |

P6p4 | numeric | 12051 unique values 0 missing | |

P11p3 | numeric | 18765 unique values 0 missing | |

P16p2 | numeric | 15570 unique values 0 missing | |

P19p2 | numeric | 10941 unique values 0 missing | |

H5p2 | numeric | 6002 unique values 0 missing | |

H15p1 | numeric | 18585 unique values 0 missing | |

H40p4 | numeric | 2421 unique values 0 missing |

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

First quartile of mutual information between the nominal attributes and the target attribute.

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

Average number of distinct values among the attributes of the nominal type.

-0.2

First quartile of skewness among attributes of the numeric type.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Standard deviation of the number of distinct values among attributes of the nominal type.

0.07

First quartile of standard deviation of attributes of the numeric type.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

5.91

Second quartile (Median) of kurtosis among attributes of the numeric type.

0.49

Second quartile (Median) of means among attributes of the numeric type.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Minimal mutual information between the nominal attributes and the target attribute.

1.55

Second quartile (Median) of skewness among attributes of the numeric type.

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

The minimal number of distinct values among attributes of the nominal type.

0.14

Second quartile (Median) of standard deviation of attributes of the numeric type.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Maximum mutual information between the nominal attributes and the target attribute.

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

The maximum number of distinct values among attributes of the nominal type.

117.7

Third quartile of kurtosis among attributes of the numeric type.

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Third quartile of mutual information between the nominal attributes and the target attribute.

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

8.52

Third quartile of skewness among attributes of the numeric type.

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

2.38

First quartile of kurtosis among attributes of the numeric type.

12475.32

Third quartile of standard deviation of attributes of the numeric type.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

Average mutual information between the nominal attributes and the target attribute.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3