SMSA

Status: active. Format: ARFF. Visibility: public. Uploaded 07-10-2014 by Joaquin Vanschoren.
Author: (none listed)
Source: Unknown - Date unknown
Please cite: (none listed)

Datasets of the Data and Story Library (DASL), a project illustrating the use of basic statistical methods, converted to ARFF format by Hakan Kjellerstrand.

Source: TunedIT: http://tunedit.org/repo/DASL
DASL file: http://lib.stat.cmu.edu/DASL/Datafiles/SMSA.html

Air Pollution and Mortality

Reference: U.S. Department of Labor Statistics
Authorization: free use

Description: Properties of 60 Standard Metropolitan Statistical Areas (a standard Census Bureau designation of the region around a city) in the United States, collected from a variety of sources. The data include information on the social and economic conditions in these areas, on their climate, and some indices of air pollution potential.

Number of cases: 60

Variable Names:
city: City name
JanTemp: Mean January temperature (degrees Fahrenheit)
JulyTemp: Mean July temperature (degrees Fahrenheit)
RelHum: Relative humidity
Rain: Annual rainfall (inches)
Mortality: Age-adjusted mortality
Education: Median education
PopDensity: Population density
%NonWhite: Percentage of non-whites
%WC: Percentage of white-collar workers
pop: Population
pop/house: Population per household
income: Median income
HCPot: HC pollution potential
NOxPot: Nitrous Oxide pollution potential
SO2Pot: Sulfur Dioxide pollution potential
NOx: Nitrous Oxide

16 features

Feature          Type     Unique values  Missing
NOx (target)     numeric  30             0
city (ignore)    nominal  59             0
JanTemp          numeric  28             0
JulyTemp         numeric  20             0
RelHum           numeric  17             0
Rain             numeric  32             0
Mortality        numeric  59             0
Education        numeric  26             0
PopDensity       numeric  59             0
%NonWhite        numeric  53             0
%WC              numeric  50             0
pop              numeric  59             0
pop/house        numeric  34             0
income           numeric  59             0
HCPot            numeric  34             0
NOxPot           numeric  30             0
S02Pot           numeric  44             0
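
The feature list marks NOx as the target and city as ignored. A minimal sketch of how those two roles might be applied when preparing rows for a regression task, assuming each row is available as a Python dict keyed by the feature names above. The helper name `split_roles` and the sample row are hypothetical; the values are placeholders, not data from SMSA.

```python
from typing import Dict, List, Tuple

TARGET = "NOx"        # marked "(target)" in the feature list
IGNORED = ("city",)   # marked "(ignore)" in the feature list


def split_roles(rows: List[Dict[str, object]],
                target: str = TARGET,
                ignored: Tuple[str, ...] = IGNORED):
    """Return (inputs, targets), dropping the target and ignored columns from the inputs."""
    inputs, targets = [], []
    for row in rows:
        targets.append(row[target])
        inputs.append({name: value for name, value in row.items()
                       if name != target and name not in ignored})
    return inputs, targets


# Hypothetical placeholder row: the values are made up for illustration only.
example_rows = [{"city": "ExampleCity", "JanTemp": 30.0, "Rain": 40.0, "NOx": 10.0}]
X, y = split_roles(example_rows)
print(X, y)
```

Dropping city before modelling makes sense here because it is an identifier with a distinct value for nearly every row, so it carries no generalisable signal.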

19 properties

Number of instances (rows) of the dataset: 59
Number of attributes (columns) of the dataset: 16
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 16
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: -25.07
Percentage of numeric attributes: 100
Number of attributes divided by the number of instances: 0.27
Percentage of nominal attributes: 0
Percentage of instances belonging to the most frequent class: (no value listed)
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
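
Several of the properties above are simple ratios of the listed counts. The following sketch recomputes them from the numbers on this page; the variable names are illustrative assumptions and are not OpenML identifiers.

```python
# Counts copied from the property list on this page.
n_instances = 59    # number of instances (rows)
n_attributes = 16   # number of attributes (columns)
n_numeric = 16      # number of numeric attributes
n_nominal = 0       # number of nominal attributes

# Number of attributes divided by the number of instances.
dimensionality = n_attributes / n_instances

# Share of each attribute type, as a percentage of all attributes.
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes

print(round(dimensionality, 2))   # rounds to the 0.27 listed above
print(pct_numeric, pct_nominal)
```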

13 tasks

2 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: NOx
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: NOx
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
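
The two regression tasks above pair 10-fold cross-validation with mean absolute error on NOx. A minimal standard-library sketch of that combination follows, assuming a trivial baseline that predicts the training-fold mean. The function names, the fold assignment, and the placeholder targets are assumptions for illustration; this is not OpenML's own splitting code and the values are not the real NOx column.

```python
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and split it into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mae(targets, k=10, seed=0):
    """MAE of a predict-the-training-mean baseline, averaged over k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(targets), k, seed):
        held_out_set = set(held_out)
        train = [t for i, t in enumerate(targets) if i not in held_out_set]
        prediction = mean(train)  # baseline: constant prediction
        fold_errors.append(mean(abs(targets[i] - prediction) for i in held_out))
    return mean(fold_errors)


# Placeholder targets (NOT the real NOx values): 59 synthetic numbers,
# matching only the number of instances listed on this page.
placeholder_nox = [float(i % 7) for i in range(59)]
print(cross_validated_mae(placeholder_nox))
```

The "10 times 10-fold" variant could be approximated by calling `cross_validated_mae` with ten different `seed` values and averaging the results.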