us_crime

Status: active. Format: ARFF. Visibility: public. Uploaded 25-08-2014 by Tobias Kuehn.

Author: not given
Source: Unknown - 2009
Please cite: not given

Title: Communities and Crime

Abstract: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR.

Data Set Characteristics: Multivariate
Attribute Characteristics: Real
Associated Tasks: Regression
Number of Instances: 1994
Number of Attributes: 128
Missing Values? Yes
Area: Social
Date Donated: 2009-07-13

Source:
Creator: Michael Redmond (redmond 'at' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA. Culled from the 1990 US Census, the 1995 US FBI Uniform Crime Report, and the 1990 US Law Enforcement Management and Administrative Statistics Survey, available from ICPSR at U of Michigan.
Donor: Michael Redmond (redmond 'at' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA
Date: July 2009

Data Set Information:
Many variables are included so that algorithms that select or learn weights for attributes could be tested. However, clearly unrelated attributes were not included; attributes were picked if there was any plausible connection to crime (N=122), plus the attribute to be predicted (Per Capita Violent Crimes). The variables included in the dataset describe the community, such as the percent of the population considered urban and the median family income, and law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units.

The per capita violent crimes variable was calculated using population and the sum of the crime variables considered violent crimes in the United States: murder, rape, robbery, and assault. There was apparently some controversy in some states concerning the counting of rapes. This resulted in missing values for rape, which in turn resulted in incorrect values for per capita violent crime. These cities are not included in the dataset. Many of these omitted communities were from the midwestern USA.

Data is described below based on original values. All numeric data was normalized into the decimal range 0.00-1.00 using an unsupervised, equal-interval binning method. Attributes retain their distribution and skew (hence, for example, the population attribute has a mean value of 0.06 because most communities are small). For example, an attribute described as 'mean people per household' is actually the normalized (0-1) version of that value.

The normalization preserves rough ratios of values WITHIN an attribute (e.g. double the value for double the population within the available precision - except for extreme values (all values more than 3 SD above the mean are normalized to 1.00; all values more than 3 SD below the mean are normalized to 0.00)). However, the normalization does not preserve relationships between values BETWEEN attributes (e.g. it would not be meaningful to compare the value for whitePerCap with the value for blackPerCap for a community).

A limitation was that the LEMAS survey covered only police departments with at least 100 officers, plus a random sample of smaller departments. For our purposes, communities not found in both census and crime datasets were omitted. Many communities are missing LEMAS data.
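
The description above does not give the exact normalization formula, so the following is only a minimal sketch of one plausible reading: values are clipped at 3 standard deviations from the mean, scaled linearly to 0-1, and rounded to two decimals (which yields equal-width steps of 0.01). The function name `normalize_attribute`, its parameters, and the sample inputs are hypothetical and do not come from the dataset documentation.

```python
from statistics import mean, pstdev


def normalize_attribute(values, n_sd=3.0, decimals=2):
    """Scale one attribute to the range 0.00-1.00.

    Assumption (not confirmed by the source): extreme values are clipped
    at mean +/- n_sd standard deviations before linear scaling, and the
    result is rounded to `decimals` places.
    """
    mu = mean(values)
    sd = pstdev(values)
    # Clipping bounds never extend beyond the observed range.
    lower = max(min(values), mu - n_sd * sd)
    upper = min(max(values), mu + n_sd * sd)
    if upper == lower:
        # A constant attribute carries no information; map it to 0.0.
        return [0.0 for _ in values]
    normalized = []
    for v in values:
        clipped = min(max(v, lower), upper)
        normalized.append(round((clipped - lower) / (upper - lower), decimals))
    return normalized


# Hypothetical raw values for a single attribute.
print(normalize_attribute([1000, 2000, 3000, 4000, 5000]))
```

Because each attribute is scaled against its own range, the sketch also illustrates why normalized values are comparable within an attribute but not between attributes.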

128 features

ViolentCrimesPerPop (target): numeric, 98 unique values, 0 missing
state: numeric, 46 unique values, 0 missing
county: numeric, 108 unique values, 1174 missing
community: numeric, 799 unique values, 1177 missing
communityname: string, 1828 unique values, 0 missing
fold: numeric, 10 unique values, 0 missing
population: numeric, 66 unique values, 0 missing
householdsize: numeric, 93 unique values, 0 missing
racepctblack: numeric, 100 unique values, 0 missing
racePctWhite: numeric, 99 unique values, 0 missing
racePctAsian: numeric, 91 unique values, 0 missing
racePctHisp: numeric, 91 unique values, 0 missing
agePct12t21: numeric, 93 unique values, 0 missing
agePct12t29: numeric, 89 unique values, 0 missing
agePct16t24: numeric, 94 unique values, 0 missing
agePct65up: numeric, 98 unique values, 0 missing
numbUrban: numeric, 67 unique values, 0 missing
pctUrban: numeric, 64 unique values, 0 missing
medIncome: numeric, 99 unique values, 0 missing
pctWWage: numeric, 96 unique values, 0 missing
pctWFarmSelf: numeric, 99 unique values, 0 missing
pctWInvInc: numeric, 96 unique values, 0 missing
pctWSocSec: numeric, 96 unique values, 0 missing
pctWPubAsst: numeric, 101 unique values, 0 missing
pctWRetire: numeric, 93 unique values, 0 missing
medFamInc: numeric, 98 unique values, 0 missing
perCapInc: numeric, 98 unique values, 0 missing
whitePerCap: numeric, 101 unique values, 0 missing
blackPerCap: numeric, 91 unique values, 0 missing
indianPerCap: numeric, 86 unique values, 0 missing
AsianPerCap: numeric, 98 unique values, 0 missing
OtherPerCap: numeric, 97 unique values, 1 missing
HispPerCap: numeric, 94 unique values, 0 missing
NumUnderPov: numeric, 66 unique values, 0 missing
PctPopUnderPov: numeric, 100 unique values, 0 missing
PctLess9thGrade: numeric, 97 unique values, 0 missing
PctNotHSGrad: numeric, 99 unique values, 0 missing
PctBSorMore: numeric, 96 unique values, 0 missing
PctUnemployed: numeric, 98 unique values, 0 missing
PctEmploy: numeric, 96 unique values, 0 missing
PctEmplManu: numeric, 100 unique values, 0 missing
PctEmplProfServ: numeric, 96 unique values, 0 missing
PctOccupManu: numeric, 98 unique values, 0 missing
PctOccupMgmtProf: numeric, 99 unique values, 0 missing
MalePctDivorce: numeric, 98 unique values, 0 missing
MalePctNevMarr: numeric, 96 unique values, 0 missing
FemalePctDiv: numeric, 91 unique values, 0 missing
TotalPctDiv: numeric, 94 unique values, 0 missing
PersPerFam: numeric, 92 unique values, 0 missing
PctFam2Par: numeric, 101 unique values, 0 missing
PctKids2Par: numeric, 97 unique values, 0 missing
PctYoungKids2Par: numeric, 99 unique values, 0 missing
PctTeen2Par: numeric, 96 unique values, 0 missing
PctWorkMomYoungKids: numeric, 95 unique values, 0 missing
PctWorkMom: numeric, 98 unique values, 0 missing
NumIlleg: numeric, 55 unique values, 0 missing
PctIlleg: numeric, 97 unique values, 0 missing
NumImmig: numeric, 47 unique values, 0 missing
PctImmigRecent: numeric, 99 unique values, 0 missing
PctImmigRec5: numeric, 100 unique values, 0 missing
PctImmigRec8: numeric, 97 unique values, 0 missing
PctImmigRec10: numeric, 97 unique values, 0 missing
PctRecentImmig: numeric, 95 unique values, 0 missing
PctRecImmig5: numeric, 97 unique values, 0 missing
PctRecImmig8: numeric, 98 unique values, 0 missing
PctRecImmig10: numeric, 100 unique values, 0 missing
PctSpeakEnglOnly: numeric, 98 unique values, 0 missing
PctNotSpeakEnglWell: numeric, 94 unique values, 0 missing
PctLargHouseFam: numeric, 99 unique values, 0 missing
PctLargHouseOccup: numeric, 96 unique values, 0 missing
PersPerOccupHous: numeric, 96 unique values, 0 missing
PersPerOwnOccHous: numeric, 94 unique values, 0 missing
PersPerRentOccHous: numeric, 98 unique values, 0 missing
PctPersOwnOccup: numeric, 100 unique values, 0 missing
PctPersDenseHous: numeric, 94 unique values, 0 missing
PctHousLess3BR: numeric, 100 unique values, 0 missing
MedNumBR: numeric, 3 unique values, 0 missing
HousVacant: numeric, 70 unique values, 0 missing
PctHousOccup: numeric, 92 unique values, 0 missing
PctHousOwnOcc: numeric, 99 unique values, 0 missing
PctVacantBoarded: numeric, 97 unique values, 0 missing
PctVacMore6Mos: numeric, 98 unique values, 0 missing
MedYrHousBuilt: numeric, 49 unique values, 0 missing
PctHousNoPhone: numeric, 99 unique values, 0 missing
PctWOFullPlumb: numeric, 91 unique values, 0 missing
OwnOccLowQuart: numeric, 99 unique values, 0 missing
OwnOccMedVal: numeric, 100 unique values, 0 missing
OwnOccHiQuart: numeric, 98 unique values, 0 missing
RentLowQ: numeric, 101 unique values, 0 missing
RentMedian: numeric, 99 unique values, 0 missing
RentHighQ: numeric, 99 unique values, 0 missing
MedRent: numeric, 100 unique values, 0 missing
MedRentPctHousInc: numeric, 95 unique values, 0 missing
MedOwnCostPctInc: numeric, 97 unique values, 0 missing
MedOwnCostPctIncNoMtg: numeric, 70 unique values, 0 missing
NumInShelters: numeric, 54 unique values, 0 missing
NumStreet: numeric, 53 unique values, 0 missing
PctForeignBorn: numeric, 96 unique values, 0 missing
PctBornSameState: numeric, 99 unique values, 0 missing
PctSameHouse85: numeric, 99 unique values, 0 missing
PctSameCity85: numeric, 100 unique values, 0 missing
PctSameState85: numeric, 97 unique values, 0 missing
LemasSwornFT: numeric, 38 unique values, 1675 missing
LemasSwFTPerPop: numeric, 52 unique values, 1675 missing
LemasSwFTFieldOps: numeric, 34 unique values, 1675 missing
LemasSwFTFieldPerPop: numeric, 55 unique values, 1675 missing
LemasTotalReq: numeric, 44 unique values, 1675 missing
LemasTotReqPerPop: numeric, 59 unique values, 1675 missing
PolicReqPerOffic: numeric, 75 unique values, 1675 missing
PolicPerPop: numeric, 52 unique values, 1675 missing
RacialMatchCommPol: numeric, 76 unique values, 1675 missing
PctPolicWhite: numeric, 74 unique values, 1675 missing
PctPolicBlack: numeric, 73 unique values, 1675 missing
PctPolicHisp: numeric, 54 unique values, 1675 missing
PctPolicAsian: numeric, 50 unique values, 1675 missing
PctPolicMinor: numeric, 72 unique values, 1675 missing
OfficAssgnDrugUnits: numeric, 30 unique values, 1675 missing
NumKindsDrugsSeiz: numeric, 15 unique values, 1675 missing
PolicAveOTWorked: numeric, 77 unique values, 1675 missing
LandArea: numeric, 61 unique values, 0 missing
PopDens: numeric, 96 unique values, 0 missing
PctUsePubTrans: numeric, 98 unique values, 0 missing
PolicCars: numeric, 63 unique values, 1675 missing
PolicOperBudg: numeric, 38 unique values, 1675 missing
LemasPctPolicOnPatr: numeric, 72 unique values, 1675 missing
LemasGangUnitDeploy: numeric, 3 unique values, 1675 missing
LemasPctOfficDrugUn: numeric, 80 unique values, 0 missing
PolicBudgPerPop: numeric, 51 unique values, 1675 missing
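
Since most LEMAS-derived features are missing for 1675 of the 1994 instances, a common first step is to decide which columns are too sparse to use. The sketch below is illustrative only: the missing counts are copied from the listing above (a subset, for brevity), while the 0.5 threshold and the function names are hypothetical choices, not part of the dataset page.

```python
N_INSTANCES = 1994  # number of rows reported for the dataset

# Missing-value counts copied from the feature listing (subset only).
MISSING_COUNTS = {
    "population": 0,
    "OtherPerCap": 1,
    "county": 1174,
    "community": 1177,
    "LemasSwornFT": 1675,
    "PolicBudgPerPop": 1675,
}


def missing_fraction(count, n_rows=N_INSTANCES):
    """Share of rows in which a feature has no value."""
    return count / n_rows


def columns_to_drop(missing_counts, threshold=0.5, n_rows=N_INSTANCES):
    """Names of features whose missing share exceeds `threshold`.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    return sorted(
        name
        for name, count in missing_counts.items()
        if missing_fraction(count, n_rows) > threshold
    )


for name, count in MISSING_COUNTS.items():
    print(f"{name}: {missing_fraction(count):.1%} missing")
print("Candidates to drop:", columns_to_drop(MISSING_COUNTS))
```

Whether to drop, impute, or model these columns separately depends on the task; the sketch only makes the scale of the missingness explicit.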

107 properties

Number of instances (rows) of the dataset: 1994
Number of attributes (columns) of the dataset: 128
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 39202
Number of instances with at least one value missing: 1871
Number of numeric attributes: 127
Number of nominal attributes: 0
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation: not listed
Number of binary attributes: 0
First quartile of mutual information between the nominal attributes and the target attribute: not listed
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3: not listed
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001: not listed
Average number of distinct values among the attributes of the nominal type: not listed
First quartile of skewness among attributes of the numeric type: 0.06
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1: not listed
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: not listed
Standard deviation of the number of distinct values among attributes of the nominal type: not listed
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001: not listed
Mean skewness among attributes of the numeric type: 1.3
First quartile of standard deviation of attributes of the numeric type: 0.17
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2: not listed
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: not listed
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001: not listed
Mean standard deviation of attributes of the numeric type: 200.55
Second quartile (Median) of entropy among attributes: not listed
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: not listed
Error rate achieved by the landmarker weka.classifiers.lazy.IBk: not listed
Percentage of instances belonging to the most frequent class: not listed
Minimal entropy among attributes: not listed
Second quartile (Median) of kurtosis among attributes of the numeric type: 1.57
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2: not listed
Entropy of the target attribute values: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk: not listed
Number of instances belonging to the most frequent class: not listed
Minimum kurtosis among attributes of the numeric type: -1.45
Second quartile (Median) of means among attributes of the numeric type: 0.36
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3: not listed
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump: not listed
Maximum entropy among attributes: not listed
Minimum of means among attributes of the numeric type: 0.02
Second quartile (Median) of mutual information between the nominal attributes and the target attribute: not listed
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3: not listed
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump: not listed
Maximum kurtosis among attributes of the numeric type: 69.18
Minimal mutual information between the nominal attributes and the target attribute: not listed
Second quartile (Median) of skewness among attributes of the numeric type: 1.1
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump: not listed
Maximum of means among attributes of the numeric type: 46188.34
The minimal number of distinct values among attributes of the nominal type: not listed
Percentage of binary attributes: 0
Second quartile (Median) of standard deviation of attributes of the numeric type: 0.2
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1: not listed
Number of attributes divided by the number of instances: 0.06
Maximum mutual information between the nominal attributes and the target attribute: not listed
Minimum skewness among attributes of the numeric type: -5.05
Percentage of instances having missing values: 93.83
Third quartile of entropy among attributes: not listed
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1: not listed
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation: not listed
The maximum number of distinct values among attributes of the nominal type: not listed
Minimum standard deviation of attributes of the numeric type: 0.09
Percentage of missing values: 15.36
Third quartile of kurtosis among attributes of the numeric type: 4.58
Average class difference between consecutive instances: 0.76
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1: not listed
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001: not listed
Maximum skewness among attributes of the numeric type: 7.47
Percentage of instances belonging to the least frequent class: not listed
Percentage of numeric attributes: 99.22
Third quartile of means among attributes of the numeric type: 0.49
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: not listed
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2: not listed
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001: not listed
Maximum standard deviation of attributes of the numeric type: 25299.73
Number of instances belonging to the least frequent class: not listed
Percentage of nominal attributes: 0
Third quartile of mutual information between the nominal attributes and the target attribute: not listed
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: not listed
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001: not listed
Average entropy of the attributes: not listed
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes: not listed
First quartile of entropy among attributes: not listed
Third quartile of skewness among attributes of the numeric type: 2.08
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2: not listed
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001: not listed
Mean kurtosis among attributes of the numeric type: 5.8
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes: not listed
First quartile of kurtosis among attributes of the numeric type: 0.08
Third quartile of standard deviation of attributes of the numeric type: 0.22
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: not listed
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3: not listed
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001: not listed
Mean of means among attributes of the numeric type: 364.76
Average mutual information between the nominal attributes and the target attribute: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes: not listed
First quartile of means among attributes of the numeric type: 0.22
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1: not listed
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W: not listed
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3: not listed
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001: not listed
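
Several of the properties above are simple ratios of the basic counts, so they can be cross-checked by hand. The following sketch recomputes them from the figures reported on this page; the variable and function names are hypothetical, and the tally of per-feature missing counts (22 features with 1675 missing, plus county, community, and OtherPerCap) is taken from the feature listing.

```python
# Basic counts as reported in the properties list.
N_INSTANCES = 1994
N_ATTRIBUTES = 128
N_NUMERIC = 127
N_MISSING_VALUES = 39202
N_INSTANCES_WITH_MISSING = 1871

# Per-feature missing counts from the feature listing:
# 22 features with 1675 missing, county (1174), community (1177),
# and OtherPerCap (1).
LISTED_MISSING = 22 * 1675 + 1174 + 1177 + 1


def derived_properties():
    """Recompute the ratio-style properties, rounded to two decimals."""
    total_cells = N_INSTANCES * N_ATTRIBUTES
    return {
        "pct_missing_values": round(100 * N_MISSING_VALUES / total_cells, 2),
        "pct_instances_with_missing": round(
            100 * N_INSTANCES_WITH_MISSING / N_INSTANCES, 2
        ),
        "attributes_per_instance": round(N_ATTRIBUTES / N_INSTANCES, 2),
        "pct_numeric_attributes": round(100 * N_NUMERIC / N_ATTRIBUTES, 2),
    }


print(derived_properties())
print("Missing values summed from the feature listing:", LISTED_MISSING)
```

The recomputed values agree with the listed 15.36, 93.83, 0.06, and 99.22, and the per-feature missing counts sum to the reported total of 39202.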

4 tasks

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: ViolentCrimesPerPop
0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: ViolentCrimesPerPop
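
Both tasks score a regressor by mean absolute error under 10-fold cross-validation. As a minimal sketch of that evaluation, the code below uses a baseline that always predicts the mean of the training targets. The round-robin fold assignment, the function names, and the toy target values are assumptions made for illustration; OpenML tasks define their own splits, which are not reproduced here.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def k_fold_indices(n_rows, k=10):
    """Assign row i to fold i % k (a simple deterministic split)."""
    return [[i for i in range(n_rows) if i % k == fold] for fold in range(k)]


def cross_validated_mae(targets, k=10):
    """MAE of a predict-the-training-mean baseline, averaged over k folds."""
    fold_scores = []
    for test_idx in k_fold_indices(len(targets), k):
        held_out = set(test_idx)
        train = [t for i, t in enumerate(targets) if i not in held_out]
        baseline = mean(train)  # constant prediction for this fold
        test = [targets[i] for i in test_idx]
        fold_scores.append(mean_absolute_error(test, [baseline] * len(test)))
    return mean(fold_scores)


# Hypothetical normalized target values, not taken from the dataset.
toy_targets = [0.05, 0.10, 0.20, 0.03, 0.67, 0.12, 0.43, 0.08, 0.30, 0.15] * 2
print(cross_validated_mae(toy_targets))
```

A real model would replace the constant baseline inside the loop; the "10 times 10-fold" variant repeats the procedure with ten different shuffles and averages the results.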