webdata_wXa

Status: active - Format: Sparse_ARFF - Visibility: public - Uploaded 29-08-2014 by aydin demircioglu
  • mythbusting_1 study_1 study_15 study_20 study_34 study_41
Author: John Platt

Source: [libSVM](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets) - Date unknown

Please cite: John C. Platt. Fast training of support vector machines using sequential minimal optimization. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, Cambridge, MA, 1998. MIT Press.

This is the famous webdata dataset w[1-8]a in its binary version, retrieved 2014-11-14 from the libSVM site. In addition to the preprocessing done there (see the libSVM site for details), this dataset was created as follows:

* Load all web data datasets, train and test, e.g. w1a, w1a.t, w2a, w2a.t, w3a, ...
* Join test and train for each subset, e.g. w1a and w1a.t, w2a and w2a.t.
* Normalize each file column-wise according to the following rules:
  * If a column contains only one value (constant feature), it is set to zero and thus removed by sparsity.
  * If a column contains two values (binary feature), the value occurring more often is set to zero, the other to one.
  * If a column contains more than two values (multinary/real feature), the column is divided by its standard deviation.
* Afterwards, merge all 8 files into one and sort the rows randomly.
* Finally, remove duplicate lines.

An R script that performs all of these steps can be found here: https://github.com/openml/data_scripts/blob/master/webdata_wXa/dataDownloader.R
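The column rules and the final merge can be illustrated with a minimal Python sketch. This is not the linked R script: the function names `normalize_columns` and `merge_shuffle_dedupe` are hypothetical, the data is handled as a dense NumPy array rather than a sparse one, the sample standard deviation (`ddof=1`) is assumed because that is what R's `sd` computes, and ties between the two values of a binary column are resolved in favour of the smaller value, which the description does not specify.

```python
import numpy as np


def normalize_columns(X):
    """Apply the three column rules to a dense 2-D array (illustrative only)."""
    X = np.asarray(X, dtype=float)
    out = np.zeros_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        values, counts = np.unique(col, return_counts=True)
        if len(values) == 1:
            # Constant feature: leave the column at zero.
            continue
        if len(values) == 2:
            # Binary feature: more frequent value -> 0, the other -> 1.
            # Assumption: on a tie, argmax picks the smaller value as "majority".
            majority = values[np.argmax(counts)]
            out[:, j] = (col != majority).astype(float)
        else:
            # Multinary/real feature: divide by the standard deviation.
            # Assumption: sample standard deviation (ddof=1), as in R's sd().
            out[:, j] = col / col.std(ddof=1)
    return out


def merge_shuffle_dedupe(parts, seed=0):
    """Stack the per-subset arrays, shuffle the rows, and drop duplicate rows."""
    merged = np.vstack(parts)
    rng = np.random.default_rng(seed)
    shuffled = merged[rng.permutation(len(merged))]
    # np.unique sorts rows, so recover the shuffled order via the first indices.
    _, first_idx = np.unique(shuffled, axis=0, return_index=True)
    return shuffled[np.sort(first_idx)]


demo = np.array([
    [5.0, 0.0, 1.0],
    [5.0, 0.0, 2.0],
    [5.0, 3.0, 3.0],
    [5.0, 0.0, 4.0],
])
print(normalize_columns(demo))
```

In the demo, the first column is constant, the second is binary with `0` as the more frequent value, and the third has more than two distinct values, so each rule is exercised once.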

124 features

Y (target) - nominal - 2 unique values - 0 missing
X1 - numeric - 2 unique values - 0 missing
X2 - numeric - 2 unique values - 0 missing
X3 - numeric - 2 unique values - 0 missing
X4 - numeric - 2 unique values - 0 missing
X5 - numeric - 2 unique values - 0 missing
X6 - numeric - 2 unique values - 0 missing
X7 - numeric - 2 unique values - 0 missing
X8 - numeric - 2 unique values - 0 missing
X9 - numeric - 2 unique values - 0 missing
X10 - numeric - 2 unique values - 0 missing
X11 - numeric - 2 unique values - 0 missing
X12 - numeric - 2 unique values - 0 missing
X13 - numeric - 2 unique values - 0 missing
X14 - numeric - 2 unique values - 0 missing
X15 - numeric - 2 unique values - 0 missing
X16 - numeric - 2 unique values - 0 missing
X17 - numeric - 2 unique values - 0 missing
X18 - numeric - 2 unique values - 0 missing
X19 - numeric - 2 unique values - 0 missing
X20 - numeric - 2 unique values - 0 missing
X21 - numeric - 2 unique values - 0 missing
X22 - numeric - 2 unique values - 0 missing
X23 - numeric - 2 unique values - 0 missing
X24 - numeric - 2 unique values - 0 missing
X25 - numeric - 2 unique values - 0 missing
X26 - numeric - 2 unique values - 0 missing
X27 - numeric - 2 unique values - 0 missing
X28 - numeric - 2 unique values - 0 missing
X29 - numeric - 2 unique values - 0 missing
X30 - numeric - 2 unique values - 0 missing
X31 - numeric - 2 unique values - 0 missing
X32 - numeric - 2 unique values - 0 missing
X33 - numeric - 2 unique values - 0 missing
X34 - numeric - 2 unique values - 0 missing
X35 - numeric - 2 unique values - 0 missing
X36 - numeric - 2 unique values - 0 missing
X37 - numeric - 2 unique values - 0 missing
X38 - numeric - 2 unique values - 0 missing
X39 - numeric - 2 unique values - 0 missing
X40 - numeric - 2 unique values - 0 missing
X41 - numeric - 2 unique values - 0 missing
X42 - numeric - 2 unique values - 0 missing
X43 - numeric - 2 unique values - 0 missing
X44 - numeric - 2 unique values - 0 missing
X45 - numeric - 2 unique values - 0 missing
X46 - numeric - 2 unique values - 0 missing
X47 - numeric - 2 unique values - 0 missing
X48 - numeric - 2 unique values - 0 missing
X49 - numeric - 2 unique values - 0 missing
X50 - numeric - 2 unique values - 0 missing
X51 - numeric - 2 unique values - 0 missing
X52 - numeric - 2 unique values - 0 missing
X53 - numeric - 2 unique values - 0 missing
X54 - numeric - 2 unique values - 0 missing
X55 - numeric - 2 unique values - 0 missing
X56 - numeric - 2 unique values - 0 missing
X57 - numeric - 2 unique values - 0 missing
X58 - numeric - 2 unique values - 0 missing
X59 - numeric - 2 unique values - 0 missing
X60 - numeric - 2 unique values - 0 missing
X61 - numeric - 2 unique values - 0 missing
X62 - numeric - 2 unique values - 0 missing
X63 - numeric - 2 unique values - 0 missing
X64 - numeric - 2 unique values - 0 missing
X65 - numeric - 2 unique values - 0 missing
X66 - numeric - 2 unique values - 0 missing
X67 - numeric - 2 unique values - 0 missing
X68 - numeric - 2 unique values - 0 missing
X69 - numeric - 2 unique values - 0 missing
X70 - numeric - 2 unique values - 0 missing
X71 - numeric - 2 unique values - 0 missing
X72 - numeric - 2 unique values - 0 missing
X73 - numeric - 2 unique values - 0 missing
X74 - numeric - 2 unique values - 0 missing
X75 - numeric - 2 unique values - 0 missing
X76 - numeric - 2 unique values - 0 missing
X77 - numeric - 2 unique values - 0 missing
X78 - numeric - 2 unique values - 0 missing
X79 - numeric - 2 unique values - 0 missing
X80 - numeric - 2 unique values - 0 missing
X81 - numeric - 2 unique values - 0 missing
X82 - numeric - 2 unique values - 0 missing
X83 - numeric - 2 unique values - 0 missing
X84 - numeric - 2 unique values - 0 missing
X85 - numeric - 2 unique values - 0 missing
X86 - numeric - 2 unique values - 0 missing
X87 - numeric - 2 unique values - 0 missing
X88 - numeric - 2 unique values - 0 missing
X89 - numeric - 2 unique values - 0 missing
X90 - numeric - 2 unique values - 0 missing
X91 - numeric - 2 unique values - 0 missing
X92 - numeric - 2 unique values - 0 missing
X93 - numeric - 2 unique values - 0 missing
X94 - numeric - 2 unique values - 0 missing
X95 - numeric - 2 unique values - 0 missing
X96 - numeric - 2 unique values - 0 missing
X97 - numeric - 2 unique values - 0 missing
X98 - numeric - 2 unique values - 0 missing
X99 - numeric - 2 unique values - 0 missing
X100 - numeric - 2 unique values - 0 missing
X101 - numeric - 2 unique values - 0 missing
X102 - numeric - 2 unique values - 0 missing
X103 - numeric - 2 unique values - 0 missing
X104 - numeric - 2 unique values - 0 missing
X105 - numeric - 2 unique values - 0 missing
X106 - numeric - 2 unique values - 0 missing
X107 - numeric - 2 unique values - 0 missing
X108 - numeric - 2 unique values - 0 missing
X109 - numeric - 2 unique values - 0 missing
X110 - numeric - 2 unique values - 0 missing
X111 - numeric - 2 unique values - 0 missing
X112 - numeric - 2 unique values - 0 missing
X113 - numeric - 2 unique values - 0 missing
X114 - numeric - 2 unique values - 0 missing
X115 - numeric - 2 unique values - 0 missing
X116 - numeric - 2 unique values - 0 missing
X117 - numeric - 2 unique values - 0 missing
X118 - numeric - 2 unique values - 0 missing
X119 - numeric - 2 unique values - 0 missing
X120 - numeric - 2 unique values - 0 missing
X121 - numeric - 2 unique values - 0 missing
X122 - numeric - 2 unique values - 0 missing
X123 - numeric - 2 unique values - 0 missing

107 properties

36974
Number of instances (rows) of the dataset.
124
Number of attributes (columns) of the dataset.
2
Number of distinct values of the target attribute (if it is nominal).
0
Number of missing values in the dataset.
0
Number of instances with at least one value missing.
123
Number of numeric attributes.
1
Number of nominal attributes.
Minimal entropy among attributes.
21.97
Second quartile (Median) of kurtosis among attributes of the numeric type.
0.49
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.8
Entropy of the target attribute values.
0.39
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
28100
Number of instances belonging to the most frequent class.
-1.96
Minimum kurtosis among attributes of the numeric type.
0.04
Second quartile (Median) of means among attributes of the numeric type.
0.86
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.75
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
Maximum entropy among attributes.
0
Minimum of means among attributes of the numeric type.
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
0.18
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0.24
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
36974
Maximum kurtosis among attributes of the numeric type.
Minimal mutual information between the nominal attributes and the target attribute.
4.9
Second quartile (Median) of skewness among attributes of the numeric type.
0.49
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
0
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
0.94
Maximum of means among attributes of the numeric type.
2
The minimal number of distinct values among attributes of the nominal type.
0.81
Percentage of binary attributes.
0.19
Second quartile (Median) of standard deviation of attributes of the numeric type.
0.68
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0
Number of attributes divided by the number of instances.
Maximum mutual information between the nominal attributes and the target attribute.
-3.8
Minimum skewness among attributes of the numeric type.
0
Percentage of instances having missing values.
Third quartile of entropy among attributes.
0.24
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
2
The maximum number of distinct values among attributes of the nominal type.
0.01
Minimum standard deviation of attributes of the numeric type.
0
Percentage of missing values.
396.95
Third quartile of kurtosis among attributes of the numeric type.
0.73
Average class difference between consecutive instances.
0.35
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
0.82
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
192.29
Maximum skewness among attributes of the numeric type.
24
Percentage of instances belonging to the least frequent class.
99.19
Percentage of numeric attributes.
0.16
Third quartile of means among attributes of the numeric type.
0.83
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.68
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.18
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
0.5
Maximum standard deviation of attributes of the numeric type.
8874
Number of instances belonging to the least frequent class.
0.81
Percentage of nominal attributes.
Third quartile of mutual information between the nominal attributes and the target attribute.
0.18
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.24
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.47
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Average entropy of the attributes.
0.88
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
First quartile of entropy among attributes.
19.97
Third quartile of skewness among attributes of the numeric type.
0.48
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.35
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
0.82
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
619.96
Mean kurtosis among attributes of the numeric type.
0.19
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
2.67
First quartile of kurtosis among attributes of the numeric type.
0.34
Third quartile of standard deviation of attributes of the numeric type.
0.83
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.68
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.18
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
0.11
Mean of means among attributes of the numeric type.
0.53
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
0
First quartile of means among attributes of the numeric type.
0.86
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.18
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.24
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.47
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Average mutual information between the nominal attributes and the target attribute.
1
Number of binary attributes.
First quartile of mutual information between the nominal attributes and the target attribute.
0.18
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.48
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.35
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
0.82
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
1.89
First quartile of skewness among attributes of the numeric type.
0.49
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
0.83
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0
Standard deviation of the number of distinct values among attributes of the nominal type.
0.18
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
2
Average number of distinct values among the attributes of the nominal type.
0.05
First quartile of standard deviation of attributes of the numeric type.
0.86
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.18
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.78
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
0.47
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
13.23
Mean skewness among attributes of the numeric type.
0.2
Mean standard deviation of attributes of the numeric type.
Second quartile (Median) of entropy among attributes.
0.18
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
0.48
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
0.23
Error rate achieved by the landmarker weka.classifiers.lazy.IBk
76
Percentage of instances belonging to the most frequent class.
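
Several of the listed properties can be cross-checked from the raw counts alone. The sketch below is a sanity check, not OpenML's own computation; it assumes each number in the list belongs to the description printed directly beneath it, and the variable names are hypothetical.

```python
import math

# Counts taken from the property list above.
n_instances = 36974
n_majority = 28100
n_minority = 8874
n_attributes = 124
n_numeric = 123
n_nominal = 1

# The two class counts should add up to the number of instances.
assert n_majority + n_minority == n_instances


def pct(part, whole):
    """Percentage rounded to two decimals, matching the precision shown on the page."""
    return round(100 * part / whole, 2)


def binary_entropy(p):
    """Entropy in bits of a two-class distribution with probabilities p and 1 - p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


majority_pct = pct(n_majority, n_instances)
minority_pct = pct(n_minority, n_instances)
numeric_pct = pct(n_numeric, n_attributes)
nominal_pct = pct(n_nominal, n_attributes)
target_entropy = binary_entropy(n_majority / n_instances)

print("majority class %:", majority_pct)
print("minority class %:", minority_pct)
print("numeric attributes %:", numeric_pct)
print("nominal attributes %:", nominal_pct)
print("target entropy (bits):", round(target_entropy, 2))
```

Under that reading, the minority share also equals the error rate of always predicting the majority class, which is consistent with the DecisionStump landmarker reporting an error rate of 0.24 together with a kappa of 0.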

15 tasks

397 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Y
123 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Y
44 runs - estimation_procedure: 10-fold Learning Curve - target_feature: Y
0 runs - estimation_procedure: Interleaved Test then Train - target_feature: Y
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - target_feature: Y
0 runs - target_feature: Y
0 runs - target_feature: Y
0 runs - target_feature: Y
0 runs - target_feature: Y
0 runs - target_feature: Y
0 runs - target_feature: Y
0 runs - target_feature: Y
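
For readers unfamiliar with the estimation procedures named in the task list, the following sketch shows one way "10 times 10-fold Crossvalidation" with `predictive_accuracy` could be computed. It is an illustration under stated assumptions, not OpenML's implementation: the folds are stratified by class (an assumption), the helper names `stratified_folds`, `repeated_cv_accuracy`, and `majority_baseline` are hypothetical, and the demo uses synthetic labels with the same 76/24 class ratio rather than the real data.

```python
import numpy as np


def stratified_folds(y, n_folds=10, seed=0):
    """Assign each instance a fold id so that every fold keeps the class ratio."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    folds = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds


def repeated_cv_accuracy(X, y, fit_predict, n_repeats=10, n_folds=10, seed=0):
    """Mean accuracy over n_repeats independent n_folds-fold cross-validations."""
    scores = []
    for r in range(n_repeats):
        folds = stratified_folds(y, n_folds, seed + r)
        for k in range(n_folds):
            test = folds == k
            pred = fit_predict(X[~test], y[~test], X[test])
            scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))


def majority_baseline(X_train, y_train, X_test):
    """Predict the most frequent training label for every test instance."""
    values, counts = np.unique(y_train, return_counts=True)
    return np.full(len(X_test), values[np.argmax(counts)])


# Synthetic stand-in: 1000 labels with the dataset's 76/24 class ratio.
y_demo = np.array([0] * 760 + [1] * 240)
X_demo = np.zeros((len(y_demo), 1))
print(repeated_cv_accuracy(X_demo, y_demo, majority_baseline))
```

Setting `n_repeats=1` gives the plain "10-fold Crossvalidation" procedure; any function with the same `fit_predict(X_train, y_train, X_test)` shape can replace the baseline.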