webdata_wXa (OpenML dataset ID 350, version 1)

**Author**: John Platt
**Source**: [libSVM](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets) - Date unknown
**Please cite**: John C. Platt. Fast training of support vector machines using sequential minimal optimization. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, Cambridge, MA, 1998. MIT Press.

This is the famous webdata dataset w[1-8]a in its binary version, retrieved 2014-11-14 from the libSVM site. In addition to the preprocessing done there (see the libSVM site for details), this dataset was created as follows:

* Load all web data datasets, train and test, e.g. w1a, w1a.t, w2a, w2a.t, w3a, ...
* Join test and train for each subset, e.g. w1a and w1a.t, w2a and w2a.t.
* Normalize each file column-wise according to the following rules:
    * If a column contains only one value (constant feature), it is set to zero and thus removed by sparsity.
    * If a column contains two values (binary feature), the value occurring more often is set to zero, the other to one.
    * If a column contains more than two values (multinary/real feature), the column is divided by its standard deviation.
* Afterwards, merge all 8 files into one and sort the rows randomly.
* Finally, remove duplicate lines.

An R script which performs all of these steps can be found here: https://github.com/openml/data_scripts/blob/master/webdata_wXa/dataDownloader.R

* Description version: 1
* Format: Sparse_ARFF
* Creator: John Platt
* Collection date: 1998
* Upload date: 2014-08-29T18:29:14
* Language: English
* Licence: Public
* URL: https://api.openml.org/data/v1/download/52253/webdata_wXa.sparse_arff
* File ID: 52253
* Default target attribute: Y
* Tags: Computer Systems, Machine Learning, mythbusting_1, study_1, study_15, study_20, study_34, study_41
* Visibility: public
* Original data URL: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets
* Paper URL: https://ieeexplore.ieee.org/abstract/document/4731075
* Status: active
* Processing date: 2020-11-20 19:44:57
* MD5 checksum: e35d0578373c6100a62354ca4d16744a
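
The preprocessing steps listed in the description (column-wise normalization, merging, random sorting, duplicate removal) can be sketched in Python roughly as follows. This is only an illustration and not the referenced R script: the function names are hypothetical, the data is assumed to be dense lists of numeric rows with identical columns in every file, ties between two equally frequent binary values are broken by first occurrence, and the sample standard deviation (n - 1) is assumed. The description fixes none of these details.

```python
import random
from collections import Counter
from statistics import stdev


def normalize_column(values):
    """Apply the three column rules to a single column (list of numbers)."""
    counts = Counter(values)
    if len(counts) == 1:
        # Constant feature: set to zero, so it vanishes in a sparse format.
        return [0.0] * len(values)
    if len(counts) == 2:
        # Binary feature: the more frequent value becomes 0, the other 1.
        # Assumption: on a tie, the value seen first counts as the majority.
        majority = counts.most_common(1)[0][0]
        return [0.0 if v == majority else 1.0 for v in values]
    # More than two distinct values: divide by the standard deviation.
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sd = stdev(values)
    return [v / sd for v in values]


def normalize_columns(rows):
    """Normalize every column of one file (a list of equal-length rows)."""
    columns = [normalize_column(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def drop_duplicate_rows(rows):
    """Remove duplicate rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def build_dataset(files, seed=0):
    """Normalize each file, merge them, shuffle the rows, drop duplicates.

    `files` is a list of row lists; each entry is assumed to be one subset
    with its train and test parts already joined.
    """
    merged = []
    for rows in files:
        merged.extend(normalize_columns(rows))
    random.Random(seed).shuffle(merged)  # the seed is an illustrative choice
    return drop_duplicate_rows(merged)
```

For example, `normalize_column([1, 1, 3])` maps the more frequent value 1 to zero and returns `[0.0, 0.0, 1.0]`, while a constant column such as `[5, 5, 5]` becomes all zeros.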