Data
This S dump contains 22 data sets from the book Visualizing Data published by Hobart Press (books@hobart.com). The dump was created by data.dump() and can be read back into S by data.restore(). The…
0 runs0 likes2 downloads2 reach5 impact
88 instances - 3 features - 0 classes - 0 missing values
No data.
43 runs0 likes2 downloads2 reach1 impact
1000000 instances - 45 features - 2 classes - 0 missing values
No data.
337 runs1 likes2 downloads3 reach2 impact
1000000 instances - 13 features - 3 classes - 0 missing values
No data.
45 runs0 likes2 downloads2 reach1 impact
1000000 instances - 23 features - 2 classes - 0 missing values
Pittsburgh bridges This version is derived from version 2 (the discretized version) by removing all instances with missing values in the last (target) attribute. The bridges dataset is originally not…
31 runs0 likes2 downloads2 reach7 impact
105 instances - 13 features - 6 classes - 61 missing values
Data from the RSCTC 2010 Discovery Challenge. All datasets contain between 100 and 400 samples, characterized by values of 20,000 - 65,000 attributes. Samples are assigned to several (2-10) classes.…
9 runs0 likes2 downloads2 reach6 impact
283 instances - 54622 features - 3 classes - 0 missing values
Data from the RSCTC 2010 Discovery Challenge. All datasets contain between 100 and 400 samples, characterized by values of 20,000 - 65,000 attributes. Samples are assigned to several (2-10) classes.…
1 runs0 likes2 downloads2 reach6 impact
383 instances - 54676 features - 9 classes - 0 missing values
Data from the RSCTC 2010 Discovery Challenge. Example datasets for 6 different problems of DNA microarray data analysis and classification. All datasets contain gene expression data characterized by…
8 runs0 likes2 downloads2 reach6 impact
113 instances - 54676 features - 5 classes - 0 missing values
No data.
28 runs0 likes2 downloads2 reach2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
37 runs0 likes2 downloads2 reach3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
29 runs0 likes2 downloads2 reach2 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
9 runs0 likes2 downloads2 reach3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
9 runs0 likes2 downloads2 reach3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
10 runs0 likes2 downloads2 reach3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
29 runs0 likes2 downloads2 reach2 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
30 runs0 likes2 downloads2 reach3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
libSVM, AAD group. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246-2253, 2003. Dataset from the LIBSVM data repository.…
0 runs0 likes2 downloads2 reach5 impact
86 instances - 7130 features - 0 classes - 0 missing values
No data.
27 runs0 likes2 downloads2 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs0 likes2 downloads2 reach5 impact
32561 instances - 124 features - 0 classes - 0 missing values
No data.
288 runs0 likes2 downloads2 reach2 impact
1000000 instances - 15 features - 9 classes - 0 missing values
No data.
51 runs0 likes2 downloads2 reach1 impact
1000000 instances - 15 features - 2 classes - 0 missing values
No data.
0 runs0 likes2 downloads2 reach1 impact
1000000 instances - 37 features - 0 classes - 0 missing values
One of two multivariate regression data sets from the paper industry, from an experiment at the paper plant Norske Skog, Skogn, Norway. They have been described and analysed in: Aldrin, M. (1996),…
0 runs0 likes2 downloads2 reach5 impact
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs0 likes2 downloads2 reach3 impact
100 instances - 10 features - classes - 0 missing values
Abstract: The data set is composed of 60 chorales (5665 events) by J.S. Bach (1685-1750). Each event of each chorale is labelled using 1 among 101 chord labels and described through 14 features.…
31 runs0 likes2 downloads2 reach5 impact
5665 instances - 17 features - 102 classes - 0 missing values
Abstract: This data set contains a total of 5820 evaluation scores provided by students from Gazi University in Ankara (Turkey). There is a total of 28 course specific questions and additional 5…
0 runs0 likes2 downloads2 reach5 impact
5820 instances - 33 features - classes - 0 missing values
Abstract: This data contains general demographic information on internet users in 1997. Source: Original Owner: Graphics, Visualization, & Usability Center College of Computing Georgia Institute of…
0 runs0 likes2 downloads2 reach3 impact
Source: The dataset was created by Athanasios Tsanas (tsanasthanasis '@' gmail.com) and Max Little (littlem '@' physics.ox.ac.uk) of the University of Oxford, in collaboration with 10 medical centers…
0 runs1 likes2 downloads3 reach3 impact
5875 instances - 22 features - classes - 0 missing values
No data.
67 runs0 likes2 downloads2 reach2 impact
1000000 instances - 17 features - 10 classes - 0 missing values
No data.
66 runs0 likes2 downloads2 reach2 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
60 runs0 likes2 downloads2 reach2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
Survival treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using…
12 runs0 likes2 downloads2 reach1 impact
130 instances - 10 features - 0 classes - 97 missing values
Weight treated as the class attribute. Identifier deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric…
10 runs0 likes2 downloads2 reach1 impact
158 instances - 8 features - 0 classes - 87 missing values
Case number deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning…
10 runs1 likes2 downloads3 reach1 impact
195 instances - 12 features - 0 classes - 2 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes2 downloads2 reach5 impact
195 instances - 33 features - 0 classes - 0 missing values
No data.
68 runs0 likes2 downloads2 reach1 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
63 runs0 likes2 downloads2 reach2 impact
1000000 instances - 41 features - 3 classes - 0 missing values
No data.
65 runs1 likes2 downloads3 reach1 impact
1000000 instances - 18 features - 7 classes - 0 missing values
No data.
305 runs0 likes2 downloads2 reach2 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
308 runs0 likes2 downloads2 reach2 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
307 runs0 likes2 downloads2 reach2 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
66 runs0 likes2 downloads2 reach1 impact
1000000 instances - 14 features - 5 classes - 0 missing values
No data.
66 runs0 likes2 downloads2 reach1 impact
1000000 instances - 14 features - 5 classes - 0 missing values
No data.
70 runs0 likes2 downloads2 reach2 impact
1000000 instances - 14 features - 2 classes - 0 missing values
No data.
67 runs0 likes2 downloads2 reach2 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
52 runs0 likes2 downloads2 reach1 impact
1000000 instances - 65 features - 10 classes - 0 missing values
No data.
293 runs0 likes2 downloads2 reach2 impact
1000000 instances - 17 features - 10 classes - 0 missing values
No data.
75 runs0 likes2 downloads2 reach1 impact
137781 instances - 10 features - 7 classes - 0 missing values
No data.
310 runs0 likes2 downloads2 reach1 impact
1000000 instances - 14 features - 5 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 101309, and it has 73 rows and 1026 features (including…
1 runs0 likes2 downloads2 reach3 impact
73 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10980, and it has 5766 rows and 1026 features…
1 runs0 likes2 downloads2 reach3 impact
5766 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10496, and it has 40 rows and 1026 features (including…
1 runs0 likes2 downloads2 reach3 impact
40 instances - 1026 features - 0 classes - 0 missing values
Multi-label dataset. The yeast dataset (Elisseeff and Weston, 2002) consists of micro-array expression data, as well as phylogenetic profiles of yeast, and includes 2417 genes and 103 predictors. In…
0 runs0 likes2 downloads2 reach3 impact
2417 instances - 117 features - 2 classes - 0 missing values
new-thyroid-pmlb
31 runs0 likes2 downloads2 reach11 impact
215 instances - 6 features - 3 classes - 0 missing values
Zurich public transport delay data 2016-10-30 03:30:00 CET - 2016-11-27 01:20:00 CET cleaned and prepared at Open Data Day 2017.
0 runs0 likes2 downloads2 reach4 impact
5465575 instances - 15 features - 0 classes - 132617 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 237, and it has 510 rows and 1026 features (including…
1 runs0 likes2 downloads2 reach3 impact
510 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10627, and it has 2408 rows and 1026 features…
1 runs0 likes2 downloads2 reach3 impact
2408 instances - 1026 features - 0 classes - 0 missing values
* Dataset Title: Robot Execution Failures Data Set * Abstract: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque…
71 runs0 likes2 downloads2 reach5 impact
47 instances - 91 features - 5 classes - 0 missing values
__Changes w.r.t. version 1: included one target factor with 7 levels as target variable for the classification. Also deleted the previous 7 binary target variables.__ A dataset of steel plates'…
7374 runs1 likes2 downloads3 reach8 impact
1941 instances - 28 features - 7 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. The maps were scanned in 8 bit grey value at density of 400dpi,…
9398 runs1 likes2 downloads3 reach13 impact
2000 instances - 241 features - 10 classes - 0 missing values
Datasets from the Data And Story Library, a project illustrating the use of basic statistical methods, converted to ARFF format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
3 runs0 likes2 downloads2 reach6 impact
50 instances - 6 features - 0 classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
0 runs1 likes2 downloads3 reach3 impact
798964 instances - 12 features - 3 classes - 399482 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs0 likes2 downloads2 reach4 impact
366 instances - 5 features - classes - 2 missing values
Annual salary information including gross pay and overtime pay for all active, permanent employees of Montgomery County, MD paid in calendar year 2016. This information will be published annually each…
0 runs0 likes2 downloads2 reach1 impact
9228 instances - 13 features - 0 classes - 11169 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs0 likes2 downloads2 reach2 impact
2000 instances - 140 features - classes - 0 missing values
Data Description: This is the historical price data of the FOREX AUD/NZD from Dukascopy. One instance (row) is one candlestick of one hour. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes2 downloads2 reach1 impact
43825 instances - 12 features - 2 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10502, and it has 1627 rows and 1026 features…
1 runs0 likes2 downloads2 reach4 impact
1627 instances - 1026 features - 0 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes2 downloads2 reach8 impact
185 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes2 downloads2 reach8 impact
410 instances - 10937 features - 2 classes - 0 missing values
* Dataset: DBworld e-mails data set Task: dbworld-subjects * Source: Michele Filannino, PhD University of Manchester Centre for Doctoral Training Email: filannim_AT_cs.man.ac.uk * Data Set…
40 runs0 likes2 downloads2 reach6 impact
64 instances - 243 features - 2 classes - 0 missing values
* Abstract: 9-class version of the poker-hand dataset, with the minority class removed.
1 runs0 likes2 downloads2 reach7 impact
1025000 instances - 11 features - 9 classes - 0 missing values
No data.
34 runs0 likes2 downloads2 reach4 impact
1000000 instances - 17 features - 26 classes - 0 missing values
led24-pmlb
31 runs0 likes2 downloads2 reach14 impact
3200 instances - 25 features - 10 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: Regenerate features by the authors' matlab scripts (see Sec. C of Appendix A), then randomly select 10% instances from the…
0 runs0 likes2 downloads2 reach6 impact
98528 instances - 101 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 191, and it has 4442 rows and 1026 features (including…
1 runs0 likes2 downloads2 reach4 impact
4442 instances - 1026 features - 0 classes - 0 missing values
No data.
315 runs0 likes2 downloads2 reach4 impact
295245 instances - 11 features - 5 classes - 0 missing values
__Major changes w.r.t. version 2: ignored variable 3 in this upload as this seems to be a perfect predictor.__ Tamilnadu Electricity Board Hourly Readings dataset. Real-time readings were collected…
0 runs0 likes2 downloads2 reach5 impact
45781 instances - 4 features - 20 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10781, and it has 2044 rows and 1026 features…
1 runs0 likes2 downloads2 reach4 impact
2044 instances - 1026 features - 0 classes - 0 missing values
Automated file upload of 20_newsgroups.drift
124 runs0 likes2 downloads2 reach8 impact
399940 instances - 1001 features - 2 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository.
1 runs0 likes2 downloads2 reach6 impact
1025010 instances - 11 features - 0 classes - 0 missing values
* Dataset Title: Robot Execution Failures Data Set * Abstract: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque…
71 runs0 likes2 downloads2 reach6 impact
88 instances - 91 features - 4 classes - 0 missing values
Data from the RSCTC 2010 Discovery Challenge. Example datasets for 6 different problems of DNA microarray data analysis and classification. All datasets contain gene expression data characterized by…
9 runs1 likes2 downloads3 reach7 impact
95 instances - 22278 features - 5 classes - 0 missing values
https://www.kaggle.com/harlfoxem/ This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. It contains 19 house features…
0 runs0 likes2 downloads2 reach1 impact
21613 instances - 20 features - classes - 0 missing values
The task consists of Learning Quantitative Structure Activity Relationships (QSARs). The Inhibition of Dihydrofolate Reductase by Pyrimidines. The data are described in: King, Ross D., Muggleton,…
6 runs0 likes2 downloads2 reach2 impact
74 instances - 28 features - 0 classes - 0 missing values
The goal is to predict the Fare. Variable description: pclass: A proxy for socio-economic status (SES) 1st = Upper 2nd = Middle 3rd = Lower age: Age is fractional if less than 1. If the age is…
0 runs0 likes2 downloads2 reach3 impact
1307 instances - 8 features - 0 classes - 0 missing values
No data.
50 runs0 likes2 downloads2 reach5 impact
1000000 instances - 18 features - 22 classes - 0 missing values
The data contain information on 9144 samples from 220 spectral bands. The classes represent land-use types: alfalfa, corn, grass, hay, oats, soybeans, trees, and wheat.
0 runs0 likes2 downloads2 reach3 impact
9144 instances - 221 features - 8 classes - 0 missing values
* Dataset Title: Volcanoes on Venus - JARtool experiment Data Set Experiment: E1 * Source: Michael C. Burl MS 126-347, JPL 4800 Oak Grove Drive Pasadena, CA 91109 (818) 393-5345 Michael.C.Burl '@'…
105 runs0 likes2 downloads2 reach7 impact
1183 instances - 4 features - 5 classes - 0 missing values
* Dataset Title: Volcanoes on Venus - JARtool experiment Data Set Experiment: E4 * Source: Michael C. Burl MS 126-347, JPL 4800 Oak Grove Drive Pasadena, CA 91109 (818) 393-5345 Michael.C.Burl '@'…
106 runs0 likes2 downloads2 reach7 impact
1252 instances - 4 features - 5 classes - 0 missing values
* Dataset Title: Volcanoes on Venus - JARtool experiment Data Set Experiment: E2 * Source: Michael C. Burl MS 126-347, JPL 4800 Oak Grove Drive Pasadena, CA 91109 (818) 393-5345 Michael.C.Burl '@'…
105 runs0 likes2 downloads2 reach7 impact
1080 instances - 4 features - 5 classes - 0 missing values
* Dataset Title: Volcanoes on Venus - JARtool experiment Data Set Experiment: E3 * Source: Michael C. Burl MS 126-347, JPL 4800 Oak Grove Drive Pasadena, CA 91109 (818) 393-5345 Michael.C.Burl '@'…
104 runs0 likes2 downloads2 reach7 impact
1277 instances - 4 features - 5 classes - 0 missing values
Data on the population density of tree pipits, Anthus trivialis, in Franconian oak forests including variables describing the forest ecosystem. This data is taken from R package coin. This study is…
0 runs0 likes2 downloads2 reach6 impact
86 instances - 10 features - 0 classes - 0 missing values
Datasets from the Data And Story Library, a project illustrating the use of basic statistical methods, converted to ARFF format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs0 likes2 downloads2 reach6 impact
48 instances - 8 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach6 impact
250 instances - 6 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach6 impact
1000 instances - 101 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach6 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach6 impact
100 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach6 impact
1000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
1 runs0 likes2 downloads2 reach6 impact
1000 instances - 11 features - 0 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
2 runs0 likes2 downloads2 reach6 impact
93 instances - 24 features - 0 classes - 0 missing values