OpenML
Filter results by:
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
66 runs0 likes1 downloads1 reach7 impact
386 instances - 10937 features - 2 classes - 0 missing values
No data.
50 runs0 likes3 downloads3 reach1 impact
1000000 instances - 61 features - 2 classes - 0 missing values
No data.
27 runs1 likes3 downloads4 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
65 runs0 likes5 downloads5 reach1 impact
1000000 instances - 30 features - 4 classes - 0 missing values
No data.
143 runs0 likes4 downloads4 reach2 impact
1000000 instances - 39 features - 6 classes - 0 missing values
This is the famous covertype dataset in its binary version, retrieved 2013-11-13 from the libSVM site (called covtype.binary there). Additional to the preprocessing done there (see LibSVM site for…
22 runs0 likes9 downloads9 reach7 impact
581012 instances - 55 features - 2 classes - 0 missing values
Automated file upload of BNG(credit-g)
99 runs0 likes3 downloads3 reach2 impact
1000000 instances - 21 features - 2 classes - 0 missing values
Automated file upload of BNG(spambase)
98 runs0 likes3 downloads3 reach2 impact
1000000 instances - 58 features - 2 classes - 0 missing values
Automated file upload of BNG(optdigits)
100 runs1 likes1 downloads2 reach2 impact
1000000 instances - 65 features - 10 classes - 0 missing values
Automated file upload of BNG(ionosphere)
99 runs1 likes4 downloads5 reach3 impact
1000000 instances - 35 features - 2 classes - 0 missing values
Automated file upload of BNG(segment)
99 runs0 likes1 downloads1 reach2 impact
1000000 instances - 20 features - 7 classes - 0 missing values
Automated file upload of BNG(anneal)
100 runs0 likes3 downloads3 reach3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
Multi-label dataset. The genbase dataset contains protein sequences that can be assigned to several classes of protein families.
0 runs0 likes1 downloads1 reach3 impact
662 instances - 1213 features - 2 classes - 0 missing values
The langLog dataset includes 1004 textual predictors and was originally compiled in the doctorial thesis of Read (2010). It consists of 956 text samples that can be assigned to one or more topics such…
0 runs0 likes4 downloads4 reach3 impact
1460 instances - 1079 features - 2 classes - 0 missing values
Multi-label dataset. A subset of the reuters dataset includes 2000 observations for text classification.
0 runs0 likes8 downloads8 reach4 impact
2000 instances - 250 features - 2 classes - 0 missing values
Multi-label dataset. The scene dataset is an image classification task where labels like Beach, Mountain, Field, Urban are assigned to each image.
0 runs0 likes12 downloads12 reach3 impact
2407 instances - 300 features - 2 classes - 0 missing values
Multi-label dataset. The yeast dataset (Elisseeff and Weston, 2002) consists of micro-array expression data, as well as phylogenetic profiles of yeast, and includes 2417 genes and 103 predictors. In…
0 runs0 likes2 downloads2 reach3 impact
2417 instances - 117 features - 2 classes - 0 missing values
Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…
0 runs2 likes5 downloads7 reach3 impact
593 instances - 78 features - 2 classes - 0 missing values
General Description of Thyroid Disease Databases and Related Files This directory contains 6 databases, corresponding test set, and corresponding documentation. They were left at the University of…
32 runs0 likes7 downloads7 reach5 impact
2800 instances - 27 features - 5 classes - 0 missing values
General Description of Thyroid Disease Databases and Related Files This directory contains 6 databases, corresponding test set, and corresponding documentation. They were left at the University of…
31 runs1 likes8 downloads9 reach5 impact
2800 instances - 27 features - 5 classes - 0 missing values
General Description of Thyroid Disease Databases and Related Files This directory contains 6 databases, corresponding test set, and corresponding documentation. They were left at the University of…
31 runs1 likes8 downloads9 reach5 impact
2800 instances - 27 features - 5 classes - 0 missing values
Data from https://doi.org/10.5281/zenodo.269636
0 runs0 likes4 downloads4 reach5 impact
4758 instances - 39 features - classes - 0 missing values
#study_1
0 runs0 likes0 downloads0 reach2 impact
944 instances - 17 features - classes - 0 missing values
GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1-pmlb
0 runs0 likes1 downloads1 reach12 impact
1600 instances - 1001 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach12 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach12 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach12 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs0 likes0 downloads0 reach12 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs0 likes0 downloads0 reach12 impact
1600 instances - 21 features - 2 classes - 0 missing values
analcatdata_fraud-pmlb
32 runs0 likes0 downloads0 reach11 impact
42 instances - 12 features - 2 classes - 0 missing values
Dataset used by Buntine and Niblett (1992). Composed of 10 features, one of which is irrelevant. The target is a disjunctive normal form formula over the nine other attributes, with additional…
31 runs0 likes0 downloads0 reach12 impact
973 instances - 10 features - 2 classes - 0 missing values
calendarDOW-pmlb
31 runs0 likes1 downloads1 reach11 impact
399 instances - 33 features - 5 classes - 0 missing values
car-evaluation-pmlb
31 runs0 likes1 downloads1 reach11 impact
1728 instances - 22 features - 4 classes - 0 missing values
Derived from the Musk dataset: https://www.openml.org/d/1116
31 runs0 likes1 downloads1 reach12 impact
476 instances - 169 features - 2 classes - 0 missing values
Derived from the Musk dataset: https://www.openml.org/d/1116
31 runs0 likes0 downloads0 reach12 impact
6598 instances - 169 features - 2 classes - 0 missing values
corral-pmlb
31 runs0 likes1 downloads1 reach12 impact
160 instances - 7 features - 2 classes - 0 missing values
)), [PMLB](https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/tokyo1) This is Performance co-pilot (PCP) data for the Tokyo server at Silicon Graphics International…
35 runs0 likes1 downloads1 reach12 impact
959 instances - 45 features - 2 classes - 0 missing values
parity5_plus_5-pmlb
31 runs0 likes0 downloads0 reach12 impact
1124 instances - 11 features - 2 classes - 0 missing values
allbp-pmlb
31 runs0 likes1 downloads1 reach11 impact
3772 instances - 30 features - 3 classes - 0 missing values
allrep-pmlb
31 runs0 likes0 downloads0 reach11 impact
3772 instances - 30 features - 4 classes - 0 missing values
analcatdata_happiness-pmlb
31 runs0 likes0 downloads0 reach11 impact
60 instances - 4 features - 3 classes - 0 missing values
cleve-pmlb
32 runs0 likes1 downloads1 reach11 impact
303 instances - 14 features - 2 classes - 0 missing values
ecoli-pmlb
31 runs0 likes1 downloads1 reach11 impact
327 instances - 8 features - 5 classes - 0 missing values
led7-pmlb
31 runs0 likes0 downloads0 reach12 impact
3200 instances - 8 features - 10 classes - 0 missing values
The origin is not clear, but presumably this is an artificial problem representing M-of-N rules. The target is 1 if a certain M 'bits' are '1'? (Joaquin Vanschoren)
31 runs0 likes0 downloads0 reach12 impact
1324 instances - 11 features - 2 classes - 0 missing values
mux6-pmlb
31 runs0 likes1 downloads1 reach11 impact
128 instances - 7 features - 2 classes - 0 missing values
new-thyroid-pmlb
31 runs0 likes2 downloads2 reach11 impact
215 instances - 6 features - 3 classes - 0 missing values
postoperative-patient-data-pmlb
26 runs0 likes1 downloads1 reach11 impact
88 instances - 9 features - 2 classes - 0 missing values
Relevant Information: -- The database contains 3 potential classes, one for the number of times a certain type of solar flare occured in a 24 hour period. -- Each instance represents captured features…
31 runs0 likes1 downloads1 reach11 impact
315 instances - 13 features - 5 classes - 0 missing values
Relevant Information: -- The database contains 3 potential classes, one for the number of times a certain type of solar flare occured in a 24 hour period. -- Each instance represents captured features…
31 runs0 likes0 downloads0 reach11 impact
1066 instances - 13 features - 6 classes - 0 missing values
threeOf9-pmlb
31 runs0 likes0 downloads0 reach12 impact
512 instances - 10 features - 2 classes - 0 missing values
cleveland-nominal-pmlb
31 runs0 likes1 downloads1 reach11 impact
303 instances - 8 features - 5 classes - 0 missing values
This dataset is gather to detect whether a person is running or walking based on deep neural networks and sensor data collected from iOS devices. The dataset represents 88588 sensor data samples…
1 runs0 likes3 downloads3 reach6 impact
88588 instances - 7 features - 2 classes - 0 missing values
This is a 20,000 instance sample of the original CIFAR-10 dataset. Sampled randomly and stratified, with 2000 examples per class. Training and test set are merged. Find the corresponding task for the…
380 runs0 likes4 downloads4 reach12 impact
20000 instances - 3073 features - 10 classes - 0 missing values
dis-pmlb
31 runs0 likes0 downloads0 reach12 impact
3772 instances - 30 features - 2 classes - 0 missing values
parity5-pmlb
30 runs0 likes0 downloads0 reach11 impact
32 instances - 6 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
11 runs0 likes0 downloads0 reach4 impact
4704 instances - 47 features - 3 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
10 runs0 likes0 downloads0 reach4 impact
3660 instances - 47 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
10 runs0 likes0 downloads0 reach4 impact
2352 instances - 47 features - 2 classes - 0 missing values
EMNIST Balanced https://www.nist.gov/itl/iad/image-group/emnist-dataset
73 runs0 likes0 downloads0 reach8 impact
131600 instances - 785 features - classes - 0 missing values
The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril…
2 runs0 likes1 downloads1 reach3 impact
158 instances - 12 features - 0 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes0 downloads0 reach6 impact
3140 instances - 260 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes0 downloads0 reach5 impact
5832 instances - 309 features - 2 classes - 0 missing values
The dataset and this description is made available on http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html. Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal…
56 runs0 likes1 downloads1 reach3 impact
9298 instances - 257 features - 10 classes - 0 missing values
This dataset contains a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge. As described on the original website: There are ten different images of each of 40…
53 runs0 likes0 downloads0 reach5 impact
400 instances - 4097 features - 40 classes - 0 missing values
The Sheffield (previously UMIST) Face Database consists of 564 images of 20 individuals (mixed race/gender/appearance). Each individual is shown in a range of poses from profile to frontal views -…
53 runs0 likes1 downloads1 reach6 impact
575 instances - 10305 features - 20 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes0 downloads0 reach8 impact
3153 instances - 971 features - 2 classes - 0 missing values
__Major changes w.r.t. version 1: changed binary features to data type factor.__ Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which consisted of…
0 runs0 likes0 downloads0 reach2 impact
14395 instances - 217 features - classes - 0 missing values
__Major change w.r.t. version 1: updated data type of binary variables to factor type.__ Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which…
0 runs0 likes1 downloads1 reach2 impact
4562 instances - 49 features - classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
12 runs0 likes0 downloads0 reach4 impact
4704 instances - 47 features - 3 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
11 runs0 likes0 downloads0 reach4 impact
4704 instances - 47 features - 3 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
12 runs0 likes0 downloads0 reach4 impact
2351 instances - 47 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
15 runs0 likes0 downloads0 reach4 impact
4704 instances - 47 features - 3 classes - 0 missing values
These weekly averages are ultimately based on measurements of 4 air samples per hour taken atop intake lines on several towers during steady periods of CO2 concentration of not less than 6 hours per…
0 runs1 likes1 downloads2 reach1 impact
2225 instances - 7 features - 0 classes - 0 missing values
GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusable digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection…
466 runs0 likes52 downloads52 reach16 impact
7000 instances - 5001 features - 2 classes - 0 missing values
rotated MNIS digits, from http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/MnistVariations
0 runs0 likes0 downloads0 reach2 impact
62000 instances - 785 features - 0 classes - 0 missing values
CIFAR-10 dataset but with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image…
40 runs0 likes0 downloads0 reach5 impact
13000 instances - 27649 features - 10 classes - 0 missing values
SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor…
52 runs0 likes1 downloads1 reach6 impact
99289 instances - 3073 features - 10 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12738, and it has 596 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
596 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 102770, and it has 73 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
73 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 102414, and it has 523 rows and 1026 features…
1 runs0 likes1 downloads1 reach3 impact
523 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 20023, and it has 60 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
60 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12131, and it has 111 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
111 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 20033, and it has 273 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
273 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11528, and it has 35 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
35 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11942, and it has 1002 rows and 1026 features…
1 runs0 likes1 downloads1 reach3 impact
1002 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 100962, and it has 35 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
35 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11831, and it has 15 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
15 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 100432, and it has 293 rows and 1026 features…
1 runs0 likes1 downloads1 reach3 impact
293 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 100424, and it has 97 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
97 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11868, and it has 519 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
519 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12787, and it has 10 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
10 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12090, and it has 1312 rows and 1026 features…
1 runs0 likes1 downloads1 reach3 impact
1312 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 100834, and it has 747 rows and 1026 features…
1 runs0 likes1 downloads1 reach3 impact
747 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 101361, and it has 323 rows and 1026 features…
1 runs0 likes1 downloads1 reach3 impact
323 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12227, and it has 1510 rows and 1026 features…
1 runs0 likes1 downloads1 reach3 impact
1510 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12735, and it has 646 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
646 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11199, and it has 104 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
104 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 101521, and it has 15 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
15 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11005, and it has 15 rows and 1026 features (including…
1 runs0 likes1 downloads1 reach3 impact
15 instances - 1026 features - 0 classes - 0 missing values