OpenML
__Major changes w.r.t. version 1: deactivated first two variables as they describe the batch of the experiments and should not be used for prediction. Also transformed the target from numeric to…
8775 runs - 0 likes - 3 downloads - 3 reach - 13 impact
540 instances - 21 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
12 runs - 0 likes - 0 downloads - 0 reach - 13 impact
4704 instances - 47 features - 3 classes - 0 missing values
### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
11 runs - 0 likes - 0 downloads - 0 reach - 13 impact
4704 instances - 47 features - 3 classes - 0 missing values
### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
12 runs - 0 likes - 0 downloads - 0 reach - 13 impact
2351 instances - 47 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
15 runs - 0 likes - 0 downloads - 0 reach - 13 impact
4704 instances - 47 features - 3 classes - 0 missing values
GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusable digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection…
466 runs - 0 likes - 52 downloads - 52 reach - 25 impact
7000 instances - 5001 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
11 runs - 0 likes - 0 downloads - 0 reach - 13 impact
4704 instances - 47 features - 3 classes - 0 missing values
### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
10 runs - 0 likes - 0 downloads - 0 reach - 13 impact
3660 instances - 47 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
10 runs - 0 likes - 0 downloads - 0 reach - 13 impact
2352 instances - 47 features - 2 classes - 0 missing values
__Major changes w.r.t. version 2: ignored variable 3 in this upload as this seems to be a perfect predictor.__ Tamilnadu Electricity Board Hourly Readings dataset. Real-time readings were collected…
0 runs - 0 likes - 2 downloads - 2 reach - 19 impact
45781 instances - 4 features - 20 classes - 0 missing values
SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor…
52 runs - 0 likes - 1 downloads - 1 reach - 15 impact
99289 instances - 3073 features - 10 classes - 0 missing values
The dataset and this description are made available on http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html. Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal…
57 runs - 0 likes - 1 downloads - 1 reach - 11 impact
9298 instances - 257 features - 10 classes - 0 missing values
This dataset contains a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge. As described on the original website: There are ten different images of each of 40…
53 runs - 0 likes - 0 downloads - 0 reach - 14 impact
400 instances - 4097 features - 40 classes - 0 missing values
The Sheffield (previously UMIST) Face Database consists of 564 images of 20 individuals (mixed race/gender/appearance). Each individual is shown in a range of poses from profile to frontal views -…
53 runs - 0 likes - 1 downloads - 1 reach - 15 impact
575 instances - 10305 features - 20 classes - 0 missing values
Rotated MNIST digits, from http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/MnistVariations
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
62000 instances - 785 features - 0 classes - 0 missing values
wine-quality-red-pmlb
31 runs - 1 likes - 1 downloads - 2 reach - 22 impact
1599 instances - 12 features - 6 classes - 0 missing values
Dataset used by Buntine and Niblett (1992). Composed of 10 features, one of which is irrelevant. The target is a disjunctive normal form formula over the nine other attributes, with additional…
31 runs - 0 likes - 0 downloads - 0 reach - 21 impact
973 instances - 10 features - 2 classes - 0 missing values
cars1-pmlb
31 runs - 0 likes - 3 downloads - 3 reach - 20 impact
392 instances - 8 features - 3 classes - 0 missing values
flare-pmlb
32 runs - 0 likes - 1 downloads - 1 reach - 21 impact
1066 instances - 11 features - 2 classes - 0 missing values
PMLB version of the Titanic dataset, which only uses 3 features. See version 1 for the complete version: https://www.openml.org/d/40945
35 runs - 0 likes - 1 downloads - 1 reach - 22 impact
2201 instances - 4 features - 2 classes - 0 missing values
[PMLB](https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/tokyo1) This is Performance co-pilot (PCP) data for the Tokyo server at Silicon Graphics International…
37 runs - 0 likes - 1 downloads - 1 reach - 21 impact
959 instances - 45 features - 2 classes - 0 missing values
parity5_plus_5-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 21 impact
1124 instances - 11 features - 2 classes - 0 missing values
allbp-pmlb
31 runs - 0 likes - 2 downloads - 2 reach - 20 impact
3772 instances - 30 features - 3 classes - 0 missing values
allrep-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 20 impact
3772 instances - 30 features - 4 classes - 0 missing values
analcatdata_happiness-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 20 impact
60 instances - 4 features - 3 classes - 0 missing values
cleve-pmlb
32 runs - 0 likes - 1 downloads - 1 reach - 20 impact
303 instances - 14 features - 2 classes - 0 missing values
cleveland-nominal-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 20 impact
303 instances - 8 features - 5 classes - 0 missing values
dis-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 21 impact
3772 instances - 30 features - 2 classes - 0 missing values
parity5-pmlb
32 runs - 0 likes - 0 downloads - 0 reach - 20 impact
32 instances - 6 features - 2 classes - 0 missing values
Multi-label dataset. The genbase dataset contains protein sequences that can be assigned to several classes of protein families.
0 runs - 0 likes - 1 downloads - 1 reach - 11 impact
662 instances - 1213 features - 2 classes - 0 missing values
The langLog dataset includes 1004 textual predictors and was originally compiled in the doctoral thesis of Read (2010). It consists of 956 text samples that can be assigned to one or more topics such…
0 runs - 0 likes - 4 downloads - 4 reach - 11 impact
1460 instances - 1079 features - 2 classes - 0 missing values
Multi-label dataset. A subset of the reuters dataset includes 2000 observations for text classification.
0 runs - 0 likes - 8 downloads - 8 reach - 12 impact
2000 instances - 250 features - 2 classes - 0 missing values
Multi-label dataset. The scene dataset is an image classification task where labels like Beach, Mountain, Field, Urban are assigned to each image.
0 runs - 0 likes - 12 downloads - 12 reach - 11 impact
2407 instances - 300 features - 2 classes - 0 missing values
Multi-label dataset. The yeast dataset (Elisseeff and Weston, 2002) consists of micro-array expression data, as well as phylogenetic profiles of yeast, and includes 2417 genes and 103 predictors. In…
0 runs - 0 likes - 2 downloads - 2 reach - 11 impact
2417 instances - 117 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1-pmlb
0 runs - 0 likes - 1 downloads - 1 reach - 21 impact
1600 instances - 1001 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 21 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 21 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 21 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs - 0 likes - 1 downloads - 1 reach - 21 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 21 impact
1600 instances - 21 features - 2 classes - 0 missing values
analcatdata_fraud-pmlb
34 runs - 0 likes - 0 downloads - 0 reach - 20 impact
42 instances - 12 features - 2 classes - 0 missing values
calendarDOW-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 20 impact
399 instances - 33 features - 5 classes - 0 missing values
car-evaluation-pmlb
31 runs - 0 likes - 2 downloads - 2 reach - 20 impact
1728 instances - 22 features - 4 classes - 0 missing values
Derived from the Musk dataset: https://www.openml.org/d/1116
31 runs - 0 likes - 1 downloads - 1 reach - 21 impact
476 instances - 169 features - 2 classes - 0 missing values
corral-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 21 impact
160 instances - 7 features - 2 classes - 0 missing values
ecoli-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 20 impact
327 instances - 8 features - 5 classes - 0 missing values
Re-upload of the dataset as it is present in the Penn ML Benchmark (https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/fars). It's a dataset on traffic accidents,…
1 runs - 0 likes - 3 downloads - 3 reach - 22 impact
100968 instances - 30 features - 8 classes - 0 missing values
led24-pmlb
31 runs - 0 likes - 2 downloads - 2 reach - 21 impact
3200 instances - 25 features - 10 classes - 0 missing values
led7-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 21 impact
3200 instances - 8 features - 10 classes - 0 missing values
The origin is not clear, but presumably this is an artificial problem representing M-of-N rules. The target is 1 if a certain M 'bits' are '1'? (Joaquin Vanschoren)
31 runs - 0 likes - 0 downloads - 0 reach - 21 impact
1324 instances - 11 features - 2 classes - 0 missing values
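The M-of-N idea mentioned in the description above can be illustrated with a small sketch. It assumes the conventional reading of an M-of-N rule (the target is 1 when at least M of N chosen binary attributes are 1); the source itself is unsure about the exact rule, so the choice of relevant attributes, the threshold, and the helper name `m_of_n` are all hypothetical and are not taken from the dataset.

```python
from itertools import product


def m_of_n(bits, relevant, m):
    """Return 1 if at least m of the bits at the `relevant` positions are 1."""
    return int(sum(bits[i] for i in relevant) >= m)


# Hypothetical 3-of-5 rule over 10 binary attributes (an assumption,
# not the rule behind the listed dataset).
relevant = [0, 2, 4, 6, 8]

# Enumerate every possible 10-bit instance and label it with the rule.
rows = [(bits, m_of_n(bits, relevant, 3)) for bits in product((0, 1), repeat=10)]
positives = sum(label for _, label in rows)
```

In this toy setting the five attributes outside `relevant` never affect the label, which is what makes such problems a common test for feature selection.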
mux6-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 20 impact
128 instances - 7 features - 2 classes - 0 missing values
new-thyroid-pmlb
31 runs - 0 likes - 2 downloads - 2 reach - 20 impact
215 instances - 6 features - 3 classes - 0 missing values
postoperative-patient-data-pmlb
26 runs - 0 likes - 1 downloads - 1 reach - 20 impact
88 instances - 9 features - 2 classes - 0 missing values
Relevant Information: -- The database contains 3 potential classes, one for the number of times a certain type of solar flare occurred in a 24-hour period. -- Each instance represents captured features…
31 runs - 0 likes - 1 downloads - 1 reach - 20 impact
315 instances - 13 features - 5 classes - 0 missing values
Relevant Information: -- The database contains 3 potential classes, one for the number of times a certain type of solar flare occurred in a 24-hour period. -- Each instance represents captured features…
31 runs - 0 likes - 0 downloads - 0 reach - 20 impact
1066 instances - 13 features - 6 classes - 0 missing values
threeOf9-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 21 impact
512 instances - 10 features - 2 classes - 0 missing values
Small dataset with time series of RAM prices over the years.
0 runs - 1 likes - 4 downloads - 5 reach - 11 impact
333 instances - 3 features - 0 classes - 0 missing values
CD4 count prediction date
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
16484 instances - 62 features - classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 2 downloads - 2 reach - 9 impact
14 instances - 5 features - 2 classes - 0 missing values
Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…
0 runs - 0 likes - 1 downloads - 1 reach - 10 impact
593 instances - 78 features - classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs - 0 likes - 2 downloads - 2 reach - 9 impact
2000 instances - 140 features - classes - 0 missing values
Multi-label dataset. A subset of the reuters dataset includes 2000 observations for text classification.
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
2000 instances - 250 features - classes - 0 missing values
Multi-label dataset. The scene dataset is an image classification task where labels like Beach, Mountain, Field, Urban are assigned to each image.
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
2407 instances - 300 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Andromeda dataset (Hatzikos et al. 2008) concerns the prediction of future values for six…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
49 instances - 36 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Airline Ticket Price dataset concerns the prediction of airline ticket prices. The rows are a…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
337 instances - 417 features - classes - 0 missing values
This dataset shows information about participants of a math conference. isPresent is the target column for the classification task.
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
246 instances - 7 features - 2 classes - 0 missing values
Historical Rainfall data of Bangladesh
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
16755 instances - 4 features - 0 classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Airline Ticket Price dataset concerns the prediction of airline ticket prices. The rows are a…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
296 instances - 417 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Electrical Discharge Machining dataset (Karalic and Bratko 1997) represents a two-target…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
154 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Energy Building dataset (Tsanas and Xifara 2012) concerns the prediction of the heating load…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
768 instances - 10 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Jura (Goovaerts 1997) dataset consists of measurements of concentrations of seven heavy…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
359 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Solar Flare dataset (Lichman 2013) has 3 target variables that correspond to the number of…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
323 instances - 13 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Solar Flare dataset (Lichman 2013) has 3 target variables that correspond to the number of…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1066 instances - 13 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Concrete Slump dataset (Yeh 2007) concerns the prediction of three properties of concrete…
0 runs - 1 likes - 0 downloads - 1 reach - 9 impact
103 instances - 10 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
150 instances - 5 features - 3 classes - 0 missing values
Diabetes dataset. Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
442 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset. Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
442 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset. Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
442 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset. Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
442 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset. Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
442 instances - 11 features - 0 classes - 0 missing values
Diabetes dataset. Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
442 instances - 11 features - 0 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
14 instances - 5 features - 2 classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Water Quality dataset (Dzeroski et al. 2000) has 14 target attributes that refer to the…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1060 instances - 30 features - classes - 0 missing values
The YouTube personality dataset consists of a collection of behavioral features, speech transcriptions, and personality impression scores for a set of 404 YouTube vloggers that explicitly show…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
404 instances - 31 features - classes - 0 missing values
This dataset contains 358 lyrics of songs for the rock bands 'The Rolling Stones' and 'Deep Purple'. The bands are equally represented in the dataset (179 songs for each band). This dataset was…
8 runs - 0 likes - 1 downloads - 1 reach - 20 impact
358 instances - 2 features - 2 classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
150 instances - 5 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
150 instances - 5 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
150 instances - 5 features - 3 classes - 0 missing values
Klaverjas is an example of the Jack-Nine card games, which are characterized as trick-taking games where the Jack and nine of the trump suit are the highest-ranking trumps, and the tens and aces…
0 runs - 0 likes - 1 downloads - 1 reach - 10 impact
981541 instances - 33 features - 2 classes - 0 missing values
The goal is to predict the Fare. Variable description: pclass: A proxy for socio-economic status (SES) 1st = Upper 2nd = Middle 3rd = Lower age: Age is fractional if less than 1. If the age is…
0 runs - 0 likes - 4 downloads - 4 reach - 10 impact
1307 instances - 8 features - 0 classes - 0 missing values
This dataset was gathered to detect whether a person is running or walking based on deep neural networks and sensor data collected from iOS devices. The dataset represents 88588 sensor data samples…
1 runs - 0 likes - 4 downloads - 4 reach - 14 impact
88588 instances - 7 features - 2 classes - 0 missing values
Over 92 thousand images (32x32 pixels) of 46 characters from Devanagari script. Includes the alphabet as well as the numbers. Devanagari is an Indic script and forms a basis for over 100 languages…
42 runs - 2 likes - 7 downloads - 9 reach - 14 impact
92000 instances - 1025 features - 46 classes - 0 missing values
This is a 20,000 instance sample of the original CIFAR-10 dataset. Sampled randomly and stratified, with 2000 examples per class. Training and test set are merged. Find the corresponding task for the…
380 runs - 0 likes - 4 downloads - 4 reach - 21 impact
20000 instances - 3073 features - 10 classes - 0 missing values
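The stratified random sampling described for the CIFAR-10 subset above (a fixed number of examples drawn at random from each class) can be sketched roughly as follows. This is a minimal illustration on toy labels, not the procedure actually used to build the dataset; the function name `stratified_sample`, the seed, and the toy label list are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(labels, per_class, seed=0):
    """Pick `per_class` random indices from every class found in `labels`."""
    rng = random.Random(seed)

    # Group instance indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Draw the same number of indices, without replacement, from each class.
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], per_class))
    return sorted(chosen)


# Toy data: 600 instances spread evenly over 10 classes (hypothetical).
labels = [i % 10 for i in range(600)]
sample = stratified_sample(labels, per_class=20)
counts = Counter(labels[i] for i in sample)
```

With 10 classes and `per_class=2000`, the same function would return 20,000 indices, matching the instance count listed for this subset.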
0. airplane 1. automobile 2. bird 3. cat 4. deer 5. dog 6. frog 7. horse 8. ship 9. truck CIFAR-10 contains 6000 images per class. The original train-test split randomly divided these into 5000 train…
151 runs - 0 likes - 5 downloads - 5 reach - 20 impact
60000 instances - 3073 features - 10 classes - 0 missing values
"The speech dataset was also provided by (see citation request) and contains real world data from recorded English language. The normal class contains data from persons having an American accent…
1599 runs - 0 likes - 6 downloads - 6 reach - 17 impact
3686 instances - 401 features - 2 classes - 0 missing values
Data from https://doi.org/10.5281/zenodo.269636
0 runs - 0 likes - 4 downloads - 4 reach - 14 impact
4758 instances - 39 features - classes - 0 missing values
#study_1
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
944 instances - 17 features - classes - 0 missing values
The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril…
2 runs - 0 likes - 1 downloads - 1 reach - 12 impact
158 instances - 12 features - 0 classes - 0 missing values
microaggregation2_nominal
1 runs - 0 likes - 1 downloads - 1 reach - 12 impact
20000 instances - 21 features - 5 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 1 downloads - 1 reach - 8 impact
14 instances - 5 features - 2 classes - 0 missing values