OpenML
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Airline Ticket Price dataset concerns the prediction of airline ticket prices. The rows are a…
0 runs0 likes0 downloads0 reach2 impact
337 instances - 417 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Airline Ticket Price dataset concerns the prediction of airline ticket prices. The rows are a…
0 runs0 likes0 downloads0 reach2 impact
296 instances - 417 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Electrical Discharge Machining dataset (Karalic and Bratko 1997) represents a two-target…
0 runs0 likes0 downloads0 reach2 impact
154 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Energy Building dataset (Tsanas and Xifara 2012) concerns the prediction of the heating load…
0 runs0 likes0 downloads0 reach2 impact
768 instances - 10 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Jura (Goovaerts 1997) dataset consists of measurements of concentrations of seven heavy…
0 runs0 likes0 downloads0 reach2 impact
359 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Solar Flare dataset (Lichman 2013) has 3 target variables that correspond to the number of…
0 runs0 likes0 downloads0 reach2 impact
323 instances - 13 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Solar Flare dataset (Lichman 2013) has 3 target variables that correspond to the number of…
0 runs0 likes0 downloads0 reach2 impact
1066 instances - 13 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Concrete Slump dataset (Yeh 2007) concerns the prediction of three properties of concrete…
0 runs1 likes0 downloads1 reach2 impact
103 instances - 10 features - classes - 0 missing values
The YouTube personality dataset consists of a collection of behavioral features, speech transcriptions, and personality impression scores for a set of 404 YouTube vloggers that explicitly show…
0 runs0 likes0 downloads0 reach2 impact
404 instances - 31 features - classes - 0 missing values
This data set describes the participants of a math conference. isPresent is the target column for the classification task.
0 runs0 likes0 downloads0 reach2 impact
246 instances - 7 features - 2 classes - 0 missing values
Historical Rainfall data of Bangladesh
0 runs0 likes0 downloads0 reach4 impact
16755 instances - 4 features - 0 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach2 impact
14 instances - 5 features - 2 classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs0 likes0 downloads0 reach2 impact
150 instances - 5 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs0 likes0 downloads0 reach2 impact
150 instances - 5 features - 3 classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs0 likes0 downloads0 reach2 impact
150 instances - 5 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs0 likes0 downloads0 reach2 impact
150 instances - 5 features - 3 classes - 0 missing values
dataset for feature extraction
0 runs0 likes0 downloads0 reach1 impact
69 instances - 37 features - classes - 0 missing values
analysis of stocks
0 runs0 likes0 downloads0 reach1 impact
245 instances - 15 features - classes - 0 missing values
This dataset is an artificial simulation of the Duffing system with random changes from the chaotic to the non-chaotic regime at different noise levels.
0 runs0 likes0 downloads0 reach1 impact
2493200 instances - 26 features - classes - 0 missing values
This dataset is an artificial simulation of the Duffing system with one phase transition to the chaotic regime.
0 runs0 likes0 downloads0 reach1 impact
9983 instances - 4 features - classes - 0 missing values
Hourly particulate matter air pollution data of Great Britain for the year 2017, provided by Ricardo Energy and Environment on behalf of the UK Department for Environment, Food and Rural Affairs…
0 runs0 likes0 downloads0 reach1 impact
394299 instances - 10 features - 0 classes - 0 missing values
Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC) [http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml]. The dataset includes TLC trips of the green line in…
0 runs0 likes0 downloads0 reach1 impact
581835 instances - 15 features - 0 classes - 0 missing values
Embedding of atoms for HIV inhibitors dataset
0 runs0 likes0 downloads0 reach0 impact
1069964 instances - 30 features - classes - 0 missing values
Source: C. Okan Sakar a, Gorkem Serbes b, Aysegul Gunduz c, Hunkar C. Tunc a, Hatice Nizam d, Betul Erdogdu Sakar e, Melih Tutuncu c, Tarkan Aydin a, M. Erdem Isenkul d, Hulya Apaydin c a Department…
0 runs0 likes0 downloads0 reach2 impact
756 instances - 754 features - 0 classes - 0 missing values
nominal features and target for COMPAS
0 runs0 likes0 downloads0 reach2 impact
5278 instances - 14 features - 2 classes - 0 missing values
This dataset contains all Premier League matches from 2008 to 2016, with player statistics taken from Sofifa.
0 runs0 likes0 downloads0 reach1 impact
2961 instances - 17 features - classes - 0 missing values
This dataset contains, for each Premier League match of 2014-2015, the probabilities generated with the L2F models, as well as the match odds.
0 runs0 likes0 downloads0 reach1 impact
323 instances - 11 features - classes - 0 missing values
This dataset contains all the player names and player ids, taken from Sofifa
0 runs0 likes0 downloads0 reach1 impact
11009 instances - 3 features - classes - 0 missing values
This dataset contains a simulation of the Lorenz attractor with the parameter $\rho$ varying in time. The stable and chaotic regimes alternate.
0 runs0 likes0 downloads0 reach1 impact
4942 instances - 4 features - classes - 0 missing values
Dataset sales
0 runs0 likes0 downloads0 reach3 impact
10738 instances - 15 features - 0 classes - 0 missing values
Test file for ML training
0 runs0 likes0 downloads0 reach2 impact
1599 instances - 12 features - classes - 0 missing values
Iris DataSet
0 runs0 likes1 downloads1 reach2 impact
150 instances - 5 features - 3 classes - 0 missing values
Premier league matches from 2008 to 2014 with TDA features extracted.
0 runs0 likes0 downloads0 reach1 impact
2565 instances - 20 features - classes - 0 missing values
Embedding of molecules bonds in HIV inhibitors dataset
0 runs0 likes0 downloads0 reach0 impact
1151940 instances - 30 features - classes - 0 missing values
Fixed dataset for autoHorse.csv I suggest...
0 runs0 likes0 downloads0 reach1 impact
201 instances - 69 features - 186 classes - 0 missing values
The price column is now an integer. autoHorse dataset
11 runs0 likes0 downloads0 reach1 impact
201 instances - 69 features - 0 classes - 0 missing values
testing
0 runs0 likes0 downloads0 reach0 impact
366 instances - 3 features - classes - 0 missing values
test001
0 runs1 likes0 downloads1 reach1 impact
768 instances - 9 features - classes - 0 missing values
test
0 runs0 likes0 downloads0 reach2 impact
150 instances - 5 features - classes - 0 missing values
test
0 runs0 likes0 downloads0 reach2 impact
150 instances - 5 features - classes - 0 missing values
Binarized version of the USPS dataset (see version 2). Only instances with class labels 6 and 9 from the original dataset are considered and encoded as 0 (original class 6) and 1 (original class 9).
0 runs0 likes0 downloads0 reach2 impact
1424 instances - 257 features - 2 classes - 0 missing values
Binarized version of the isolet dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs0 likes0 downloads0 reach3 impact
600 instances - 618 features - 2 classes - 0 missing values
Binarized version of the cnae-9 dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs0 likes0 downloads0 reach2 impact
240 instances - 857 features - 2 classes - 0 missing values
testtest
0 runs0 likes0 downloads0 reach1 impact
1994 instances - 127 features - 0 classes - 0 missing values
Binarized version of the semeion dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs0 likes0 downloads0 reach2 impact
319 instances - 257 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough for each dataset…
0 runs0 likes0 downloads0 reach1 impact
156 instances - 81 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough for each dataset…
0 runs0 likes0 downloads0 reach1 impact
156 instances - 91 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough for each dataset…
0 runs0 likes0 downloads0 reach1 impact
156 instances - 81 features - 2 classes - 0 missing values
source: An Algorithm Selection Benchmark for the Container Pre-Marshalling Problem (CPMP) authors: K. Tierney and Y. Malitsky (features) / K. Tierney and D. Pacino and S. Voss (algorithms) translator…
14 runs0 likes0 downloads0 reach1 impact
527 instances - 23 features - 4 classes - 0 missing values
exercises
0 runs0 likes0 downloads0 reach1 impact
15000 instances - 8 features - classes - 0 missing values
source: http://plato.asu.edu/ftp/solvable.html authors: Rolf-David Bergdoll PAR10 performances of modern solvers on the solvable instances of MIPLIB2010. http://miplib.zib.de/ The algorithm runtime…
0 runs0 likes1 downloads1 reach1 impact
1090 instances - 145 features - 0 classes - 0 missing values
source: http://plato.asu.edu/ftp/solvable.html authors: Rolf-David Bergdoll PAR10 performances of modern solvers on the solvable instances of MIPLIB2010. http://miplib.zib.de/ The algorithm runtime…
0 runs0 likes0 downloads0 reach1 impact
218 instances - 144 features - 5 classes - 0 missing values
Data Description: This is the historical price data of the FOREX USD/DKK from Dukascopy. One instance (row) is one candlestick of one minute. The whole dataset covers the range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach1 impact
375840 instances - 12 features - 2 classes - 0 missing values
Data Description: This is the historical price data of the FOREX EUR/CAD from Dukascopy. One instance (row) is one candlestick of one hour. The whole dataset covers the range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach1 impact
43825 instances - 12 features - 2 classes - 0 missing values
Data Description: This is the historical price data of the FOREX EUR/SGD from Dukascopy. One instance (row) is one candlestick of one hour. The whole dataset covers the range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach1 impact
43825 instances - 12 features - 2 classes - 0 missing values
Data Description: This is the historical price data of the FOREX EUR/CHF from Dukascopy. One instance (row) is one candlestick of one day. The whole dataset covers the range from 1-1-2018 to…
0 runs0 likes1 downloads1 reach1 impact
1833 instances - 12 features - 2 classes - 0 missing values
Data Description: This is the historical price data of the FOREX EUR/HUF from Dukascopy. One instance (row) is one candlestick of one minute. The whole dataset covers the range from 1-1-2018 to…
0 runs0 likes1 downloads1 reach1 impact
375840 instances - 12 features - 2 classes - 0 missing values
Data Description: This is the historical price data of the FOREX EUR/SEK from Dukascopy. One instance (row) is one candlestick of one hour. The whole dataset covers the range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach1 impact
43825 instances - 12 features - 2 classes - 0 missing values
Data Description: This is the historical price data of the FOREX USD/DKK from Dukascopy. One instance (row) is one candlestick of one day. The whole dataset covers the range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach1 impact
1832 instances - 12 features - 2 classes - 0 missing values
Data Description: This is the historical price data of the FOREX AUD/NZD from Dukascopy. One instance (row) is one candlestick of one hour. The whole dataset covers the range from 1-1-2018 to…
0 runs0 likes2 downloads2 reach1 impact
43825 instances - 12 features - 2 classes - 0 missing values
Data set of articles covering around 45 languages and 25 categories.
0 runs0 likes0 downloads0 reach1 impact
65428 instances - 3 features - classes - 0 missing values
exercises
0 runs0 likes0 downloads0 reach1 impact
15000 instances - 8 features - classes - 0 missing values
The ILPD dataset from the OpenCC18 with all categorical variables label encoded
0 runs0 likes0 downloads0 reach1 impact
583 instances - 11 features - 0 classes - 0 missing values
The sick dataset from the OpenCC18 with all categorical data label encoded so all data is numeric
0 runs0 likes0 downloads0 reach1 impact
3772 instances - 30 features - classes - 0 missing values
The ILPD liver dataset from the OpenCC18 with the gender binary encoded so all features are numeric
1 runs0 likes0 downloads0 reach2 impact
583 instances - 11 features - 2 classes - 0 missing values
Sick dataset from the opencc18 with all textual binary variables label encoded.
1 runs0 likes0 downloads0 reach2 impact
3772 instances - 30 features - 2 classes - 0 missing values
E-commerce eligibility
0 runs0 likes1 downloads1 reach1 impact
269177 instances - 2 features - 2 classes - 0 missing values
test openml upload
0 runs0 likes0 downloads0 reach2 impact
150 instances - 5 features - 3 classes - 0 missing values
test
0 runs0 likes0 downloads0 reach2 impact
150 instances - 5 features - classes - 0 missing values
test
0 runs0 likes0 downloads0 reach2 impact
150 instances - 5 features - classes - 0 missing values
source: An Algorithm Selection Benchmark for the Container Pre-Marshalling Problem (CPMP) authors: K. Tierney and Y. Malitsky (features) / K. Tierney and D. Pacino and S. Voss (algorithms) translator…
0 runs0 likes1 downloads1 reach1 impact
2108 instances - 24 features - 0 classes - 0 missing values
2
0 runs0 likes0 downloads0 reach1 impact
375840 instances - 12 features - classes - 0 missing values
Branin test
0 runs0 likes0 downloads0 reach1 impact
225 instances - 3 features - classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs1 likes10 downloads11 reach4 impact
2000 instances - 140 features - 2 classes - 0 missing values
eating
9413 runs0 likes16 downloads16 reach44 impact
945 instances - 6374 features - 7 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10502, and it has 1627 rows and 1026 features…
1 runs0 likes2 downloads2 reach4 impact
1627 instances - 1026 features - 0 classes - 0 missing values
This is the poker dataset, retrieved 2013-11-14 from the libSVM site. In addition to the preprocessing done there (see the LibSVM site for details), this dataset was created as follows: -join test and…
23 runs0 likes18 downloads18 reach8 impact
1025010 instances - 11 features - 2 classes - 0 missing values
libSVM, AAD group. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Cell Biology, 96:6745-6750, 1999. #Dataset from…
0 runs0 likes4 downloads4 reach6 impact
62 instances - 2001 features - 0 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes3 downloads3 reach8 impact
203 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes4 downloads4 reach7 impact
138 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
76 runs0 likes5 downloads5 reach8 impact
187 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes2 downloads2 reach8 impact
185 instances - 10937 features - 2 classes - 0 missing values
Data from the RSCTC 2010 Discovery Challenge. All datasets contain between 100 and 400 samples, characterized by values of 20,000 - 65,000 attributes. Samples are assigned to several (2-10) classes.…
48 runs0 likes6 downloads6 reach8 impact
159 instances - 61360 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
78 runs0 likes4 downloads4 reach8 impact
421 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes2 downloads2 reach8 impact
410 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes4 downloads4 reach8 impact
470 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes3 downloads3 reach8 impact
412 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes3 downloads3 reach8 impact
201 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes5 downloads5 reach8 impact
250 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
79 runs0 likes3 downloads3 reach8 impact
322 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes5 downloads5 reach8 impact
275 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
72 runs1 likes7 downloads8 reach9 impact
1545 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2862 runs0 likes8 downloads8 reach17 impact
1545 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes4 downloads4 reach8 impact
468 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes4 downloads4 reach8 impact
484 instances - 10937 features - 2 classes - 0 missing values
libSVM, AAD group. #Dataset from the LIBSVM data repository. Preprocessing: Vikas Sindhwani for the SVMlin project.
0 runs0 likes3 downloads3 reach6 impact
72309 instances - 20959 features - 0 classes - 0 missing values
DOROTHEA is a drug discovery dataset. Chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive. This is one of 5 datasets of the…
0 runs0 likes7 downloads7 reach13 impact
1150 instances - 100001 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes1 downloads1 reach8 impact
100 instances - 10001 features - 2 classes - 0 missing values
ARCENE's task is to distinguish cancer versus normal patterns from mass-spectrometric data. This is a two-class classification problem with continuous input variables. This dataset is one of 5…
17 runs0 likes10 downloads10 reach7 impact
200 instances - 10001 features - 2 classes - 0 missing values
Even smaller sample of version 1
0 runs0 likes3 downloads3 reach5 impact
149639 instances - 12 features - 2 classes - 0 missing values