OpenML
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
diabetes
0 runs0 likes0 downloads0 reach6 impact
768 instances - 9 features - classes - 0 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs0 likes0 downloads0 reach6 impact
891 instances - 12 features - classes - 866 missing values
No data.
0 runs0 likes0 downloads0 reach0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
Dataset from Smoothing Methods in Statistics (ftp stat.cmu.edu/datasets) Simonoff, J.S. (1996). Smoothing Methods in Statistics. New York: Springer-Verlag. Electricity usage is being treated as the…
4 runs0 likes0 downloads0 reach9 impact
55 instances - 3 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
14 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
30 instances - 1143 features - 0 classes - 0 missing values
Human Development Index [DATA] United Nations Development Program compiled an Index of Human Development. Column 1: Country(character) 2: Index 3: GNP GNP PER CAPITA RANK RANK - RANK HDI 1987 GNP RANK…
2 runs0 likes0 downloads0 reach13 impact
130 instances - 2 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs0 likes0 downloads0 reach13 impact
74 instances - 9 features - 0 classes - 0 missing values
One of two multivariate regression data sets from paper industry, from an experiment at the paper plant Saugbruksforeningen, Norway. They have been described and analysed in: Aldrin, M. (1996),…
0 runs0 likes0 downloads0 reach13 impact
30 instances - 41 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs0 likes0 downloads0 reach13 impact
163 instances - 6 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs0 likes0 downloads0 reach13 impact
475 instances - 4 features - 0 classes - 0 missing values
Data description: COIL 1999 Competition Data. Data type: multivariate. Abstract: This data set is from the 1999 Computational Intelligence and Learning (COIL)…
0 runs0 likes0 downloads0 reach13 impact
316 instances - 12 features - 0 classes - 56 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs0 likes0 downloads0 reach13 impact
70 instances - 4 features - 0 classes - 0 missing values
Following are data on the shooting of Vinnie Johnson of the Detroit Pistons during the 1985-1986 through 1988-1989 seasons. Source was the New York Times. The data are analyzed in the Carnegie Mellon…
0 runs0 likes0 downloads0 reach13 impact
380 instances - 3 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs0 likes0 downloads0 reach14 impact
66 instances - 12 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
34 instances - 1143 features - 0 classes - 0 missing values
README: smoothmeth. A collection of the data sets used in the book "Smoothing Methods in Statistics," by Jeffrey S. Simonoff, Springer-Verlag, New York, 1996. Submitted by Jeff Simonoff…
0 runs0 likes0 downloads0 reach14 impact
2178 instances - 4 features - 0 classes - 0 missing values
Data description: COIL 1999 Competition Data. Data type: multivariate. Abstract: This data set is from the 1999 Computational Intelligence and Learning (COIL)…
12 runs0 likes0 downloads0 reach13 impact
316 instances - 12 features - 0 classes - 56 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
0 runs0 likes0 downloads0 reach11 impact
2796 instances - 35 features - 2 classes - 68100 missing values
1. Title: Assessing the Reliability of a Human Estimator…
0 runs0 likes0 downloads0 reach13 impact
75 instances - 15 features - 0 classes - 0 missing values
Data description: This is the historical price data of the FOREX EUR/SEK from Dukascopy. One instance (row) is one candlestick of one minute. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach8 impact
375840 instances - 12 features - 2 classes - 0 missing values
Data description: This is the historical price data of the FOREX AUD/JPY from Dukascopy. One instance (row) is one candlestick of one minute. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach8 impact
375840 instances - 12 features - 2 classes - 0 missing values
Data description: This is the historical price data of the FOREX USD/JPY from Dukascopy. One instance (row) is one candlestick of one minute. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach8 impact
375840 instances - 12 features - 2 classes - 0 missing values
Data description: This is the historical price data of the FOREX EUR/PLN from Dukascopy. One instance (row) is one candlestick of one hour. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach8 impact
43825 instances - 12 features - 2 classes - 0 missing values
Data description: This is the historical price data of the FOREX EUR/CAD from Dukascopy. One instance (row) is one candlestick of one minute. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach8 impact
375840 instances - 12 features - 2 classes - 0 missing values
Data description: This is the historical price data of the FOREX CHF/SGD from Dukascopy. One instance (row) is one candlestick of one minute. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach8 impact
375840 instances - 12 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
source: An Algorithm Selection Benchmark for the Container Pre-Marshalling Problem (CPMP) authors: K. Tierney and Y. Malitsky (features) / K. Tierney and D. Pacino and S. Voss (algorithms) translator…
20 runs0 likes0 downloads0 reach10 impact
2108 instances - 27 features - 0 classes - 0 missing values
analysis of stocks
0 runs0 likes0 downloads0 reach8 impact
245 instances - 15 features - classes - 0 missing values
This dataset is an artificial simulation of the Duffing system with random changes from the chaotic to the non-chaotic regime at different noise levels.
0 runs0 likes0 downloads0 reach8 impact
2493200 instances - 26 features - classes - 0 missing values
This dataset is an artificial simulation of the Duffing system with one phase transition to the chaotic regime.
0 runs0 likes0 downloads0 reach8 impact
9983 instances - 4 features - classes - 0 missing values
Training dataset of the 'Porto Seguro's Safe Driver Prediction' Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
2 runs0 likes0 downloads0 reach12 impact
595212 instances - 38 features - 2 classes - 846458 missing values
Hourly particulate matter air pollution data of Great Britain for the year 2017, provided by Ricardo Energy and Environment on behalf of the UK Department for Environment, Food and Rural Affairs…
0 runs0 likes0 downloads0 reach8 impact
394299 instances - 10 features - 0 classes - 0 missing values
Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC) [http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml]. The dataset includes TLC trips of the green line in…
0 runs0 likes0 downloads0 reach9 impact
581835 instances - 15 features - 0 classes - 0 missing values
#modelage
31 runs0 likes0 downloads0 reach8 impact
202 instances - 20 features - 2 classes - 17 missing values
Source: C. Okan Sakar a, Gorkem Serbes b, Aysegul Gunduz c, Hunkar C. Tunc a, Hatice Nizam d, Betul Erdogdu Sakar e, Melih Tutuncu c, Tarkan Aydin a, M. Erdem Isenkul d, Hulya Apaydin c a Department…
0 runs0 likes0 downloads0 reach12 impact
756 instances - 754 features - 0 classes - 0 missing values
1. Title: Echocardiogram Data 2. Source Information: -- Donor: Steven Salzberg (salzberg@cs.jhu.edu) -- Collector: -- Dr. Evlin Kinney -- The Reed Institute -- P.O. Box 402603 -- Miami, FL 33140-0603…
0 runs0 likes0 downloads0 reach8 impact
132 instances - 8 features - 4 classes - 103 missing values
Groups information on about 7800 different US colleges, including geographical information, statistics on the attending student population, and post-graduation career earnings.
0 runs0 likes0 downloads0 reach8 impact
This dataset reflects incidents of crime in the City of Los Angeles dating back to 2010. This data is transcribed from original crime reports that are typed on paper and therefore there may be some…
0 runs0 likes0 downloads0 reach8 impact
Public procurement data for the European Economic Area, Switzerland, and Macedonia (2015).
0 runs0 likes0 downloads0 reach8 impact
10% stratified subsample of the original SVHN data
0 runs0 likes0 downloads0 reach9 impact
9927 instances - 3073 features - 10 classes - 0 missing values
#modelage
28 runs0 likes0 downloads0 reach8 impact
202 instances - 13 features - 3 classes - 202 missing values
#modelage
87 runs0 likes0 downloads0 reach8 impact
224 instances - 20 features - 6 classes - 205 missing values
50% stratified subsample of the original SVHN data
0 runs0 likes0 downloads0 reach9 impact
49644 instances - 3073 features - 10 classes - 0 missing values
nfl_games
0 runs0 likes0 downloads0 reach8 impact
16274 instances - 12 features - classes - 0 missing values
The dataset contains the Serie A matches for the 2015-2016 season.
0 runs0 likes0 downloads0 reach8 impact
379 instances - 38 features - classes - 44 missing values
This dataset contains all Premier League matches from 2008 to 2016, with player statistics taken from Sofifa.
0 runs0 likes0 downloads0 reach8 impact
2961 instances - 17 features - classes - 0 missing values
This dataset contains, for each Premier League match of the 2014-2015 season, the probabilities generated with the L2F models, as well as the match odds.
0 runs0 likes0 downloads0 reach8 impact
323 instances - 11 features - classes - 0 missing values
This dataset contains all the player names and player ids, taken from Sofifa
0 runs0 likes0 downloads0 reach8 impact
11009 instances - 3 features - classes - 0 missing values
dataset for feature extraction
0 runs0 likes0 downloads0 reach8 impact
69 instances - 37 features - classes - 0 missing values
This dataset reflects incidents of crime in the City of Los Angeles dating back to 2010. This data is transcribed from original crime reports that are typed on paper and therefore there may be some…
0 runs0 likes0 downloads0 reach8 impact
1468825 instances - 26 features - 0 classes - 7881776 missing values
This dataset contains a simulation of the Lorenz attractor with the parameter $\rho$ varying in time. The stable and chaotic regimes alternate.
0 runs0 likes0 downloads0 reach8 impact
4942 instances - 4 features - classes - 0 missing values
Dataset sales
0 runs0 likes0 downloads0 reach10 impact
10738 instances - 15 features - 0 classes - 0 missing values
Test file for ML training
0 runs0 likes0 downloads0 reach9 impact
1599 instances - 12 features - classes - 0 missing values
Premier League matches from 2008 to 2014 with TDA features extracted.
0 runs0 likes0 downloads0 reach8 impact
2565 instances - 20 features - classes - 0 missing values
Embedding of atoms for the HIV inhibitors dataset
0 runs0 likes0 downloads0 reach7 impact
1069964 instances - 30 features - classes - 0 missing values
Embedding of molecule bonds in the HIV inhibitors dataset
0 runs0 likes0 downloads0 reach7 impact
1151940 instances - 30 features - classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
13 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
11 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
12 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
7 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
22 instances - 629 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
79 instances - 321 features - 0 classes - 0 missing values
Data from StatLib (ftp stat.cmu.edu/datasets) This is the data set called `DETROIT' in the book `Subset selection in regression' by Alan J. Miller published in the Chapman & Hall series of monographs…
2 runs0 likes0 downloads0 reach10 impact
13 instances - 14 features - 0 classes - 0 missing values
Graeme D. Hutcheson and Nick Sofroniou 1999 The Multivariate Social Scientist: Introductory Statistics Using Generalized Linear Models. SAGE Publications. Copyright: Graeme D. Hutcheson & Nick…
0 runs0 likes0 downloads0 reach13 impact
42 instances - 16 features - 0 classes - 0 missing values
README: chscase. A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs0 likes0 downloads0 reach13 impact
400 instances - 8 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach13 impact
274 instances - 1143 features - 0 classes - 0 missing values
All nominal attributes and instances with missing values are deleted. Price treated as the class attribute. As used by…
2 runs0 likes0 downloads0 reach12 impact
159 instances - 16 features - 0 classes - 0 missing values
Relationship between IQ and Brain Size Summary: Monozygotic twins share numerous physical, psychological, and pathological traits. Recent advances in in vivo brain image acquisition and analysis have…
0 runs0 likes0 downloads0 reach13 impact
20 instances - 9 features - 0 classes - 0 missing values
One of the data sets used in the book "Analyzing Categorical Data" by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. Further details concerning the book, including information on statistical…
0 runs0 likes0 downloads0 reach13 impact
30 instances - 7 features - 0 classes - 6 missing values
A shar archive of data from the book Data Analysis: An Introduction (1992), Prentice Hall, by Jeff Witmer. Submitted by Jeff Witmer (fwitmer@ocvaxa.cc.oberlin.edu) [28/Jun/94] (29 kbytes) Note:…
2 runs0 likes0 downloads0 reach13 impact
50 instances - 5 features - 0 classes - 0 missing values
Data Sets for 'Regression Models for Time Series Analysis' by B. Kedem and K. Fokianos, Wiley 2002. Submitted by Kostas Fokianos (fokianos@ucy.ac.cy) [8/Nov/02] (176k) Note: - attribute names were…
2 runs0 likes0 downloads0 reach13 impact
264 instances - 3 features - 0 classes - 0 missing values
Data description: COIL 1999 Competition Data. Data type: multivariate. Abstract: This data set is from the 1999 Computational Intelligence and Learning (COIL)…
0 runs0 likes0 downloads0 reach13 impact
316 instances - 12 features - 0 classes - 56 missing values
This file contains the data in "The MU284 Population" from Appendix B of the book "Model Assisted Survey Sampling" by Sarndal, Swensson and Wretman, published by Springer-Verlag, New York, 1992. The…
0 runs0 likes0 downloads0 reach13 impact
284 instances - 10 features - 0 classes - 0 missing values
Dataset from Smoothing Methods in Statistics (ftp stat.cmu.edu/datasets) Simonoff, J.S. (1996). Smoothing Methods in Statistics. New York: Springer-Verlag. Points scored per minute is being treated as…
2 runs0 likes0 downloads0 reach9 impact
96 instances - 5 features - 0 classes - 0 missing values
Contains 110 data sets from the book 'The Statistical Sleuth' by Fred Ramsey and Dan Schafer; Duxbury Press, 1997. (schafer@stat.orst.edu) [14/Oct/97] (172k) Note: description taken from this web…
2 runs0 likes0 downloads0 reach13 impact
147 instances - 7 features - 0 classes - 0 missing values
DATA FILE: Data on patient deaths within 30 days of surgery in 131 U.S. hospitals. See Christiansen and Morris, Bayesian Biostatistics, D. Berry and D. Stangl, editors, 1996, Marcel Dekker, Inc. Data…
0 runs0 likes0 downloads0 reach13 impact
131 instances - 3 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs0 likes0 downloads0 reach13 impact
60 instances - 11 features - 0 classes - 14 missing values
Contains 110 data sets from the book 'The Statistical Sleuth' by Fred Ramsey and Dan Schafer; Duxbury Press, 1997. (schafer@stat.orst.edu) [14/Oct/97] (172k) Note: description taken from this web…
2 runs0 likes0 downloads0 reach13 impact
93 instances - 7 features - 0 classes - 0 missing values
These data are estimated correlations between daily 3 p.m. wind measurements during September and October 1997 for a network of 45 stations in the Sydney region. The first column below gives a list of…
0 runs0 likes0 downloads0 reach11 impact
45 instances - 47 features - classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach9 impact
1000000 instances - 17 features - classes - 0 missing values
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the…
1 runs0 likes0 downloads0 reach12 impact
270912 instances - 785 features - 49 classes - 0 missing values