OpenML
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Solar Flare dataset (Lichman 2013) has 3 target variables that correspond to the number of…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1066 instances - 13 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Concrete Slump dataset (Yeh 2007) concerns the prediction of three properties of concrete…
0 runs - 1 likes - 0 downloads - 1 reach - 2 impact
103 instances - 10 features - classes - 0 missing values
The YouTube personality dataset consists of a collection of behavioral features, speech transcriptions, and personality impression scores for a set of 404 YouTube vloggers that explicitly show…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
404 instances - 31 features - classes - 0 missing values
Domain dataset
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1637 instances - 9839 features - 3 classes - 13231887 missing values
This data set describes the participants of a math conference. isPresent is the target column for the classification task.
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
246 instances - 7 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
14 instances - 5 features - 2 classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - 3 classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - 3 classes - 0 missing values
Source: C. Okan Sakar a, Gorkem Serbes b, Aysegul Gunduz c, Hunkar C. Tunc a, Hatice Nizam d, Betul Erdogdu Sakar e, Melih Tutuncu c, Tarkan Aydin a, M. Erdem Isenkul d, Hulya Apaydin c a Department…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
756 instances - 754 features - 0 classes - 0 missing values
nominal features and target for COMPAS
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
5278 instances - 14 features - 2 classes - 0 missing values
Test file for ML training
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1599 instances - 12 features - classes - 0 missing values
Iris DataSet
0 runs - 0 likes - 1 downloads - 1 reach - 2 impact
150 instances - 5 features - 3 classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - classes - 0 missing values
Binarized version of the USPS dataset (see version 2). Only instances with class labels 6 and 9 from the original dataset are considered and encoded as 0 (original class 6) and 1 (original class 9).
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1424 instances - 257 features - 2 classes - 0 missing values
Binarized version of the cnae-9 dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
240 instances - 857 features - 2 classes - 0 missing values
Binarized version of the semeion dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
319 instances - 257 features - 2 classes - 0 missing values
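The three binarized entries above each keep only two class labels from the original data, and the USPS entry additionally states that the kept labels are encoded as 0 and 1. A minimal sketch of that filtering and re-encoding in plain Python follows. The function name `binarize_labels` and the toy rows are hypothetical, chosen for illustration; this is not the preprocessing code used by the uploaders.

```python
def binarize_labels(rows, labels, negative=6, positive=9):
    """Keep only rows labelled `negative` or `positive`; encode them as 0 and 1."""
    mapping = {negative: 0, positive: 1}
    kept_rows, kept_labels = [], []
    for row, label in zip(rows, labels):
        if label in mapping:  # drop every other class
            kept_rows.append(row)
            kept_labels.append(mapping[label])
    return kept_rows, kept_labels


# Toy data, made up for illustration (not taken from USPS).
rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
labels = [6, 3, 9, 6]
X, y = binarize_labels(rows, labels)
```

For the cnae-9 and semeion entries the same helper would be called with `negative=1, positive=2`; whether those uploads also remap the labels to 0 and 1 is not stated in the listing.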
The ILPD liver dataset from the OpenCC18 with the gender binary encoded so all features are numeric
1 runs - 0 likes - 0 downloads - 0 reach - 2 impact
583 instances - 11 features - 2 classes - 0 missing values
Sick dataset from the opencc18 with all textual binary variables label encoded.
1 runs - 0 likes - 0 downloads - 0 reach - 2 impact
3772 instances - 30 features - 2 classes - 0 missing values
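The two entries above describe turning textual two-valued columns (gender in ILPD, the binary variables in Sick) into numbers. One common way to do this is label encoding, sketched below in plain Python. The helper name `label_encode_binary`, the choice of sorted order for assigning 0 and 1, and the sample values are assumptions for illustration; the listing does not say which value was mapped to which number.

```python
def label_encode_binary(column):
    """Map the two distinct values of a column to 0 and 1, in sorted order."""
    values = sorted(set(column))
    if len(values) != 2:
        raise ValueError(f"expected exactly 2 distinct values, got {len(values)}")
    mapping = {values[0]: 0, values[1]: 1}
    return [mapping[v] for v in column], mapping


# Toy column, made up for illustration.
encoded, mapping = label_encode_binary(["f", "t", "f", "f"])
```

Returning the mapping alongside the encoded column keeps the transformation reversible, which matters when the original category names are needed for reporting.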
test openml upload
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - 3 classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - classes - 0 missing values
Dataset from Smoothing Methods in Statistics (ftp stat.cmu.edu/datasets) Simonoff, J.S. (1996). Smoothing Methods in Statistics. New York: Springer-Verlag.
4 runs - 0 likes - 2 downloads - 2 reach - 2 impact
61 instances - 3 features - 0 classes - 0 missing values
No data.
353 runs - 0 likes - 17 downloads - 17 reach - 2 impact
120919 instances - 1002 features - 2 classes - 0 missing values
Juan J. Rodriguez, Ludmila I. Kuncheva, Carlos J. Alonso (2006). Rotation Forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence. 28(10):1619-1630.…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1000000 instances - 12 features - 0 classes - 0 missing values
Donor: Will Taylor (taylor@pluto.arc.nasa.gov) Database of surgeries on horses. Possible class attributes: 24 (whether lesion is surgical), others include: 23, 25, 26, and 27 Notes: * Hospital_Number…
236 runs - 0 likes - 9 downloads - 9 reach - 2 impact
368 instances - 27 features - 2 classes - 1927 missing values
Citation Request: This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
2007 runs - 1 likes - 34 downloads - 35 reach - 2 impact
286 instances - 10 features - 2 classes - 9 missing values
Donor: Will Taylor (taylor@pluto.arc.nasa.gov) In this version (version 2), some features were removed. It is unclear why or how this was done.
1883 runs - 0 likes - 9 downloads - 9 reach - 2 impact
368 instances - 23 features - 2 classes - 1927 missing values
1. Title: Contraceptive Method Choice 2. Sources: (a) Origin: This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey (b) Creator: Tjen-Sien Lim (limt@stat.wisc.edu)…
23427 runs - 0 likes - 19 downloads - 19 reach - 2 impact
1473 instances - 10 features - 3 classes - 0 missing values
Current dataset was adapted to ARFF format from the UCI version. Sample code ID's were removed. ! Note that there is also a related Breast Cancer Wisconsin (Diagnosis) Data Set with a different set of…
25520 runs - 1 likes - 20 downloads - 21 reach - 2 impact
699 instances - 10 features - 2 classes - 16 missing values
1. Title of Database: Abalone data 2. Sources: (a) Original owners of database: Marine Resources Division Marine Research Laboratories - Taroona Department of Primary Industry and Fisheries, Tasmania…
34899 runs - 0 likes - 18 downloads - 18 reach - 2 impact
4177 instances - 9 features - 28 classes - 0 missing values
No data.
1777 runs - 0 likes - 15 downloads - 15 reach - 2 impact
28056 instances - 7 features - 18 classes - 0 missing values
No data.
965 runs - 0 likes - 9 downloads - 9 reach - 2 impact
55296 instances - 10 features - 3 classes - 0 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
26698 runs - 0 likes - 10 downloads - 10 reach - 2 impact
736 instances - 20 features - 5 classes - 448 missing values
No data.
867 runs - 0 likes - 11 downloads - 11 reach - 2 impact
39366 instances - 10 features - 2 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
1187 runs - 1 likes - 10 downloads - 11 reach - 2 impact
412 instances - 9 features - 7 classes - 96 missing values
No data.
2198 runs - 1 likes - 16 downloads - 17 reach - 2 impact
1484 instances - 9 features - 10 classes - 0 missing values
1. Title: Postoperative Patient Data 2. Source Information: -- Creators: Sharon Summers, School of Nursing, University of Kansas Medical Center, Kansas City, KS 66160 Linda Woolery, School of Nursing,…
1758 runs - 0 likes - 10 downloads - 10 reach - 2 impact
90 instances - 9 features - 3 classes - 3 missing values
1. Title: Glass Identification Database 2. Sources: (a) Creator: B. German -- Central Research Establishment Home Office Forensic Science Service Aldermaston, Reading, Berkshire RG7 4PN (b) Donor:…
1776 runs - 0 likes - 50 downloads - 50 reach - 2 impact
214 instances - 10 features - 6 classes - 0 missing values
Primate splice-junction gene sequences (DNA) with associated imperfect domain theory. Splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein…
23161 runs - 1 likes - 15 downloads - 16 reach - 2 impact
3190 instances - 61 features - 3 classes - 0 missing values
1. Title: Teaching Assistant Evaluation 2. Sources: (a) Collector: Wei-Yin Loh (Department of Statistics, UW-Madison) (b) Donor: Tjen-Sien Lim (limt@stat.wisc.edu) (b) Date: June 7, 1997 3. Past…
2028 runs - 0 likes - 13 downloads - 13 reach - 2 impact
151 instances - 6 features - 3 classes - 0 missing values
This database encodes the complete set of possible board configurations at the end of tic-tac-toe games, where "x" is assumed to have played first. The target concept is "win for x" (i.e., true when…
385613 runs - 1 likes - 66 downloads - 67 reach - 2 impact
958 instances - 10 features - 2 classes - 0 missing values
Publication Request: This file describes the contents of the heart-disease directory. This directory contains 4 databases…
1789 runs - 0 likes - 10 downloads - 10 reach - 2 impact
294 instances - 14 features - 2 classes - 782 missing values
Attribute information: ``` sick, negative. | classes age: continuous. sex: M, F. on thyroxine: f, t. query on thyroxine: f, t. on antithyroid medication: f, t. sick: f, t. pregnant: f, t. thyroid…
19175 runs - 0 likes - 31 downloads - 31 reach - 2 impact
3772 instances - 30 features - 2 classes - 6064 missing values
NAME: Sonar, Mines vs. Rocks SUMMARY: This is the data set used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network [1]. The task is to train a network…
2366 runs - 1 likes - 25 downloads - 26 reach - 2 impact
208 instances - 61 features - 2 classes - 0 missing values
1. Title: Haberman's Survival Data 2. Sources: (a) Donor: Tjen-Sien Lim (limt@stat.wisc.edu) (b) Date: March 4, 1999 3. Past Usage: 1. Haberman, S. J. (1976). Generalized Residuals for Log-Linear…
3241 runs - 1 likes - 19 downloads - 20 reach - 2 impact
306 instances - 4 features - 2 classes - 0 missing values
Publication Request: This file describes the contents of the heart-disease directory. This directory contains 4 databases…
1763 runs - 0 likes - 10 downloads - 10 reach - 2 impact
303 instances - 14 features - 2 classes - 7 missing values
1. Title: Hepatitis Domain 2. Sources: (a) unknown (b) Donor: G.Gong (Carnegie-Mellon University) via Bojan Cestnik Jozef Stefan Institute Jamova 39 61000 Ljubljana Yugoslavia (tel.: (38)(+61) 214-399…
2134 runs - 1 likes - 12 downloads - 13 reach - 2 impact
155 instances - 20 features - 2 classes - 167 missing values
1. Title: 1984 United States Congressional Voting Records Database 2. Source Information: (a) Source: Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional…
2262 runs - 0 likes - 17 downloads - 17 reach - 2 impact
435 instances - 17 features - 2 classes - 392 missing values
A simple database containing 17 Boolean-valued attributes describing animals. The "type" attribute appears to be the class attribute. Notes: * I find it unusual that there are 2 instances of "frog"…
168 runs - 2 likes - 17 downloads - 19 reach - 2 impact
101 instances - 17 features - 7 classes - 0 missing values
No data.
1038 runs - 0 likes - 8 downloads - 8 reach - 2 impact
55296 instances - 10 features - 3 classes - 0 missing values
Thyroid disease records supplied by the Garavan Institute and J. Ross Quinlan, New South Wales Institute, Sydney, Australia. 1987. hypothyroid, primary hypothyroid, compensated…
883 runs - 0 likes - 11 downloads - 11 reach - 2 impact
3772 instances - 30 features - 4 classes - 6064 missing values
No data.
1457 runs - 0 likes - 12 downloads - 12 reach - 2 impact
39366 instances - 10 features - 2 classes - 0 missing values
1. Title: Space Shuttle Autolanding Domain 2. Sources: (a) Original source: unknown -- NASA: Mr. Roger Burke's autolander design team (b) Donor: Bojan Cestnik Jozef Stefan Institute Jamova 39 61000…
1466 runs - 0 likes - 9 downloads - 9 reach - 2 impact
15 instances - 7 features - 2 classes - 26 missing values
This is a commercial application described in Weiss & Indurkhya (1995). The data describes a telecommunication problem. No further information is available. Characteristics: (10000+5000) cases, 49…
2 runs - 0 likes - 4 downloads - 4 reach - 2 impact
15000 instances - 49 features - 0 classes - 0 missing values
This data set is also obtained from the task of controlling an F16 aircraft, although the target variable and attributes are different from the ailerons domain. In this case the goal variable is…
2 runs - 0 likes - 7 downloads - 7 reach - 2 impact
16599 instances - 19 features - 0 classes - 0 missing values
The task consists of Learning Quantitative Structure Activity Relationships (QSARs): the Inhibition of Dihydrofolate Reductase by Pyrimidines. The data are described in: King, Ross D., Muggleton,…
6 runs - 0 likes - 2 downloads - 2 reach - 2 impact
74 instances - 28 features - 0 classes - 0 missing values
Identification code deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based…
4 runs - 1 likes - 0 downloads - 1 reach - 2 impact
189 instances - 10 features - 0 classes - 0 missing values
This is a dataset obtained from the StatLib repository. Here is the included description: The data provided are daily stock prices from January 1988 through October 1991, for ten aerospace companies.…
5 runs - 1 likes - 7 downloads - 8 reach - 2 impact
950 instances - 10 features - 0 classes - 0 missing values
No data.
0 runs - 0 likes - 3 downloads - 3 reach - 2 impact
31104 instances - 10 features - 0 classes - 0 missing values
File README: chscase. A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
185 instances - 2 features - classes - 0 missing values
File README: chscase. A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
50 instances - 3 features - classes - 0 missing values
No data.
312 runs - 0 likes - 4 downloads - 4 reach - 3 impact
1000000 instances - 14 features - 3 classes - 0 missing values
No data.
37 runs - 0 likes - 2 downloads - 2 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
31 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
9 runs - 0 likes - 2 downloads - 2 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
10 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
9 runs - 0 likes - 2 downloads - 2 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
10 runs - 0 likes - 2 downloads - 2 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
7 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
6 runs - 0 likes - 3 downloads - 3 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
7 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
6 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
7 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
30 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
29 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
30 runs - 0 likes - 2 downloads - 2 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
30 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
29 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
27 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
27 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
This is the hip measurement data from Table B.13 in Chatfield's Problem Solving (1995, 2nd edn, Chapman and Hall). It is given in 8 columns. First 4 columns are for Control Group. Last 4 columns are…
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
54 instances - 8 features - classes - 120 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs - 0 likes - 2 downloads - 2 reach - 3 impact
100 instances - 10 features - classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs - 0 likes - 1 downloads - 1 reach - 3 impact
228 instances - 8 features - classes - 20 missing values
One of the data sets used in the book "Analyzing Categorical Data" by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. Further details concerning the book, including information on statistical…
0 runs - 0 likes - 1 downloads - 1 reach - 3 impact
31 instances - 16 features - classes - 150 missing values
These data are estimated correlations between daily 3 p.m. wind measurements during September and October 1997 for a network of 45 stations in the Sydney region. The first column below gives a list of…
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
45 instances - 47 features - classes - 0 missing values
This dataset summarizes a heterogeneous set of features about articles published by Mashable in a period of two years. The goal is to predict the number of shares in social networks (popularity). *…
0 runs - 0 likes - 3 downloads - 3 reach - 3 impact
39644 instances - 61 features - 0 classes - 0 missing values
USDA, NRCS. 2008. The PLANTS Database ([Web Link], 31 December 2008). National Plant Data Center, Baton Rouge, LA 70874-4490 USA. Abstract: Data has been extracted from the USDA plants database. It…
0 runs - 0 likes - 4 downloads - 4 reach - 3 impact
Source: Creators : François Kawala (1,2) Ahlame Douzal (1) Eric Gaussier (1) Eustache Diemert (2) Institutions : (1) Université Joseph Fourier (Grenoble I) Laboratoire d'informatique de…
0 runs - 0 likes - 1 downloads - 1 reach - 3 impact
28179 instances - 97 features - classes - 0 missing values