OpenML
1. Title: Nursery Database 2. Sources: (a) Creator: Vladislav Rajkovic et al. (13 experts) (b) Donors: Marko Bohanec (marko.bohanec@ijs.si) Blaz Zupan (blaz.zupan@ijs.si) (c) Date: June, 1997 3. Past…
2210 runs - 0 likes - 15 downloads - 15 reach - 2 impact
12960 instances - 9 features - 5 classes - 0 missing values
No data.
2193 runs - 0 likes - 15 downloads - 15 reach - 1 impact
1484 instances - 9 features - 10 classes - 0 missing values
No data.
1777 runs - 0 likes - 15 downloads - 15 reach - 1 impact
28056 instances - 7 features - 18 classes - 0 missing values
No data.
426 runs - 0 likes - 15 downloads - 15 reach - 76 impact
2463 instances - 2001 features - 17 classes - 0 missing values
Data on educational transitions for a sample of 500 Irish schoolchildren aged 11 in 1967. The data were collected by Greaney and Kelleghan (1984), and reanalyzed by Raftery and Hout (1985, 1993). ###…
16028 runs - 0 likes - 15 downloads - 15 reach - 17 impact
500 instances - 6 features - 2 classes - 32 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
652 runs - 0 likes - 15 downloads - 15 reach - 7 impact
12960 instances - 9 features - 2 classes - 0 missing values
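The majority-class binarization described for this entry can be illustrated with a minimal sketch. The description above is truncated after "positive ('P') and…", so the negative label `"N"`, the function name `binarize_majority`, and the sample labels are all assumptions made for illustration, not OpenML's actual procedure.

```python
from collections import Counter


def binarize_majority(labels, positive="P", negative="N"):
    """Relabel the most frequent class as `positive`, everything else as `negative`.

    The negative label "N" is an assumption; the source description is truncated.
    """
    counts = Counter(labels)
    # most_common(1) returns [(label, count)] for the most frequent label.
    majority, _ = counts.most_common(1)[0]
    return [positive if y == majority else negative for y in labels]


# Hypothetical multi-class target column.
labels = ["a", "b", "a", "c", "a"]
binary = binarize_majority(labels)
```

Here `"a"` is the majority class, so it maps to the positive label and the remaining classes collapse into the negative one.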
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
1133 runs - 0 likes - 15 downloads - 15 reach - 10 impact
150 instances - 5 features - 2 classes - 0 missing values
eating
9413 runs - 0 likes - 15 downloads - 15 reach - 43 impact
945 instances - 6374 features - 7 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
12711 runs - 0 likes - 15 downloads - 15 reach - 25 impact
48842 instances - 15 features - 2 classes - 6465 missing values
The KDD Cup 2009 offers the opportunity to work on large marketing databases from the French Telecom company Orange to predict the propensity of customers to switch provider (churn). Churn (wikipedia…
10986 runs - 0 likes - 15 downloads - 15 reach - 17 impact
50000 instances - 231 features - 2 classes - 8024152 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
396 runs - 0 likes - 15 downloads - 15 reach - 7 impact
3468 instances - 785 features - 10 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for an earth-orbiting satellite. Data comes from McCabe and Halstead feature extractors of source code. These features…
108852 runs - 0 likes - 15 downloads - 15 reach - 18 impact
1458 instances - 38 features - 2 classes - 0 missing values
An artificial data set where instances belong to several clusters with a banana shape. There are two attributes At1 and At2 corresponding to the x and y axis, respectively. The class label (-1 and 1)…
163 runs - 2 likes - 15 downloads - 17 reach - 6 impact
5300 instances - 3 features - 2 classes - 0 missing values
### Description This is a data set containing 1080 documents of free text business descriptions of Brazilian companies categorized into a subset of 9 categories. ### Source ``` Patrick Marques…
25900 runs - 0 likes - 15 downloads - 15 reach - 46 impact
1080 instances - 857 features - 9 classes - 0 missing values
* Dataset: Reduced version (10% of the examples) of the bank-marketing dataset.
104 runs - 1 likes - 15 downloads - 16 reach - 7 impact
4521 instances - 17 features - 2 classes - 0 missing values
A simple database containing 17 Boolean-valued attributes describing animals. The "type" attribute appears to be the class attribute. Notes: * I find it unusual that there are 2 instances of "frog"…
168 runs - 2 likes - 15 downloads - 17 reach - 1 impact
101 instances - 18 features - 7 classes - 0 missing values
This data set was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are…
23815 runs - 1 likes - 14 downloads - 15 reach - 3 impact
625 instances - 5 features - 3 classes - 0 missing values
Primate splice-junction gene sequences (DNA) with associated imperfect domain theory. Splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein…
19521 runs - 1 likes - 14 downloads - 15 reach - 1 impact
3190 instances - 62 features - 3 classes - 0 missing values
Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of lpc derived log area ratios. Collected by David Deterding (data and…
24479 runs - 0 likes - 14 downloads - 14 reach - 35 impact
990 instances - 13 features - 11 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
772 runs - 0 likes - 14 downloads - 14 reach - 7 impact
2310 instances - 20 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
802 runs - 0 likes - 14 downloads - 14 reach - 7 impact
3848 instances - 6 features - 2 classes - 0 missing values
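The mean-threshold binarization described for this entry might look roughly like the sketch below. The description is truncated after "classifying all instances with a…", so it does not say which side of the mean becomes the positive class; the sketch therefore exposes that choice as a hypothetical `positive_below` parameter. The labels `"P"` and `"N"`, the function name, and the sample values are likewise assumptions.

```python
from statistics import mean


def binarize_by_mean(values, positive_below=True, positive="P", negative="N"):
    """Turn a numeric target into two nominal classes by thresholding at the mean.

    Which side of the mean counts as positive is NOT stated in the truncated
    source, so it is left as a parameter here (an assumption for illustration).
    """
    threshold = mean(values)
    result = []
    for v in values:
        is_below = v < threshold
        # Assign the positive label to the side selected by `positive_below`.
        result.append(positive if is_below == positive_below else negative)
    return result


# Hypothetical numeric target column; its mean is 4.0.
targets = [1.0, 2.0, 3.0, 10.0]
classes = binarize_by_mean(targets)
```

With the default setting, the three values below the mean receive one label and the single value above it receives the other; passing `positive_below=False` flips the assignment.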
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
773 runs - 0 likes - 14 downloads - 14 reach - 7 impact
950 instances - 10 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
486 runs - 0 likes - 14 downloads - 14 reach - 8 impact
14395 instances - 109 features - 2 classes - 0 missing values
This is the original version of the famous covertype dataset in ARFF format. Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a…
2 runs - 1 likes - 14 downloads - 15 reach - 12 impact
581012 instances - 55 features - 7 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. The specific type of software is unknown. Data comes from McCabe and Halstead feature extractors of source code. These features were defined in…
815 runs - 0 likes - 14 downloads - 14 reach - 9 impact
9466 instances - 39 features - 2 classes - 0 missing values
### Description MicroMass (pure spectra version) is a dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. ### Source ``` Pierre Mahé,…
39323 runs - 1 likes - 14 downloads - 15 reach - 89 impact
571 instances - 1301 features - 20 classes - 0 missing values
1. Data set title: Nomao Data Set 2. Abstract: Nomao collects data about places (name, phone, localization...) from many sources. Deduplication consists in detecting what data refer to the same place.…
65118 runs - 0 likes - 14 downloads - 14 reach - 18 impact
34465 instances - 119 features - 2 classes - 0 missing values
* Abstract: Oxford Parkinson's Disease Detection Dataset * Source: The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech,…
179 runs - 1 likes - 14 downloads - 15 reach - 6 impact
195 instances - 23 features - 2 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Margin). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
142733 runs - 1 likes - 14 downloads - 15 reach - 410 impact
1600 instances - 65 features - 100 classes - 0 missing values
Data on tree growth used in the Case Study published in the September 1995 issue of the Canadian Journal of Statistics. This data set was provided by Dr. Fernando Camacho, Ontario Hydro…
18462 runs - 1 likes - 14 downloads - 15 reach - 31 impact
2796 instances - 35 features - 6 classes - 68100 missing values
Source: Rami Mustafa A Mohammad (University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield, t.l.mccluskey '@' hud.ac.uk) Fadi…
48930 runs - 0 likes - 14 downloads - 14 reach - 18 impact
11055 instances - 31 features - 2 classes - 0 missing values
Creators: Renata Cristina Barros Madeo (Madeo, R. C. B.) Priscilla Koch Wagner (Wagner, P. K.) Sarajane Marques Peres (Peres, S. M.) {renata.si, priscilla.wagner, sarajane} at usp.br…
21197 runs - 1 likes - 14 downloads - 15 reach - 29 impact
9873 instances - 33 features - 5 classes - 0 missing values
### Description The data consists of real historical data collected from 2010 & 2011. Employees are manually allowed or denied access to resources over time. The data is used to create an algorithm…
35315 runs - 0 likes - 14 downloads - 14 reach - 18 impact
32769 instances - 10 features - 2 classes - 0 missing values
Wikidata with top-474 most frequent types and ingoing/outgoing properties as features
0 runs - 0 likes - 14 downloads - 14 reach - 3 impact
19254100 instances - 2331 features - classes - 0 missing values
Citation Request: This primary tumor domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
1261 runs - 0 likes - 14 downloads - 14 reach - 2 impact
339 instances - 18 features - 21 classes - 225 missing values
1. Title: Dermatology Database 2. Source Information: (a) Original owners: -- 1. Nilsel Ilter, M.D., Ph.D., Gazi University, School of Medicine 06510 Ankara, Turkey Phone: +90 (312) 214 1080 -- 2. H.…
1752 runs - 0 likes - 13 downloads - 13 reach - 2 impact
366 instances - 35 features - 6 classes - 8 missing values
No data.
163 runs - 0 likes - 13 downloads - 13 reach - 11 impact
1560 instances - 8461 features - 20 classes - 0 missing values
No data.
416 runs - 1 likes - 13 downloads - 14 reach - 52 impact
1050 instances - 3239 features - 10 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
108672 runs - 0 likes - 13 downloads - 13 reach - 25 impact
554 instances - 7 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
604 runs - 0 likes - 13 downloads - 13 reach - 7 impact
22784 instances - 17 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
761 runs - 0 likes - 13 downloads - 13 reach - 7 impact
8192 instances - 9 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
774 runs - 0 likes - 13 downloads - 13 reach - 7 impact
9517 instances - 7 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
746 runs - 0 likes - 13 downloads - 13 reach - 7 impact
1024 instances - 3 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
567 runs - 0 likes - 13 downloads - 13 reach - 7 impact
40768 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
760 runs - 0 likes - 13 downloads - 13 reach - 7 impact
1156 instances - 6 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
676 runs - 0 likes - 13 downloads - 13 reach - 7 impact
10992 instances - 17 features - 2 classes - 0 missing values
Balanced version of click prediction data
36 runs - 0 likes - 13 downloads - 13 reach - 5 impact
1997410 instances - 12 features - 2 classes - 0 missing values
No data.
794 runs - 1 likes - 13 downloads - 14 reach - 6 impact
107 instances - 30 features - 2 classes - 0 missing values
* Abstract: The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system. This is a…
423 runs - 0 likes - 13 downloads - 13 reach - 5 impact
120 instances - 7 features - 2 classes - 0 missing values
A 3-class version of Cardiotocography dataset.
134 runs - 0 likes - 13 downloads - 13 reach - 6 impact
2126 instances - 36 features - 3 classes - 0 missing values
Abstract: This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. Source: The data are submitted on behalf of the Center…
0 runs - 1 likes - 13 downloads - 14 reach - 6 impact
101766 instances - 50 features - 3 classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 0.1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
63434 runs - 0 likes - 13 downloads - 13 reach - 17 impact
39948 instances - 12 features - 2 classes - 0 missing values
1. Title: Protein Localization Sites 2. Creator and Maintainer: Kenta Nakai, Institute of Molecular and Cellular Biology, Osaka University, 1-3 Yamada-oka, Suita 565, Japan nakai@imcb.osaka-u.ac.jp…
1803 runs - 0 likes - 12 downloads - 12 reach - 2 impact
336 instances - 8 features - 8 classes - 0 missing values
No data.
1457 runs - 0 likes - 12 downloads - 12 reach - 1 impact
39366 instances - 10 features - 2 classes - 0 missing values
1. Title: Teaching Assistant Evaluation 2. Sources: (a) Collector: Wei-Yin Loh (Department of Statistics, UW-Madison) (b) Donor: Tjen-Sien Lim (limt@stat.wisc.edu) (c) Date: June 7, 1997 3. Past…
2028 runs - 0 likes - 12 downloads - 12 reach - 1 impact
151 instances - 6 features - 3 classes - 0 missing values
No data.
428 runs - 0 likes - 12 downloads - 12 reach - 52 impact
1003 instances - 3183 features - 10 classes - 0 missing values
No data.
216 runs - 0 likes - 12 downloads - 12 reach - 52 impact
11162 instances - 11466 features - 10 classes - 0 missing values
SPECT heart data This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Sources: --…
1296 runs - 1 likes - 12 downloads - 13 reach - 8 impact
267 instances - 23 features - 2 classes - 0 missing values
SPECTF heart data This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. NOTE: See the…
1103 runs - 0 likes - 12 downloads - 12 reach - 7 impact
349 instances - 45 features - 2 classes - 0 missing values
Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, VOL 286, pp. 531-537, 15 October 1999. Web supplement to the article T.R. Golub, D. K.…
451 runs - 0 likes - 12 downloads - 12 reach - 6 impact
72 instances - 7130 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
762 runs - 0 likes - 12 downloads - 12 reach - 7 impact
8192 instances - 33 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
602 runs - 0 likes - 12 downloads - 12 reach - 7 impact
13750 instances - 41 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
600 runs - 0 likes - 12 downloads - 12 reach - 7 impact
1000 instances - 101 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
747 runs - 0 likes - 12 downloads - 12 reach - 7 impact
4177 instances - 9 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
775 runs - 0 likes - 12 downloads - 12 reach - 7 impact
2178 instances - 4 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
769 runs - 0 likes - 12 downloads - 12 reach - 7 impact
252 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
760 runs - 0 likes - 12 downloads - 12 reach - 7 impact
8192 instances - 9 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
765 runs - 0 likes - 12 downloads - 12 reach - 7 impact
1728 instances - 7 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
781 runs - 0 likes - 12 downloads - 12 reach - 7 impact
5473 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
857 runs - 0 likes - 12 downloads - 12 reach - 8 impact
9961 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
639 runs - 0 likes - 12 downloads - 12 reach - 7 impact
20000 instances - 17 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
765 runs - 0 likes - 12 downloads - 12 reach - 7 impact
5620 instances - 65 features - 2 classes - 0 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php) KDD Cup 2009 http://www.kddcup-orange.com Converted to ARFF format by TunedIT Customer Relationship Management (CRM) is a key element…
11305 runs - 0 likes - 12 downloads - 12 reach - 17 impact
50000 instances - 231 features - 2 classes - 8024152 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for an earth-orbiting satellite. Data comes from McCabe and Halstead feature extractors of source code. These features…
875 runs - 0 likes - 12 downloads - 12 reach - 8 impact
5589 instances - 37 features - 2 classes - 0 missing values
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond, Knowledge and Information Systems, Vol. 14, No. 3, 2008. 1. Abstract: Two ground ozone level data sets are included in…
182239 runs - 0 likes - 12 downloads - 12 reach - 18 impact
2534 instances - 73 features - 2 classes - 0 missing values
Multi-label dataset. The scene dataset is an image classification task where labels like Beach, Mountain, Field, Urban are assigned to each image.
0 runs - 0 likes - 12 downloads - 12 reach - 3 impact
2407 instances - 300 features - 2 classes - 0 missing values
No data.
68 runs - 0 likes - 11 downloads - 11 reach - 1 impact
1000000 instances - 10 features - 2 classes - 0 missing values
1. Title: Hepatitis Domain 2. Sources: (a) unknown (b) Donor: G.Gong (Carnegie-Mellon University) via Bojan Cestnik Jozef Stefan Institute Jamova 39 61000 Ljubljana Yugoslavia (tel.: (38)(+61) 214-399…
2134 runs - 0 likes - 11 downloads - 11 reach - 1 impact
155 instances - 20 features - 2 classes - 167 missing values
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service…
216 runs - 0 likes - 11 downloads - 11 reach - 2 impact
110393 instances - 55 features - 7 classes - 0 missing values
Normalized version of the pokerhand data set. Automated file upload of pokerhand-normalized.arff
314 runs - 0 likes - 11 downloads - 11 reach - 2 impact
829201 instances - 11 features - 10 classes - 0 missing values
No data.
863 runs - 0 likes - 11 downloads - 11 reach - 1 impact
39366 instances - 10 features - 2 classes - 0 missing values
No data.
67 runs - 0 likes - 11 downloads - 11 reach - 11 impact
9558 instances - 26833 features - 44 classes - 0 missing values
No data.
159 runs - 0 likes - 11 downloads - 11 reach - 11 impact
1657 instances - 3759 features - 25 classes - 0 missing values
No data.
264 runs - 0 likes - 11 downloads - 11 reach - 36 impact
3204 instances - 13196 features - 6 classes - 0 missing values
Fast training of support vector machines using sequential minimal optimization. In Bernhard Schölkopf, Christopher J. C. Burges, and Alexander J. Smola, editors, Advances in Kernel Methods - Support…
564 runs - 0 likes - 11 downloads - 11 reach - 15 impact
36974 instances - 124 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
866 runs - 1 likes - 11 downloads - 12 reach - 8 impact
7129 instances - 6 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
773 runs - 0 likes - 11 downloads - 11 reach - 7 impact
8641 instances - 5 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
618 runs - 0 likes - 11 downloads - 11 reach - 7 impact
40768 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
617 runs - 0 likes - 11 downloads - 11 reach - 7 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
1266 runs - 0 likes - 11 downloads - 11 reach - 6 impact
131 instances - 4 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
1078 runs - 0 likes - 11 downloads - 11 reach - 6 impact
108 instances - 8 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
963 runs - 0 likes - 11 downloads - 11 reach - 7 impact
380 instances - 3 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
744 runs - 0 likes - 11 downloads - 11 reach - 7 impact
8192 instances - 33 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
589 runs - 0 likes - 11 downloads - 11 reach - 7 impact
22784 instances - 9 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
616 runs - 0 likes - 11 downloads - 11 reach - 7 impact
16599 instances - 19 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
760 runs - 0 likes - 11 downloads - 11 reach - 7 impact
6574 instances - 15 features - 2 classes - 0 missing values
One of the datasets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff. It contains data on the DMFT Index (Decayed, Missing, and Filled Teeth) before and after different prevention…
25235 runs - 0 likes - 11 downloads - 11 reach - 34 impact
797 instances - 5 features - 6 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
856 runs - 0 likes - 11 downloads - 11 reach - 7 impact
209 instances - 7 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
406 runs - 1 likes - 11 downloads - 12 reach - 8 impact
4229 instances - 1618 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
766 runs - 0 likes - 11 downloads - 11 reach - 7 impact
2000 instances - 217 features - 2 classes - 0 missing values