* Dataset: Wilt Data Set * Abstract: High-resolution Remote Sensing data set (Quickbird). Small number of training samples of diseased trees, large number for other land cover. Testing data set from…
317411 runs - 0 likes - 22 downloads - 22 reach - 13 impact
4839 instances - 6 features - 2 classes - 0 missing values
This is the famous Australian dataset, retrieved 2014-11-14 from the libSVM site. It was normalized. The original version is from…
188003 runs - 0 likes - 8 downloads - 8 reach - 3 impact
690 instances - 15 features - 2 classes - 0 missing values
* Title: Breast Cancer Wisconsin (Diagnostic) Data Set (WDBC) * Abstract: Diagnostic Wisconsin Breast Cancer Database * Source: Creators: 1. Dr. William H. Wolberg, General Surgery Dept. University of…
186379 runs - 0 likes - 23 downloads - 23 reach - 12 impact
569 instances - 31 features - 2 classes - 0 missing values
(www.semeion.it) * Title: Steel Plates Faults Data Set * Abstract: A dataset of steel plates' faults, classified into 7 different types. The goal was to train machine learning for automatic pattern…
177047 runs - 0 likes - 18 downloads - 18 reach - 13 impact
1941 instances - 34 features - 2 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
149755 runs - 0 likes - 14 downloads - 14 reach - 13 impact
522 instances - 22 features - 2 classes - 0 missing values
Description of the German credit dataset. 1. Title: German Credit data 2. Source Information Professor Dr. Hans Hofmann Institut für Statistik und Ökonometrie Universität Hamburg FB…
144247 runs - 0 likes - 27 downloads - 27 reach - 11 impact
1000 instances - 21 features - 2 classes - 0 missing values
QSAR biodegradation Data Set * Abstract: Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). *…
143037 runs - 0 likes - 10 downloads - 10 reach - 13 impact
1055 instances - 42 features - 2 classes - 0 missing values
1. Source: Lee Graham (lee '@' stellaralchemy.com) Franz Oppacher (oppacher '@' scs.carleton.ca) Carleton University, Department of Computer Science Intelligent Systems Research Unit 1125 Colonel By…
138147 runs - 0 likes - 13 downloads - 13 reach - 12 impact
1212 instances - 101 features - 2 classes - 0 missing values
1. Title: Pima Indians Diabetes Database 2. Sources: (a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases (b) Donor of database: Vincent Sigillito…
137473 runs - 3 likes - 54 downloads - 57 reach - 15 impact
768 instances - 9 features - 2 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
133210 runs - 0 likes - 17 downloads - 17 reach - 15 impact
2109 instances - 22 features - 2 classes - 0 missing values
1. One-hundred plant species leaves data set (class = margin). 2. Sources: (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The colour images…
132233 runs - 1 like - 10 downloads - 11 reach - 405 impact
1600 instances - 65 features - 100 classes - 0 missing values
1. One-hundred plant species leaves data set (class = shape). 2. Sources: (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The colour images are…
132231 runs - 1 like - 30 downloads - 31 reach - 405 impact
1600 instances - 65 features - 100 classes - 0 missing values
The data directory contains the binary images (masks) of the leaf samples. The colour images are not included. There are three features: Shape, Margin and Texture. As discussed in the paper(s) above.…
132184 runs - 2 likes - 53 downloads - 55 reach - 406 impact
1599 instances - 65 features - 100 classes - 0 missing values
All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement…
131559 runs - 2 likes - 75 downloads - 77 reach - 15 impact
14980 instances - 15 features - 2 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
122837 runs - 0 likes - 22 downloads - 22 reach - 15 impact
1109 instances - 22 features - 2 classes - 0 missing values
1. Abstract: Two ground ozone level data sets are included in this collection. One is the eight-hour peak set (eighthr.data), the other is the one-hour peak set (onehr.data). Those data were…
121847 runs - 0 likes - 10 downloads - 10 reach - 13 impact
2534 instances - 73 features - 2 classes - 0 missing values
* Title: Phoneme dataset * Abstract: The aim of this dataset is to distinguish between nasal (class 0) and oral sounds (class 1). The class distribution is 3,818 samples in class 0 and 1,586 samples…
121593 runs - 1 like - 17 downloads - 18 reach - 13 impact
5404 instances - 6 features - 2 classes - 0 missing values
Source: D. Lucas (ddlucas .at. alum.mit.edu), Lawrence Livermore National Laboratory; R. Klein (rklein .at. astron.berkeley.edu), Lawrence Livermore National Laboratory & U.C. Berkeley; J. Tannahill…
120415 runs - 0 likes - 13 downloads - 13 reach - 12 impact
540 instances - 21 features - 2 classes - 0 missing values
Source: 1. Bendi Venkata Ramana, ramana.bendi '@' gmail.com Associate Professor, Department of Information Technology, Aditya Institute of Technology and Management, Tekkali - 532201, Andhra Pradesh,…
117289 runs - 0 likes - 12 downloads - 12 reach - 12 impact
583 instances - 11 features - 2 classes - 0 missing values
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable,…
117133 runs - 0 likes - 13 downloads - 13 reach - 14 impact
1563 instances - 38 features - 2 classes - 0 missing values
1. Title: SPAM E-mail Database 2. Sources: (a) Creators: Mark Hopkins, Erik Reeber, George Forman, Jaap Suermondt Hewlett-Packard Labs, 1501 Page Mill Rd., Palo Alto, CA 94304 (b) Donor: George Forman…
104854 runs - 3 likes - 63 downloads - 66 reach - 10 impact
4601 instances - 58 features - 2 classes - 0 missing values
Source: Owner of database: Volker Lohweg (University of Applied Sciences, Ostwestfalen-Lippe, volker.lohweg '@' hs-owl.de) Donor of database: Helene Doerksen (University of Applied Sciences,…
98052 runs - 1 like - 12 downloads - 13 reach - 12 impact
1372 instances - 5 features - 2 classes - 0 missing values
Dataset from the MLRR repository: http://axon.cs.byu.edu:5000/
81998 runs - 1 like - 14 downloads - 15 reach - 16 impact
6598 instances - 170 features - 2 classes - 0 missing values
Dataset creator and donor: Zhi Liu, e-mail: liuzhi8673 '@' gmail.com, institution: National Engineering Research Center for E-Learning, Hubei Wuhan, China Data Set Information: the dataset is derived…
65163 runs - 1 like - 34 downloads - 35 reach - 204 impact
1500 instances - 10001 features - 50 classes - 0 missing values
Title: Blood Transfusion Service Center Data Set Abstract: Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan -- this is a classification problem.…
64778 runs - 1 like - 20 downloads - 21 reach - 12 impact
748 instances - 5 features - 2 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable,…
55349 runs - 0 likes - 14 downloads - 14 reach - 15 impact
15545 instances - 6 features - 2 classes - 0 missing values
No data.
53271 runs - 0 likes - 18 downloads - 18 reach - 9 impact
45312 instances - 9 features - 2 classes - 0 missing values
This dataset represents a set of possible advertisements on Internet pages. The features encode the geometry of the image (if available) as well as phrases occurring in the URL, the image's URL and…
45468 runs - 2 likes - 24 downloads - 26 reach - 16 impact
3279 instances - 1559 features - 2 classes - 0 missing values
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable,…
44602 runs - 0 likes - 12 downloads - 12 reach - 14 impact
1458 instances - 38 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
44566 runs - 0 likes - 12 downloads - 12 reach - 14 impact
4562 instances - 49 features - 2 classes - 0 missing values
Scene recognition dataset Source: Matthew R. Boutell, Jiebo Luo, Xipeng Shen, and Christopher M. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757-1771, 2004. 1:…
42189 runs - 0 likes - 17 downloads - 17 reach - 13 impact
2407 instances - 300 features - 2 classes - 0 missing values
Donated by P. Savicky, Institute of Computer Science, AS of CR, Czech Republic The data are MC generated (see below) to simulate registration of high energy gamma particles in a ground-based…
41105 runs - 0 likes - 20 downloads - 20 reach - 14 impact
19020 instances - 12 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
39059 runs - 0 likes - 14 downloads - 14 reach - 14 impact
14395 instances - 217 features - 2 classes - 0 missing values
1. TITLE: Letter Image Recognition Data The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The…
37762 runs - 1 like - 59 downloads - 60 reach - 10 impact
20000 instances - 17 features - 26 classes - 0 missing values
This data set was generated as follows. 150 subjects spoke the name of each letter of the alphabet twice. Hence, we have 52 training examples from each speaker. The speakers are grouped into sets of…
36196 runs - 0 likes - 61 downloads - 61 reach - 113 impact
7797 instances - 618 features - 26 classes - 0 missing values
1. Title of Database: Abalone data 2. Sources: (a) Original owners of database: Marine Resources Division Marine Research Laboratories - Taroona Department of Primary Industry and Fisheries, Tasmania…
33135 runs - 0 likes - 14 downloads - 14 reach - 8 impact
4177 instances - 9 features - 29 classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 0.1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
30150 runs - 0 likes - 10 downloads - 10 reach - 14 impact
39948 instances - 12 features - 2 classes - 0 missing values
Available at: [pdf] http://hdl.handle.net/1822/14838 [bib] http://www3.dsi.uminho.pt/pcortez/bib/2011-esm-1.txt 1. Title: Bank Marketing 2. Sources Created by: Paulo Cortez (Univ. Minho) and Sérgio…
27820 runs - 0 likes - 16 downloads - 16 reach - 13 impact
45211 instances - 17 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
27304 runs - 0 likes - 16 downloads - 16 reach - 14 impact
3468 instances - 971 features - 2 classes - 0 missing values
* Title: Tamilnadu Electricity Board Hourly Readings Data Set * Abstract: This data can be effectively produced the result to fewer parameter of the Load profile can be reduced in the Database *…
26816 runs - 0 likes - 19 downloads - 19 reach - 85 impact
45781 instances - 4 features - 20 classes - 0 missing values
* Dataset Title: MicroMass - Pure (pure spectra version) * Abstract: A dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. * Source:…
26712 runs - 1 like - 10 downloads - 11 reach - 84 impact
571 instances - 1301 features - 20 classes - 0 missing values
This is the large soybean database from the UCI repository, with its training and test database combined into a single file. There are 19 classes, only the first 15 of which have been used in prior…
26376 runs - 0 likes - 48 downloads - 48 reach - 9 impact
683 instances - 36 features - 19 classes - 2337 missing values
The following are data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled "Variations in Written English: Characterizing Authors' Rhetorical Language Choices…
20711 runs - 0 likes - 9 downloads - 9 reach - 65 impact
500 instances - 24 features - 15 classes - 0 missing values
Abstract: MADELON is an artificial dataset, which was part of the NIPS 2003 feature selection challenge. This is a two-class classification problem with continuous input variables. The difficulty is…
19865 runs - 0 likes - 13 downloads - 13 reach - 12 impact
2600 instances - 501 features - 2 classes - 0 missing values
The multi-feature digit dataset. Owned and donated by: Robert P.W. Duin, Department of Applied Physics, Delft University of Technology, P.O. Box…
18057 runs - 0 likes - 13 downloads - 13 reach - 11 impact
2000 instances - 7 features - 10 classes - 0 missing values
The multi-feature digit dataset. Owned and donated by: Robert P.W. Duin, Department of Applied Physics, Delft University of Technology, P.O. Box…
17868 runs - 0 likes - 19 downloads - 19 reach - 12 impact
2000 instances - 48 features - 10 classes - 0 missing values
The multi-feature digit dataset. Owned and donated by: Robert P.W. Duin, Department of Applied Physics, Delft University of Technology, P.O. Box…
17819 runs - 0 likes - 8 downloads - 8 reach - 11 impact
2000 instances - 77 features - 10 classes - 0 missing values
The multi-feature digit dataset. Owned and donated by: Robert P.W. Duin, Department of Applied Physics, Delft University of Technology, P.O. Box…
17787 runs - 0 likes - 16 downloads - 16 reach - 11 impact
2000 instances - 65 features - 10 classes - 0 missing values
1. Title of Database: Optical Recognition of Handwritten Digits 2. Source: E. Alpaydin, C. Kaynak Department of Computer Engineering Bogazici University, 80815 Istanbul Turkey alpaydin@boun.edu.tr…
17604 runs - 1 like - 17 downloads - 18 reach - 10 impact
5620 instances - 65 features - 10 classes - 0 missing values
The multi-feature digit dataset. Owned and donated by: Robert P.W. Duin, Department of Applied Physics, Delft University of Technology, P.O. Box…
17570 runs - 0 likes - 15 downloads - 15 reach - 11 impact
2000 instances - 217 features - 10 classes - 0 missing values
1. Title of Database: Pen-Based Recognition of Handwritten Digits 2. Source: E. Alpaydin, F. Alimoglu Department of Computer Engineering Bogazici University, 80815 Istanbul Turkey alpaydin@boun.edu.tr…
17145 runs - 0 likes - 15 downloads - 15 reach - 10 impact
10992 instances - 17 features - 10 classes - 0 missing values
Data from the Kaggle Bioresponse challenge: https://www.kaggle.com/c/bioresponse The objective of the competition is to help us build as good a model as possible so that we can, as optimally as this…
16325 runs - 0 likes - 26 downloads - 26 reach - 15 impact
3751 instances - 1777 features - 2 classes - 0 missing values
The multi-feature digit dataset. Owned and donated by: Robert P.W. Duin, Department of Applied Physics, Delft University of Technology, P.O. Box…
15553 runs - 0 likes - 13 downloads - 13 reach - 9 impact
2000 instances - 241 features - 10 classes - 0 missing values
* Source: Marques de Sá, J.P., jpmdesa '@' gmail.com, Biomedical Engineering Institute, Porto, Portugal. Bernardes, J., joaobern '@' med.up.pt, Faculty of Medicine, University of Porto, Portugal.…
14123 runs - 0 likes - 20 downloads - 20 reach - 45 impact
2126 instances - 36 features - 10 classes - 0 missing values
Dataset artificially generated using a first-order theory that describes the structure of ten capital letters of the English alphabet.
14020 runs - 0 likes - 7 downloads - 7 reach - 45 impact
10218 instances - 8 features - 10 classes - 0 missing values
1. Title: Image Segmentation data 2. Source Information -- Creators: Vision Group, University of Massachusetts -- Donor: Vision Group (Carla Brodley, brodley@cs.umass.edu) -- Date: November, 1990 3.…
13988 runs - 0 likes - 21 downloads - 21 reach - 11 impact
2310 instances - 20 features - 7 classes - 0 missing values
Tattile, Via Gaetano Donizetti, 1-3-5, 25030 Mairano (Brescia), Italy. * Title: Semeion Handwritten Digit Data Set * Abstract: 1593 handwritten digits from around 80 persons were scanned, stretched in a…
13893 runs - 0 likes - 17 downloads - 17 reach - 45 impact
1593 instances - 257 features - 10 classes - 0 missing values
Source: Patrick Marques Ciarelli, pciarelli '@' lcad.inf.ufes.br, Department of Electrical Engineering, Federal University of Espirito Santo Elias Oliveira, elias '@' lcad.inf.ufes.br, Department of…
13628 runs - 0 likes - 12 downloads - 12 reach - 40 impact
1080 instances - 857 features - 9 classes - 0 missing values
1. Title of Database: Annealing Data 2. Source Information: donated by David Sterling and Wray Buntine. 3. Past Usage: unknown 4. Relevant Information: -- Explanation: I suspect this was left by Ross…
13346 runs - 0 likes - 14 downloads - 14 reach - 12 impact
898 instances - 39 features - 6 classes - 22175 missing values
Relevant Papers: Laurent Candillier and Vincent Lemaire. Design and Analysis of the Nomao Challenge - Active Learning in the Real-World. In: Proceedings of the ALRA: Active Learning in Real-world…
13051 runs - 0 likes - 9 downloads - 9 reach - 13 impact
34465 instances - 119 features - 2 classes - 0 missing values
This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers. The data was collected for examining our newly developed classifier for multidimensional curves…
13030 runs - 0 likes - 8 downloads - 8 reach - 42 impact
9961 instances - 15 features - 9 classes - 0 missing values
1. Title: Tic-Tac-Toe Endgame database 2. Source Information -- Creator: David W. Aha (aha@cs.jhu.edu) -- Donor: David W. Aha (aha@cs.jhu.edu) -- Date: 19 August 1991 3. Known Past Usage: 1.…
12288 runs - 0 likes - 20 downloads - 20 reach - 9 impact
958 instances - 10 features - 2 classes - 0 missing values
NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many…
11954 runs - 1 like - 21 downloads - 22 reach - 10 impact
846 instances - 19 features - 4 classes - 0 missing values
The database consists of the multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. The aim is to…
11857 runs - 1 like - 21 downloads - 22 reach - 8 impact
6430 instances - 37 features - 6 classes - 0 missing values
The Monk's Problems: Problem 1 This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks.…
11693 runs - 0 likes - 10 downloads - 10 reach - 16 impact
556 instances - 7 features - 2 classes - 0 missing values
Abstract: Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative…
11127 runs - 1 like - 15 downloads - 16 reach - 37 impact
1080 instances - 82 features - 8 classes - 1396 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au6-250-drift-au6-cd1-500 * Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and…
10949 runs - 0 likes - 9 downloads - 9 reach - 36 impact
750 instances - 41 features - 8 classes - 0 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au6-1000 * Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity of…
10948 runs - 0 likes - 15 downloads - 15 reach - 37 impact
1000 instances - 41 features - 8 classes - 0 missing values
Source: James P Bridge, Sean B Holden and Lawrence C Paulson University of Cambridge Computer Laboratory William Gates Building 15 JJ Thomson Avenue Cambridge CB3 0FD UK +44 (0)1223 763500…
9671 runs - 0 likes - 19 downloads - 19 reach - 29 impact
6118 instances - 52 features - 6 classes - 0 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
9549 runs - 0 likes - 8 downloads - 8 reach - 6 impact
736 instances - 20 features - 5 classes - 448 missing values
eating
9409 runs - 0 likes - 14 downloads - 14 reach - 34 impact
945 instances - 6374 features - 7 classes - 0 missing values
The Monk's Problems: Problem 3 This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks.…
9337 runs - 0 likes - 11 downloads - 11 reach - 15 impact
554 instances - 7 features - 2 classes - 0 missing values
In my work on context-sensitive learning, I used the "Deterding Vowel Recognition Data", but I found it necessary to reformulate the data. Implicit in the original data is contextual information on…
9059 runs - 0 likes - 10 downloads - 10 reach - 31 impact
990 instances - 13 features - 11 classes - 0 missing values
1. Title: Contraceptive Method Choice 2. Sources: (a) Origin: This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey (b) Creator: Tjen-Sien Lim (limt@stat.wisc.edu)…
8988 runs - 0 likes - 14 downloads - 14 reach - 9 impact
1473 instances - 10 features - 3 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
8928 runs - 0 likes - 8 downloads - 8 reach - 29 impact
797 instances - 5 features - 6 classes - 0 missing values
This data consists of synthetically generated control charts. This dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999). There are…
8869 runs - 0 likes - 8 downloads - 8 reach - 31 impact
600 instances - 62 features - 6 classes - 0 missing values
Data on tree growth used in the Case Study published in the September, 1995 issue of the Canadian Journal of Statistics. This data set has been provided by Dr. Fernando Camacho, Ontario Hydro…
8538 runs - 1 like - 13 downloads - 14 reach - 27 impact
2796 instances - 35 features - 6 classes - 68100 missing values
Title: Gas Sensor Array Drift Dataset Data Set Source: Creators: Alexander Vergara (vergara '@' ucsd.edu) BioCircuits Institute University of California San Diego San Diego, California, USA Donors of…
8514 runs - 0 likes - 14 downloads - 14 reach - 29 impact
13910 instances - 129 features - 6 classes - 0 missing values
1. Title: Waveform Database Generator (written in C) 2. Source: (a) Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group:…
8497 runs - 1 like - 52 downloads - 53 reach - 10 impact
5000 instances - 41 features - 3 classes - 0 missing values
Title: Human Activity Recognition Using Smartphones Abstract: Human Activity Recognition database built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a…
8488 runs - 0 likes - 21 downloads - 21 reach - 29 impact
10299 instances - 562 features - 6 classes - 0 missing values
1. Title: Car Evaluation Database 2. Sources: (a) Creator: Marko Bohanec (b) Donors: Marko Bohanec (marko.bohanec@ijs.si) Blaz Zupan (blaz.zupan@ijs.si) (c) Date: June, 1997 3. Past Usage: The…
8268 runs - 1 like - 19 downloads - 20 reach - 9 impact
1728 instances - 7 features - 4 classes - 0 missing values
1. Title: Balance Scale Weight & Distance Database 2. Source Information: (a) Source: Generated to model psychological experiments reported by Siegler, R. S. (1976). Three Aspects of Cognitive…
7740 runs - 0 likes - 12 downloads - 12 reach - 11 impact
625 instances - 5 features - 3 classes - 0 missing values
The Monk's Problems: Problem 2 This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks.…
7627 runs - 0 likes - 10 downloads - 10 reach - 15 impact
601 instances - 7 features - 2 classes - 0 missing values
1. Title: Chess End-Game -- King+Rook versus King+Pawn on a7 (usually abbreviated KRKPA7). The pawn on a7 means it is one square away from queening. It is the King+Rook's side (white) to move. 2.…
7599 runs - 0 likes - 19 downloads - 19 reach - 10 impact
3196 instances - 37 features - 2 classes - 0 missing values
Additionally, the authors require a citation to one or more publications from those cited as relevant papers. Source: Creators: Renata Cristina Barros Madeo (Madeo, R. C. B.) Priscilla Koch Wagner…
7270 runs - 1 like - 10 downloads - 11 reach - 24 impact
9873 instances - 33 features - 5 classes - 0 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au7-cpd1-500 * Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity…
7083 runs - 0 likes - 7 downloads - 7 reach - 24 impact
500 instances - 13 features - 5 classes - 0 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au7-300-drift-au7-cpd1-800 * Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and…
7067 runs - 0 likes - 10 downloads - 10 reach - 25 impact
1100 instances - 13 features - 5 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
6828 runs - 0 likes - 6 downloads - 6 reach - 21 impact
841 instances - 71 features - 4 classes - 0 missing values
1. Title: Credit Approval 2. Sources: (confidential) Submitted by quinlan@cs.su.oz.au 3. Past Usage: See Quinlan, * "Simplifying decision trees", Int J Man-Machine Studies 27, Dec 1987, pp. 221-234. *…
6745 runs - 0 likes - 20 downloads - 20 reach - 12 impact
690 instances - 16 features - 2 classes - 67 missing values
No data.
6612 runs - 0 likes - 11 downloads - 11 reach - 10 impact
699 instances - 10 features - 2 classes - 16 missing values
This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for…
6141 runs - 5 likes - 66 downloads - 71 reach - 21 impact
150 instances - 5 features - 3 classes - 0 missing values
* Dataset Title: Wall-Following Robot Navigation Data Data Set * Abstract: The data were collected as the SCITOS G5 robot navigates through the room following the wall in a clockwise direction, for 4…
6051 runs - 0 likes - 18 downloads - 18 reach - 21 impact
5456 instances - 25 features - 4 classes - 0 missing values
Thyroid disease records supplied by the Garavan Institute and J. Ross Quinlan, New South Wales Institute, Sydney, Australia (1987). Classes: sick, negative. Attributes: age: continuous. sex: M, F. on…
5575 runs - 0 likes - 19 downloads - 19 reach - 6 impact
3772 instances - 30 features - 2 classes - 6064 missing values
1. Title: Mushroom Database 2. Sources: (a) Mushroom records drawn from The Audubon Society Field Guide to North American Mushrooms (1981). G. H. Lincoff (Pres.), New York: Alfred A. Knopf (b) Donor:…
5487 runs - 0 likes - 25 downloads - 25 reach - 8 impact
8124 instances - 23 features - 2 classes - 2480 missing values
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering. If you publish material based…
5157 runs - 0 likes - 15 downloads - 15 reach - 14 impact
10885 instances - 22 features - 2 classes - 25 missing values
Donor: G. Towell, M. Noordewier, and J. Shavlik Primate splice-junction gene sequences (DNA) with associated imperfect domain theory. All examples taken from Genbank 64.1. Categories "ei" and "ie"…
4634 runs - 0 likes - 8 downloads - 8 reach - 8 impact
3190 instances - 62 features - 3 classes - 0 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au7-700 * Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity of…
4475 runs - 0 likes - 6 downloads - 6 reach - 16 impact
700 instances - 13 features - 3 classes - 0 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php) KDD Cup 2009 http://www.kddcup-orange.com Converted to ARFF format by TunedIT Customer Relationship Management (CRM) is a key element…
4401 runs - 0 likes - 9 downloads - 9 reach - 14 impact
50000 instances - 231 features - 2 classes - 8024152 missing values
PRO FOOTBALL SCORES (raw data appears after the description below) How well do the oddsmakers of Las Vegas predict the outcome of professional football games? Is there really a home field advantage -…
4342 runs - 0 likes - 16 downloads - 16 reach - 13 impact
672 instances - 10 features - 2 classes - 1200 missing values
Irish Educational Transitions Data Below are shown data on educational transitions for a sample of 500 Irish schoolchildren aged 11 in 1967. The data were collected by Greaney and Kelleghan (1984),…
4291 runs - 0 likes - 11 downloads - 11 reach - 13 impact
500 instances - 6 features - 2 classes - 32 missing values