OpenML
Filter results by:
Source: James P Bridge, Sean B Holden and Lawrence C Paulson University of Cambridge Computer Laboratory William Gates Building 15 JJ Thomson Avenue Cambridge CB3 0FD UK +44 (0)1223 763500…
26323 runs1 likes21 downloads22 reach43 impact
6118 instances - 52 features - 6 classes - 0 missing values
Creators: Renata Cristina Barros Madeo (Madeo, R. C. B.) Priscilla Koch Wagner (Wagner, P. K.) Sarajane Marques Peres (Peres, S. M.) {renata.si, priscilla.wagner, sarajane} at usp.br…
26327 runs1 likes16 downloads17 reach38 impact
9873 instances - 33 features - 5 classes - 0 missing values
Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of lpc derived log area ratios. Collected by David Deterding (data and…
26450 runs1 likes18 downloads19 reach43 impact
990 instances - 13 features - 11 classes - 0 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
27229 runs0 likes11 downloads11 reach10 impact
736 instances - 20 features - 5 classes - 448 missing values
One of the datasets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff. It contains data on the DMFT Index (Decayed, Missing, and Filled Teeth) before and after different prevention…
27710 runs0 likes13 downloads13 reach42 impact
797 instances - 5 features - 6 classes - 0 missing values
This data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four-minute "first date" with every other participant of the…
28060 runs19 likes162 downloads181 reach34 impact
8378 instances - 123 features - 2 classes - 18372 missing values
Current dataset was adapted to ARFF format from the UCI version. Sample code ID's were removed. ! Note that there is also a related Breast Cancer Wisconsin (Diagnosis) Data Set with a different set of…
28321 runs1 likes20 downloads21 reach9 impact
699 instances - 10 features - 2 classes - 16 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
28846 runs0 likes8 downloads8 reach34 impact
841 instances - 71 features - 4 classes - 0 missing values
This data set was generated to model psychological experimental results. Each example is classified as having the balance scale tip to the right, tip to the left, or be balanced. The attributes are…
29597 runs2 likes17 downloads19 reach12 impact
625 instances - 5 features - 3 classes - 0 missing values
The database consists of the multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. The aim is to…
29713 runs2 likes24 downloads26 reach11 impact
6430 instances - 37 features - 6 classes - 0 missing values
NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many…
31491 runs2 likes32 downloads34 reach11 impact
846 instances - 19 features - 4 classes - 0 missing values
Tattile Via Gaetano Donizetti, 1-3-5,25030 Mairano (Brescia), Italy. ### Dataset Description Semeion Handwritten Digit Data Set, where 1593 handwritten digits from around 80 persons were scanned and…
32940 runs0 likes23 downloads23 reach58 impact
1593 instances - 257 features - 10 classes - 0 missing values
### Description This is a data set containing 1080 documents of free text business descriptions of Brazilian companies categorized into a subset of 9 categories. ### Source ``` Patrick Marques…
34145 runs0 likes17 downloads17 reach56 impact
1080 instances - 857 features - 9 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
34558 runs0 likes23 downloads23 reach11 impact
2000 instances - 48 features - 10 classes - 0 missing values
1. Title of Database: Abalone data 2. Sources: (a) Original owners of database: Marine Resources Division Marine Research Laboratories - Taroona Department of Primary Industry and Fisheries, Tasmania…
34899 runs0 likes18 downloads18 reach9 impact
4177 instances - 9 features - 28 classes - 0 missing values
### Description The data consists of real historical data collected from 2010 & 2011. Employees are manually allowed or denied access to resources over time. The data is used to create an algorithm…
35323 runs0 likes22 downloads22 reach28 impact
32769 instances - 10 features - 2 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
35343 runs0 likes18 downloads18 reach11 impact
2000 instances - 7 features - 10 classes - 0 missing values
1. Title of Database: Optical Recognition of Handwritten Digits 2. Source: E. Alpaydin, C. Kaynak Department of Computer Engineering Bogazici University, 80815 Istanbul Turkey alpaydin@boun.edu.tr…
35798 runs3 likes22 downloads25 reach11 impact
5620 instances - 65 features - 10 classes - 0 missing values
We create a digit database by collecting 250 samples from 44 writers. The samples written by 30 writers are used for training, cross-validation and writer dependent testing, and the digits written by…
37194 runs0 likes21 downloads21 reach11 impact
10992 instances - 17 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
37368 runs0 likes18 downloads18 reach14 impact
2000 instances - 217 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
38026 runs0 likes12 downloads12 reach11 impact
2000 instances - 77 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
38639 runs0 likes20 downloads20 reach11 impact
2000 instances - 65 features - 10 classes - 0 missing values
### Description MicroMass (pure spectra version) is a dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. ### Source ``` Pierre Mahé,…
39629 runs1 likes16 downloads17 reach98 impact
571 instances - 1301 features - 20 classes - 0 missing values
This is the large soybean database from the UCI repository, with its training and test database combined into a single file. There are 19 classes, only the first 15 of which have been used in prior…
40719 runs1 likes53 downloads54 reach12 impact
683 instances - 36 features - 19 classes - 2337 missing values
Predict a biological response of molecules from their chemical properties. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological…
48338 runs2 likes37 downloads39 reach34 impact
3751 instances - 1777 features - 2 classes - 0 missing values
### Description ISOLET (Isolated Letter Speech Recognition) dataset was generated as follows: 150 subjects spoke the name of each letter of the alphabet twice. Hence, there are 52 training examples…
50748 runs0 likes70 downloads70 reach131 impact
7797 instances - 618 features - 26 classes - 0 missing values
Source: Rami Mustafa A Mohammad ( University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield,t.l.mccluskey '@' hud.ac.uk ) Fadi…
51512 runs1 likes25 downloads26 reach27 impact
11055 instances - 31 features - 2 classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 0.1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
63420 runs0 likes17 downloads17 reach25 impact
39948 instances - 10 features - 2 classes - 0 missing values
Donated by P. Savicky, Institute of Computer Science, AS of CR, Czech Republic Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope.…
64659 runs1 likes29 downloads30 reach25 impact
19020 instances - 12 features - 2 classes - 0 missing values
Dataset creator and donator: Zhi Liu, e-mail: liuzhi8673 '@' gmail.com, institution: National Engineering Research Center for E-Learning, Hubei Wuhan, China Data Set Information: dataset are derived…
65168 runs2 likes45 downloads47 reach216 impact
1500 instances - 10001 features - 50 classes - 0 missing values
The data is related with direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was…
65398 runs3 likes36 downloads39 reach31 impact
45211 instances - 17 features - 2 classes - 0 missing values
1. Data set title: Nomao Data Set 2. Abstract: Nomao collects data about places (name, phone, localization...) from many sources. Deduplication consists in detecting what data refer to the same place.…
67393 runs0 likes16 downloads16 reach28 impact
34465 instances - 119 features - 2 classes - 0 missing values
Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which consisted of 5 different datasets (SYLVA, GINA, NOVA, HIVA, ADA). The purpose of the challenge…
69081 runs0 likes20 downloads20 reach26 impact
3468 instances - 971 features - 2 classes - 0 missing values
1. TITLE: Letter Image Recognition Data The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The…
69254 runs1 likes72 downloads73 reach11 impact
20000 instances - 17 features - 26 classes - 0 missing values
Dataset from the MLRR repository: http://axon.cs.byu.edu:5000/ More infos: https://archive.ics.uci.edu/ml/datasets/Musk+(Version+2)
82516 runs1 likes19 downloads20 reach32 impact
6598 instances - 168 features - 2 classes - 0 missing values
### Description Scene recognition dataset - It contains characteristics about images and their classes. The original dataset is a multi-label classification problem with 6 different labels: {Beach,…
89930 runs0 likes23 downloads23 reach25 impact
2407 instances - 300 features - 2 classes - 0 missing values
#### Abstract: MADELON is an artificial dataset, which was part of the NIPS 2003 feature selection challenge. This is a two-class classification problem with continuous input variables. The difficulty…
100887 runs0 likes19 downloads19 reach26 impact
2600 instances - 501 features - 2 classes - 0 missing values
The dataset (originally named ELEC2) contains 45,312 instances dated from 7 May 1996 to 5 December 1998. Each example of the dataset refers to a period of 30 minutes, i.e. there are 48 instances for…
106854 runs3 likes39 downloads42 reach11 impact
45312 instances - 9 features - 2 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
108666 runs1 likes14 downloads15 reach34 impact
554 instances - 7 features - 2 classes - 0 missing values
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable,…
109963 runs1 likes20 downloads21 reach27 impact
15545 instances - 6 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
115699 runs0 likes16 downloads16 reach27 impact
1458 instances - 38 features - 2 classes - 0 missing values
Author: Volker Lohweg (University of Applied Sciences, Ostwestfalen-Lippe) Source: [UCI](https://archive.ics.uci.edu/ml/datasets/banknote+authentication) - 2012 Please cite:…
137675 runs5 likes33 downloads38 reach30 impact
1372 instances - 5 features - 2 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Margin). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143050 runs1 likes17 downloads18 reach418 impact
1600 instances - 65 features - 100 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Texture). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143077 runs2 likes65 downloads67 reach418 impact
1599 instances - 65 features - 100 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Shape). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143288 runs1 likes39 downloads40 reach416 impact
1600 instances - 65 features - 100 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
146025 runs1 likes18 downloads19 reach26 impact
1563 instances - 38 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
149998 runs0 likes26 downloads26 reach27 impact
1109 instances - 22 features - 2 classes - 0 missing values
This data set contains 416 liver patient records and 167 non liver patient records.The data set was collected from north east of Andhra Pradesh, India. The class label divides the patients into 2…
154859 runs2 likes23 downloads25 reach26 impact
583 instances - 11 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for storage management for receiving and processing ground data. Data comes from McCabe and Halstead features extractors of…
161516 runs2 likes28 downloads30 reach29 impact
2109 instances - 22 features - 2 classes - 0 missing values
SPAM E-mail Database The "spam" concept is diverse: advertisements for products/websites, make money fast schemes, chain letters, pornography... Our collection of spam e-mails came from our postmaster…
161528 runs5 likes89 downloads94 reach11 impact
4601 instances - 58 features - 2 classes - 0 missing values
Lucas, D. D., Klein, R., Tannahill, J., Ivanova, D., Brandon, S., Domyancic, D., and Zhang, Y.: Failure analysis of parameter-induced simulation crashes in climate models, Geosci. Model Dev. Discuss.,…
162436 runs0 likes22 downloads22 reach25 impact
540 instances - 21 features - 2 classes - 0 missing values
All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement…
165839 runs3 likes94 downloads97 reach29 impact
14980 instances - 15 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for science data processing. Data comes from McCabe and Halstead features extractors of source code. These features were…
176259 runs0 likes25 downloads25 reach26 impact
522 instances - 22 features - 2 classes - 0 missing values
Each record represents 100 points on a two-dimensional graph. When plotted in order (from 1 through 100) as the Y coordinate, the points will create either a Hill (a “bump” in the terrain) or a…
183264 runs0 likes23 downloads23 reach25 impact
1212 instances - 101 features - 2 classes - 0 missing values
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond, Knowledge and Information Systems, Vol. 14, No. 3, 2008. 1 . Abstract: Two ground ozone level data sets are included in…
187955 runs1 likes18 downloads19 reach28 impact
2534 instances - 73 features - 2 classes - 0 missing values
1. Title: Pima Indians Diabetes Database 2. Sources: (a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases (b) Donor of database: Vincent Sigillito…
202119 runs6 likes89 downloads95 reach15 impact
768 instances - 9 features - 2 classes - 0 missing values
The aim of this dataset is to distinguish between nasal (class 0) and oral sounds (class 1). Five different attributes were chosen to characterize each vowel: they are the amplitudes of the five first…
218302 runs5 likes36 downloads41 reach29 impact
5404 instances - 6 features - 2 classes - 0 missing values
Current dataset was adapted to ARFF format from the UCI version. Sample code ID's were removed. ! Note that there is also a related Breast Cancer Wisconsin (Original) Data Set with a different set of…
226562 runs4 likes37 downloads41 reach27 impact
569 instances - 31 features - 2 classes - 0 missing values
QSAR biodegradation Data Set * Abstract: Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). *…
267507 runs1 likes22 downloads23 reach27 impact
1055 instances - 42 features - 2 classes - 0 missing values
Author: Alen Shapiro Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Chess+(King-Rook+vs.+King-Pawn)) Please cite: [UCI citation policy](https://archive.ics.uci.edu/ml/citation_policy.html) 1.…
273622 runs1 likes42 downloads43 reach15 impact
3196 instances - 37 features - 2 classes - 0 missing values
A dataset of steel plates' faults, classified into 7 different types. The goal was to train machine learning for automatic pattern recognition. The dataset consists of 27 features describing each…
277313 runs1 likes46 downloads47 reach25 impact
1941 instances - 34 features - 2 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
358450 runs2 likes19 downloads21 reach39 impact
556 instances - 7 features - 2 classes - 0 missing values
This database encodes the complete set of possible board configurations at the end of tic-tac-toe games, where "x" is assumed to have played first. The target concept is "win for x" (i.e., true when…
386330 runs2 likes81 downloads83 reach10 impact
958 instances - 10 features - 2 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
394293 runs2 likes27 downloads29 reach37 impact
601 instances - 7 features - 2 classes - 0 missing values
Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan -- this is a classification problem. To demonstrate the RFMTC marketing model (a modified version of RFM), this study…
467769 runs5 likes89 downloads94 reach42 impact
748 instances - 5 features - 2 classes - 0 missing values
This dataset classifies people described by a set of attributes as good or bad credit risks. This dataset comes with a cost matrix: ``` Good Bad (predicted) Good 0 1 (actual) Bad 5 0 ``` It is worse…
505936 runs19 likes249 downloads268 reach28 impact
1000 instances - 21 features - 2 classes - 0 missing values