OpenML
This dataset describes participants of a math conference. isPresent is the target column for the classification task.
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
246 instances - 7 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
14 instances - 5 features - 2 classes - 0 missing values
Test dataset
3 runs - 0 likes - 0 downloads - 0 reach - 7 impact
15547 instances - 61 features - 2 classes - 280 missing values
Training dataset of the 'Porto Seguro's Safe Driver Prediction' Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
595212 instances - 38 features - 2 classes - 846458 missing values
#modelage
31 runs - 0 likes - 0 downloads - 0 reach - 1 impact
202 instances - 20 features - 2 classes - 17 missing values
Context "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets] Content Each row represents a…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
7043 instances - 20 features - 2 classes - 0 missing values
nominal features and target for COMPAS
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
5278 instances - 14 features - 2 classes - 0 missing values
Original data from https://github.com/propublica/compas-analysis/ by ProPublica. The data was subsequently preprocessed and reduced to relevant features for classification. The target variable is…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
5278 instances - 14 features - 2 classes - 0 missing values
Binarized version of the USPS dataset (see version 2). Only instances with class labels 6 and 9 from the original dataset are considered and encoded as 0 (original class 6) and 1 (original class 9).
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1424 instances - 257 features - 2 classes - 0 missing values
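The binarization described for this entry (keep only two of the original classes and recode them as 0 and 1) can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the function name `binarize_two_classes` and the toy arrays are hypothetical, and only the label mapping (6 to 0, 9 to 1) comes from the description.

```python
import numpy as np

def binarize_two_classes(X, y, negative_label=6, positive_label=9):
    """Keep rows labelled negative_label or positive_label; recode them as 0 and 1."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Boolean mask selecting only the two classes of interest.
    mask = (y == negative_label) | (y == positive_label)
    # Rows with the positive label become 1, the remaining kept rows become 0.
    y_bin = (y[mask] == positive_label).astype(int)
    return X[mask], y_bin

# Toy data standing in for a multi-class dataset (hypothetical, not real USPS values).
X_toy = np.arange(10).reshape(5, 2)
y_toy = np.array([6, 3, 9, 6, 1])
X_bin, y_bin = binarize_two_classes(X_toy, y_toy)
```

The same helper would cover the other binarized entries in this listing by passing labels 1 and 2 instead, although how those datasets encode the two remaining classes is not stated here.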
Binarized version of the isolet dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
600 instances - 618 features - 2 classes - 0 missing values
Binarized version of the cnae-9 dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
240 instances - 857 features - 2 classes - 0 missing values
Binarized version of the semeion dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
319 instances - 257 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough for each dataset…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
156 instances - 81 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough for each dataset…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
156 instances - 91 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough for each dataset…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
156 instances - 81 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair USD/DKK from Dukascopy. Each instance (row) is a one-minute candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
375840 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair EUR/CAD from Dukascopy. Each instance (row) is a one-hour candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
43825 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair USD/CHF from Dukascopy. Each instance (row) is a one-minute candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
375840 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair EUR/SGD from Dukascopy. Each instance (row) is a one-hour candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
43825 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair EUR/CHF from Dukascopy. Each instance (row) is a one-day candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
1833 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair EUR/HUF from Dukascopy. Each instance (row) is a one-minute candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
375840 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair EUR/SEK from Dukascopy. Each instance (row) is a one-hour candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
43825 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair USD/DKK from Dukascopy. Each instance (row) is a one-day candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1832 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair AUD/NZD from Dukascopy. Each instance (row) is a one-hour candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 2 downloads - 2 reach - 1 impact
43825 instances - 12 features - 2 classes - 0 missing values
The ILPD liver dataset from the OpenML-CC18, with gender binary-encoded so that all features are numeric.
1 runs - 0 likes - 0 downloads - 0 reach - 2 impact
583 instances - 11 features - 2 classes - 0 missing values
The sick dataset from the OpenML-CC18, with all textual binary variables label-encoded.
1 runs - 0 likes - 0 downloads - 0 reach - 2 impact
3772 instances - 30 features - 2 classes - 0 missing values
E-commerce eligibility
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
269177 instances - 2 features - 2 classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs - 1 likes - 10 downloads - 11 reach - 4 impact
2000 instances - 140 features - 2 classes - 0 missing values
Lucas, D. D., Klein, R., Tannahill, J., Ivanova, D., Brandon, S., Domyancic, D., and Zhang, Y.: Failure analysis of parameter-induced simulation crashes in climate models, Geosci. Model Dev. Discuss.,…
162436 runs - 0 likes - 21 downloads - 21 reach - 18 impact
540 instances - 21 features - 2 classes - 0 missing values
Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan -- this is a classification problem. To demonstrate the RFMTC marketing model (a modified version of RFM), this study…
464739 runs - 5 likes - 67 downloads - 72 reach - 29 impact
748 instances - 5 features - 2 classes - 0 missing values
A dataset of steel plates' faults, classified into 7 different types. The goal was to train machine learning for automatic pattern recognition. The dataset consists of 27 features describing each…
277313 runs - 1 likes - 38 downloads - 39 reach - 18 impact
1941 instances - 34 features - 2 classes - 0 missing values
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond, Knowledge and Information Systems, Vol. 14, No. 3, 2008. 1. Abstract: Two ground ozone level data sets are included in…
184395 runs - 0 likes - 15 downloads - 15 reach - 20 impact
2534 instances - 73 features - 2 classes - 0 missing values
Each record represents 100 points on a two-dimensional graph. When plotted in order (from 1 through 100) as the Y coordinate, the points will create either a Hill (a “bump” in the terrain) or a…
183264 runs - 0 likes - 21 downloads - 21 reach - 18 impact
1212 instances - 101 features - 2 classes - 0 missing values
This is the poker dataset, retrieved 2013-11-14 from the LibSVM site. In addition to the preprocessing done there (see the LibSVM site for details), this dataset was created as follows: - join test and…
23 runs - 0 likes - 18 downloads - 18 reach - 8 impact
1025010 instances - 11 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 3 downloads - 3 reach - 8 impact
203 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 4 downloads - 4 reach - 7 impact
138 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
76 runs - 0 likes - 5 downloads - 5 reach - 8 impact
187 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 2 downloads - 2 reach - 8 impact
185 instances - 10937 features - 2 classes - 0 missing values
Data from the RSCTC 2010 Discovery Challenge. All datasets contain between 100 and 400 samples, characterized by values of 20,000 - 65,000 attributes. Samples are assigned to several (2-10) classes.…
48 runs - 0 likes - 6 downloads - 6 reach - 8 impact
159 instances - 61360 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
78 runs - 0 likes - 4 downloads - 4 reach - 8 impact
421 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 2 downloads - 2 reach - 8 impact
410 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 4 downloads - 4 reach - 8 impact
470 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 3 downloads - 3 reach - 8 impact
412 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 3 downloads - 3 reach - 8 impact
201 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 5 downloads - 5 reach - 8 impact
250 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
79 runs - 0 likes - 3 downloads - 3 reach - 8 impact
322 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 5 downloads - 5 reach - 8 impact
275 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
72 runs - 1 likes - 7 downloads - 8 reach - 9 impact
1545 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2862 runs - 0 likes - 8 downloads - 8 reach - 17 impact
1545 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 4 downloads - 4 reach - 8 impact
468 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 4 downloads - 4 reach - 8 impact
484 instances - 10937 features - 2 classes - 0 missing values
DOROTHEA is a drug discovery dataset. Chemical compounds represented by structural molecular features must be classified as active (binding to thrombin) or inactive. This is one of 5 datasets of the…
0 runs - 0 likes - 7 downloads - 7 reach - 13 impact
1150 instances - 100001 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 1 downloads - 1 reach - 8 impact
100 instances - 10001 features - 2 classes - 0 missing values
ARCENE's task is to distinguish cancer versus normal patterns from mass-spectrometric data. This is a two-class classification problem with continuous input variables. This dataset is one of 5…
17 runs - 0 likes - 10 downloads - 10 reach - 7 impact
200 instances - 10001 features - 2 classes - 0 missing values
QSAR biodegradation Data Set * Abstract: Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). *…
265463 runs - 1 likes - 17 downloads - 18 reach - 19 impact
1055 instances - 42 features - 2 classes - 0 missing values
Even smaller sample of version 1
0 runs - 0 likes - 3 downloads - 3 reach - 5 impact
149639 instances - 12 features - 2 classes - 0 missing values
Citation Request: This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
66 runs - 0 likes - 4 downloads - 4 reach - 7 impact
277 instances - 10 features - 2 classes - 0 missing values
* Dataset: DBworld e-mails data set Task: dbworld-subjects * Source: Michele Filannino, PhD University of Manchester Centre for Doctoral Training Email: filannim_AT_cs.man.ac.uk * Data Set…
40 runs - 0 likes - 2 downloads - 2 reach - 6 impact
64 instances - 243 features - 2 classes - 0 missing values
### Description __Changes to version 1:__ all categorical features transformed as such. This dataset represents a set of possible advertisements on Internet pages. ### Sources (a) Creator and donor:…
1430 runs - 0 likes - 3 downloads - 3 reach - 14 impact
3279 instances - 1559 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs - 0 likes - 1 downloads - 1 reach - 9 impact
4147 instances - 49 features - 2 classes - 0 missing values
No data.
353 runs - 0 likes - 17 downloads - 17 reach - 2 impact
120919 instances - 1002 features - 2 classes - 0 missing values
This data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four-minute "first date" with every other participant of the…
28060 runs - 19 likes - 156 downloads - 175 reach - 25 impact
8378 instances - 123 features - 2 classes - 18372 missing values
* Dataset: Hill valley dataset. A noiseless version of the data set.
117 runs - 0 likes - 8 downloads - 8 reach - 8 impact
1212 instances - 101 features - 2 classes - 0 missing values
A dataset relating characteristics of telephony account features and usage and whether or not the customer churned. Originally used in [Discovering Knowledge in Data: An Introduction to Data…
6956 runs - 2 likes - 6 downloads - 8 reach - 17 impact
5000 instances - 21 features - 2 classes - 0 missing values
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807…
355 runs - 0 likes - 54 downloads - 54 reach - 12 impact
284807 instances - 31 features - 2 classes - 0 missing values
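The class imbalance implied by the two counts quoted in this entry can be worked out directly. A minimal sketch: the variable names are illustrative, and the only inputs are the 492 frauds and the 284,807 instances stated above.

```python
# Counts taken from the entry above; everything else is derived arithmetic.
n_fraud = 492
n_total = 284_807

# Share of instances in the positive (fraud) class.
fraud_rate = n_fraud / n_total

# Accuracy of a trivial classifier that always predicts "not fraud".
always_legit_accuracy = 1 - fraud_rate

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-legit accuracy: {always_legit_accuracy:.4%}")
```

Because a classifier that never flags fraud already scores very high accuracy on data this skewed, plain accuracy says little about how well a model separates the two classes.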
#### Abstract: MADELON is an artificial dataset, which was part of the NIPS 2003 feature selection challenge. This is a two-class classification problem with continuous input variables. The difficulty…
98292 runs - 0 likes - 17 downloads - 17 reach - 19 impact
2600 instances - 501 features - 2 classes - 0 missing values
pie chart 2
101 runs - 0 likes - 5 downloads - 5 reach - 6 impact
745 instances - 37 features - 2 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
13749 runs - 1 likes - 21 downloads - 22 reach - 28 impact
48842 instances - 15 features - 2 classes - 6465 missing values
* Abstract: Predict the Bankruptcy from Qualitative parameters from experts. * Source: Source Information -- Creator : Mr.A.Martin(jayamartin '@' yahoo.com) Mr.J.Uthayakumar (uthayakumar17691 '@'…
147 runs - 0 likes - 11 downloads - 11 reach - 7 impact
250 instances - 7 features - 2 classes - 0 missing values
### Description The data consists of real historical data collected from 2010 & 2011. Employees are manually allowed or denied access to resources over time. The data is used to create an algorithm…
35323 runs - 0 likes - 16 downloads - 16 reach - 19 impact
32769 instances - 10 features - 2 classes - 0 missing values
Predict a biological response of molecules from their chemical properties. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological…
46540 runs - 2 likes - 37 downloads - 39 reach - 24 impact
3751 instances - 1777 features - 2 classes - 0 missing values
Source: Rami Mustafa A Mohammad (University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield, t.l.mccluskey '@' hud.ac.uk) Fadi…
50791 runs - 1 likes - 20 downloads - 21 reach - 19 impact
11055 instances - 31 features - 2 classes - 0 missing values
The aim of this dataset is to distinguish between nasal (class 0) and oral sounds (class 1). Five different attributes were chosen to characterize each vowel: they are the amplitudes of the five first…
215901 runs - 5 likes - 33 downloads - 38 reach - 21 impact
5404 instances - 6 features - 2 classes - 0 missing values
Author: Volker Lohweg (University of Applied Sciences, Ostwestfalen-Lippe) Source: [UCI](https://archive.ics.uci.edu/ml/datasets/banknote+authentication) - 2012 Please cite:…
135489 runs - 3 likes - 25 downloads - 28 reach - 22 impact
1372 instances - 5 features - 2 classes - 0 missing values
The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was…
64886 runs - 2 likes - 31 downloads - 33 reach - 21 impact
45211 instances - 17 features - 2 classes - 0 missing values
# Data Description This is the historical price data for the FOREX pair EUR/USD from Dukascopy. Each instance (row) is a one-hour candlestick. The dataset covers the date range from 1-1-2018 to…
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
43825 instances - 12 features - 2 classes - 0 missing values
Asteroid Dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
126131 instances - 34 features - 2 classes - 99 missing values
Asteroid Dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
126131 instances - 34 features - 2 classes - 99 missing values
One of the biggest challenges for an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The…
3 runs - 0 likes - 3 downloads - 3 reach - 5 impact
72983 instances - 33 features - 2 classes - 149271 missing values
Data
0 runs - 0 likes - 1 downloads - 1 reach - 3 impact
539383 instances - 8 features - 2 classes - 0 missing values
The original Titanic dataset, describing the survival status of individual passengers on the Titanic. The Titanic data does not contain information about the crew, but it does contain actual ages of…
0 runs - 0 likes - 11 downloads - 11 reach - 4 impact
1309 instances - 14 features - 2 classes - 3855 missing values
#### 1. Summary This dataset contains attributes of dresses and their recommendations according to their sales. Sales are monitored on alternate days. The attributes analyzed are:…
18404 runs - 1 likes - 5 downloads - 6 reach - 11 impact
500 instances - 13 features - 2 classes - 835 missing values
flare-pmlb
32 runs - 0 likes - 1 downloads - 1 reach - 14 impact
1066 instances - 11 features - 2 classes - 0 missing values
Michel Lang fRMA-normalized. Only "Kratz-genes"*. \* (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…
0 runs - 0 likes - 8 downloads - 8 reach - 6 impact
226 instances - 24 features - 2 classes - 0 missing values
cast metal 1
111 runs - 0 likes - 9 downloads - 9 reach - 6 impact
327 instances - 38 features - 2 classes - 0 missing values
* Title: South Africa Heart Disease Dataset * Description A retrospective sample of males in a heart-disease high-risk region of the Western Cape, South Africa. There are roughly two controls per case…
155 runs - 0 likes - 11 downloads - 11 reach - 7 impact
462 instances - 10 features - 2 classes - 0 missing values
* Title: Skin Segmentation Data Set * Abstract: The Skin Segmentation dataset is constructed over B, G, R color space. Skin and Nonskin dataset is generated using skin textures from face images of…
15 runs - 1 likes - 10 downloads - 11 reach - 7 impact
245057 instances - 4 features - 2 classes - 0 missing values
Mega watt
183 runs - 0 likes - 8 downloads - 8 reach - 8 impact
253 instances - 38 features - 2 classes - 0 missing values
Multi-label dataset for text-classification. It consists of article titles and partial blurbs. Blurbs can be assigned to several categories (e.g. Science, News, Games) based on word predictors.
0 runs - 1 likes - 12 downloads - 13 reach - 6 impact
3782 instances - 1101 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2866 runs - 0 likes - 8 downloads - 8 reach - 17 impact
546 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2860 runs - 0 likes - 7 downloads - 7 reach - 17 impact
604 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
66 runs - 0 likes - 5 downloads - 5 reach - 8 impact
259 instances - 10937 features - 2 classes - 0 missing values
* Abstract: The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system. This is a…
423 runs - 0 likes - 14 downloads - 14 reach - 6 impact
120 instances - 7 features - 2 classes - 0 missing values
All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement…
165222 runs - 3 likes - 92 downloads - 95 reach - 21 impact
14980 instances - 15 features - 2 classes - 0 missing values
This is a corrected version of the previous data file in version 1, which contained a dataset (349 instances) incorrectly merged from the original training and test sets available on UCI (there are…
0 runs - 0 likes - 3 downloads - 3 reach - 5 impact
267 instances - 45 features - 2 classes - 0 missing values
Automated file upload of 20_newsgroups.drift
124 runs - 0 likes - 2 downloads - 2 reach - 8 impact
399940 instances - 1001 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2841 runs - 0 likes - 4 downloads - 4 reach - 17 impact
630 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 3 downloads - 3 reach - 8 impact
413 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 3 downloads - 3 reach - 8 impact
347 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 4 downloads - 4 reach - 8 impact
355 instances - 10937 features - 2 classes - 0 missing values