Data
Filter results by:
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes4 downloads4 reach15 impact
468 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
67 runs0 likes2 downloads2 reach15 impact
458 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes4 downloads4 reach15 impact
470 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes4 downloads4 reach14 impact
138 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes2 downloads2 reach15 impact
185 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2855 runs0 likes8 downloads8 reach24 impact
542 instances - 10936 features - 2 classes - 0 missing values
Mean While 1
0 runs0 likes3 downloads3 reach11 impact
253 instances - 38 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1-pmlb
0 runs0 likes2 downloads2 reach22 impact
1600 instances - 1001 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
analcatdata_fraud-pmlb
34 runs0 likes0 downloads0 reach21 impact
42 instances - 12 features - 2 classes - 0 missing values
Dataset used by Buntine and Niblett (1992). Composed of 10 features, one of which is irrelevant. The target is a disjunctive normal form formula over the nine other attributes, with additional…
31 runs0 likes0 downloads0 reach22 impact
973 instances - 10 features - 2 classes - 0 missing values
Derived from the Musk dataset: https://www.openml.org/d/1116
31 runs0 likes1 downloads1 reach22 impact
476 instances - 169 features - 2 classes - 0 missing values
Derived from the Musk dataset: https://www.openml.org/d/1116
31 runs0 likes1 downloads1 reach22 impact
6598 instances - 169 features - 2 classes - 0 missing values
corral-pmlb
31 runs0 likes1 downloads1 reach22 impact
160 instances - 7 features - 2 classes - 0 missing values
PMLB version of the Titanic dataset, which only uses 3 features. See version 1 for the complete version: https://www.openml.org/d/40945
35 runs0 likes1 downloads1 reach23 impact
2201 instances - 4 features - 2 classes - 0 missing values
)), [PMLB](https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/tokyo1) This is Performance co-pilot (PCP) data for the Tokyo server at Silicon Graphics International…
37 runs0 likes1 downloads1 reach22 impact
959 instances - 45 features - 2 classes - 0 missing values
parity5_plus_5-pmlb
31 runs0 likes0 downloads0 reach22 impact
1124 instances - 11 features - 2 classes - 0 missing values
cleve-pmlb
32 runs0 likes1 downloads1 reach21 impact
303 instances - 14 features - 2 classes - 0 missing values
The origin is not clear, but presumably this is an artificial problem representing M-of-N rules. The target is 1 if a certain M 'bits' are '1'? (Joaquin Vanschoren)
31 runs0 likes0 downloads0 reach22 impact
1324 instances - 11 features - 2 classes - 0 missing values
dis-pmlb
31 runs0 likes1 downloads1 reach22 impact
3772 instances - 30 features - 2 classes - 0 missing values
parity5-pmlb
32 runs0 likes0 downloads0 reach21 impact
32 instances - 6 features - 2 classes - 0 missing values
mux6-pmlb
31 runs0 likes1 downloads1 reach21 impact
128 instances - 7 features - 2 classes - 0 missing values
postoperative-patient-data-pmlb
26 runs0 likes1 downloads1 reach21 impact
88 instances - 9 features - 2 classes - 0 missing values
threeOf9-pmlb
31 runs0 likes0 downloads0 reach22 impact
512 instances - 10 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
12 runs0 likes0 downloads0 reach14 impact
2351 instances - 47 features - 2 classes - 0 missing values
GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusable digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection…
466 runs0 likes53 downloads53 reach26 impact
7000 instances - 5001 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
10 runs0 likes0 downloads0 reach14 impact
3660 instances - 47 features - 2 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
10 runs0 likes0 downloads0 reach14 impact
2352 instances - 47 features - 2 classes - 0 missing values
Binarized version of the semeion dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs0 likes0 downloads0 reach10 impact
319 instances - 257 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough to each dataset…
0 runs0 likes0 downloads0 reach9 impact
156 instances - 91 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough to each dataset…
0 runs0 likes0 downloads0 reach9 impact
156 instances - 81 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
Binarized version of the USPS dataset (see version 2). Only instances with class labels 6 and 9 from the original dataset are considered and encoded as 0 (original class 6) and 1 (original class 9).
0 runs0 likes0 downloads0 reach9 impact
1424 instances - 257 features - 2 classes - 0 missing values
Binarized version of the isolet dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs0 likes0 downloads0 reach12 impact
600 instances - 618 features - 2 classes - 0 missing values
Binarized version of the cnae-9 dataset (see version 1). Only instances with class labels 1 and 2 from the original dataset are considered.
0 runs0 likes0 downloads0 reach10 impact
240 instances - 857 features - 2 classes - 0 missing values
The ILPD liver dataset from the OpenCC18 with the gender binary encoded so all features are numeric
1 runs0 likes0 downloads0 reach10 impact
583 instances - 11 features - 2 classes - 0 missing values
Sick dataset from the opencc18 with all textual binary variables label encoded.
1 runs0 likes2 downloads2 reach9 impact
3772 instances - 30 features - 2 classes - 0 missing values
sd vfv
0 runs0 likes0 downloads0 reach7 impact
4 instances - 50 features - 2 classes - 0 missing values
Airlines Dataset Inspired in the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure. For this…
0 runs0 likes2 downloads2 reach6 impact
26969 instances - 8 features - 2 classes - 0 missing values
Data Set Information: This research aimed at the case of customers’ default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From…
0 runs0 likes1 downloads1 reach7 impact
30000 instances - 24 features - 2 classes - 0 missing values
This dataset is gather to detect whether a person is running or walking based on deep neural networks and sensor data collected from iOS devices. The dataset represents 88588 sensor data samples…
1 runs0 likes4 downloads4 reach14 impact
88588 instances - 7 features - 2 classes - 0 missing values
"The speech dataset was also provided by (see citation request) and contains real world data from recorded English language. The normal class contains data from persons having an American accent…
1599 runs0 likes6 downloads6 reach17 impact
3686 instances - 401 features - 2 classes - 0 missing values
The satellite dataset comprises of features extracted from satellite observations. In particular, each image was taken under four different light wavelength, two in visible light (green and red) and…
2074 runs3 likes70 downloads73 reach33 impact
5100 instances - 37 features - 2 classes - 0 missing values
Klaverjas is an example of the Jack-Nine card games, which are characterized as trick-taking games where the the Jack and nine of the trump suit are the highest-ranking trumps, and the tens and aces…
0 runs0 likes1 downloads1 reach10 impact
981541 instances - 33 features - 2 classes - 0 missing values
xxx
0 runs0 likes0 downloads0 reach6 impact
891 instances - 8 features - 2 classes - 689 missing values
Automated file upload of BNG(spambase)
98 runs0 likes3 downloads3 reach12 impact
1000000 instances - 58 features - 2 classes - 0 missing values
Automated file upload of 20_newsgroups.drift
124 runs0 likes2 downloads2 reach16 impact
399940 instances - 1001 features - 2 classes - 0 missing values
Automated file upload of BNG(ionosphere)
99 runs1 likes4 downloads5 reach13 impact
1000000 instances - 35 features - 2 classes - 0 missing values
Multi-label dataset. The genbase dataset contains protein sequences that can be assigned to several classes of protein families.
0 runs0 likes2 downloads2 reach11 impact
662 instances - 1213 features - 2 classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs1 likes11 downloads12 reach11 impact
2000 instances - 140 features - 2 classes - 0 missing values
The langLog dataset includes 1004 textual predictors and was originally compiled in the doctorial thesis of Read (2010). It consists of 956 text samples that can be assigned to one or more topics such…
0 runs0 likes5 downloads5 reach11 impact
1460 instances - 1079 features - 2 classes - 0 missing values
Multi-label dataset. A subset of the reuters dataset includes 2000 observations for text classification.
0 runs0 likes8 downloads8 reach12 impact
2000 instances - 250 features - 2 classes - 0 missing values
Multi-label dataset. The scene dataset is an image classification task where labels like Beach, Mountain, Field, Urban are assigned to each image.
0 runs0 likes13 downloads13 reach11 impact
2407 instances - 300 features - 2 classes - 0 missing values
Multi-label dataset for text-classification. It consists of article titles and partial blurbs. Blurbs can be assigned to several categories (e.g. Science, News, Games) based on word predictors.
0 runs1 likes16 downloads17 reach16 impact
3782 instances - 1101 features - 2 classes - 0 missing values
Multi-label dataset. The yeast dataset (Elisseeff and Weston, 2002) consists of micro-array expression data, as well as phylogenetic profiles of yeast, and includes 2417 genes and 103 predictors. In…
0 runs0 likes3 downloads3 reach11 impact
2417 instances - 117 features - 2 classes - 0 missing values
Multi-label dataset. The birds dataset consists of 327 audio recordings of 12 different vocalizing bird species. Each sound can be assigned to various bird species.
0 runs0 likes6 downloads6 reach11 impact
645 instances - 279 features - 2 classes - 0 missing values
Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…
0 runs2 likes5 downloads7 reach11 impact
593 instances - 78 features - 2 classes - 0 missing values
Multi-label dataset. The UC Berkeley enron4 dataset represents a subset of the original enron5 dataset and consists of 1684 cases of emails with 21 labels and 1001 predictor variables.
1 runs0 likes5 downloads5 reach14 impact
1702 instances - 1054 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs0 likes0 downloads0 reach17 impact
3140 instances - 260 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs0 likes1 downloads1 reach17 impact
4147 instances - 49 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes1 downloads1 reach16 impact
100 instances - 10001 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes0 downloads0 reach18 impact
3153 instances - 971 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs0 likes0 downloads0 reach15 impact
31406 instances - 23 features - 2 classes - 29756 missing values
#modelage
31 runs0 likes0 downloads0 reach8 impact
202 instances - 20 features - 2 classes - 17 missing values
Context "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets] Content Each row represents a…
0 runs1 likes2 downloads3 reach8 impact
7043 instances - 20 features - 2 classes - 0 missing values
This is the same data as version 5 (OpenML ID = 1220) with '_id' features coded as nominal factor variables.
0 runs0 likes0 downloads0 reach10 impact
39948 instances - 12 features - 2 classes - 0 missing values
This data was collected from combine primary and secondary sources, through questionnaire, verbal interview and some part of the hospital’s record department’s data, from the selected…
0 runs0 likes0 downloads0 reach9 impact
281 instances - 98 features - 2 classes - 2 missing values
Dataset KDD98 challenge: https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html The goal is to estimate the return from a direct mailing in order to maximize donation profits. This dataset…
0 runs0 likes1 downloads1 reach9 impact
82318 instances - 478 features - 2 classes - 2399311 missing values
Incident reports from the San Franciso Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018. from…
0 runs0 likes1 downloads1 reach7 impact
538638 instances - 7 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes2 downloads2 reach9 impact
14 instances - 5 features - 2 classes - 0 missing values
Data set shows information about participants of math conference. isPresent is target column for classification task.
0 runs0 likes0 downloads0 reach9 impact
246 instances - 7 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes1 downloads1 reach9 impact
14 instances - 5 features - 2 classes - 0 missing values
Test dataset
3 runs0 likes0 downloads0 reach16 impact
15547 instances - 61 features - 2 classes - 280 missing values
This dataset contains 358 lyrics of songs for the rock bands 'The Rolling Stones' and 'Deep Purple'. The bands are equally represented in the dataset (179 songs for each band). This dataset was…
8 runs0 likes2 downloads2 reach21 impact
358 instances - 2 features - 2 classes - 0 missing values
Training dataset of the 'Porto Seguros Safe Driver Prediction' Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
2 runs0 likes0 downloads0 reach12 impact
595212 instances - 38 features - 2 classes - 846458 missing values
nominal features and target for COMPAS
0 runs0 likes1 downloads1 reach9 impact
5278 instances - 14 features - 2 classes - 0 missing values
Original data from https://github.com/propublica/compas-analysis/ by ProPublica. The data was subsequently preprocessed and reduced to relevant features for classification. The target variable is…
0 runs0 likes1 downloads1 reach10 impact
5278 instances - 14 features - 2 classes - 0 missing values
Asteroid Dataset
0 runs0 likes1 downloads1 reach10 impact
126131 instances - 34 features - 2 classes - 99 missing values
Asteroid Dataset
0 runs0 likes1 downloads1 reach11 impact
126131 instances - 34 features - 2 classes - 99 missing values
Data
0 runs0 likes1 downloads1 reach10 impact
539383 instances - 8 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes1 downloads1 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes1 downloads1 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach8 impact
14 instances - 5 features - 2 classes - 0 missing values
At Santander our mission is to help people and businesses prosper. We are always looking for ways to help our customers understand their financial health and identify which products and services might…
0 runs0 likes1 downloads1 reach8 impact
200000 instances - 202 features - 2 classes - 0 missing values
Context It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Content The…
0 runs1 likes2 downloads3 reach9 impact
284807 instances - 31 features - 2 classes - 0 missing values
Incident reports from the San Franciso Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018. from…
0 runs1 likes0 downloads1 reach0 impact
2215023 instances - 9 features - 2 classes - 0 missing values
This is the same data as version 5 (OpenML ID = 1220) with '_id' features coded as nominal factor variables.
0 runs0 likes0 downloads0 reach1 impact
39948 instances - 12 features - 2 classes - 0 missing values
Payments given by healthcare manufacturing companies to medical doctors or hospitals
0 runs0 likes0 downloads0 reach0 impact
73558 instances - 6 features - 2 classes - 83182 missing values
# Data Description This is the historical price data of the FOREX EUR/CAD from Dukascopy. One instance (row) is one candlestick of one minute. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes0 downloads0 reach9 impact
375840 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data of the FOREX EUR/GBP from Dukascopy. One instance (row) is one candlestick of one hour. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes1 downloads1 reach9 impact
43825 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data of the FOREX AUD/NZD from Dukascopy. One instance (row) is one candlestick of one hour. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes1 downloads1 reach9 impact
43825 instances - 12 features - 2 classes - 0 missing values
# Data Description This is the historical price data of the FOREX EUR/NZD from Dukascopy. One instance (row) is one candlestick of one day. The whole dataset has the data range from 1-1-2018 to…
0 runs0 likes1 downloads1 reach9 impact
1832 instances - 12 features - 2 classes - 0 missing values