OpenML
Original data from https://github.com/propublica/compas-analysis/ by ProPublica. The data was subsequently preprocessed and reduced to relevant features for classification. The target variable is…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
5278 instances - 14 features - 2 classes - 0 missing values
Title: Communities and Crime. Abstract: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and…
0 runs - 1 like - 3 downloads - 4 reach - 5 impact
1994 instances - 128 features - 0 classes - 39202 missing values
Chocolate Bar Ratings. Expert ratings of over 1,700 chocolate bars. Each chocolate is evaluated from a combination of both objective qualities and subjective interpretation. A rating here only…
0 runs - 0 likes - 1 download - 1 reach - 1 impact
1795 instances - 9 features - 42 classes - 1 missing value
Chocolate Bar Ratings. Expert ratings of over 1,700 chocolate bars. Each chocolate is evaluated from a combination of both objective qualities and subjective interpretation. A rating here only…
0 runs - 0 likes - 1 download - 1 reach - 1 impact
1794 instances - 9 features - 41 classes - 0 missing values
At Santander our mission is to help people and businesses prosper. We are always looking for ways to help our customers understand their financial health and identify which products and services might…
0 runs - 0 likes - 1 download - 1 reach - 1 impact
200000 instances - 202 features - 2 classes - 0 missing values
Experiment data obtained by running random configurations of an SVM through mlr on 106 different classification tasks from OpenML.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
540576 instances - 15 features - classes - 658962 missing values
Experiment data obtained by running random configurations of the hnsw kNN through mlr on 116 different classification tasks from OpenML.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
111753 instances - 13 features - classes - 0 missing values
Experiment data obtained by running random configurations of rpart through mlr on 115 different classification tasks from OpenML.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
92067 instances - 12 features - classes - 0 missing values
Experiment data obtained by running random configurations of glmnet through mlr on 114 different classification tasks from OpenML.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
104820 instances - 10 features - classes - 0 missing values
r fgtgt
0 runs - 0 likes - 1 download - 1 reach - 0 impact
2 instances - 8 features - classes - 0 missing values
Experiment data obtained by running random configurations of xgboost through mlr on 118 different classification tasks from OpenML. Parameter descriptions:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2955210 instances - 21 features - classes - 7051006 missing values
Experiment data obtained by running random configurations of ranger through mlr on 119 different classification tasks from OpenML.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
278863 instances - 16 features - classes - 138965 missing values
dataset for bme
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
63 instances - 12 features - classes - 52 missing values
Multi-label dataset for text-classification. It consists of article titles and partial blurbs. Blurbs can be assigned to several categories (e.g. Science, News, Games) based on word predictors.
0 runs - 0 likes - 2 downloads - 2 reach - 5 impact
3782 instances - 1101 features - classes - 0 missing values
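Multi-label targets such as "a blurb can belong to several categories" are often represented as a binary indicator matrix with one column per category. The sketch below is a minimal illustration under that assumption; the label sets are hypothetical, only the category names Science, News, and Games come from the description, and how this dataset actually stores its labels among its 1101 columns is not stated here.

```python
from typing import List, Sequence


def to_indicator_matrix(label_sets: Sequence[Sequence[str]],
                        categories: Sequence[str]) -> List[List[int]]:
    """Encode each instance's set of labels as a 0/1 row, one column per category."""
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for labels in label_sets:
        row = [0] * len(categories)
        for label in labels:
            row[index[label]] = 1  # raises KeyError if a label is not a known category
        rows.append(row)
    return rows


# Hypothetical label sets for three blurbs (the second has two labels, the third none).
categories = ["Science", "News", "Games"]
blurb_labels = [["Science"], ["News", "Games"], []]
Y = to_indicator_matrix(blurb_labels, categories)
print(Y)  # [[1, 0, 0], [0, 1, 1], [0, 0, 0]]
```

Each row can have any number of 1s, which is what distinguishes this from a single-label (one-hot) target.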
dd fgrfg
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2 instances - 3 features - classes - 0 missing values
The database covers all the international short track games in the last 5 years. Currently it contains only men's 500m. Detailed lap data including personal time and ranking in each game from seasons…
0 runs - 0 likes - 1 download - 1 reach - 4 impact
Identify jets of particles from the LHC, created for the study of ultra low latency inference with hls4ml. Use 16 high level features to identify the 5 jet classes: quark (q), gluon (g), W boson (w),…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
830000 instances - 17 features - 5 classes - 0 missing values
efef ffrf
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9 instances - 3 features - classes - 0 missing values
ssc vdv
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1556 instances - 2 features - classes - 0 missing values
ssf
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
2 instances - 2 features - classes - 0 missing values
Data Set Information: This research focused on customers’ default payments in Taiwan and compares the predictive accuracy of the probability of default across six data mining methods. From…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
30000 instances - 24 features - 2 classes - 0 missing values
These data concern student achievement in secondary education at two Portuguese schools. The data attributes include student grades and demographic, social, and school-related features, and it was…
0 runs - 0 likes - 1 download - 1 reach - 2 impact
395 instances - 33 features - 0 classes - 0 missing values
e fvr
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2 instances - 11 features - classes - 0 missing values
This dataset summarizes a heterogeneous set of features about articles published by Mashable in a period of two years. The goal is to predict the number of shares in social networks (popularity). *…
0 runs - 0 likes - 4 downloads - 4 reach - 5 impact
39644 instances - 61 features - 0 classes - 0 missing values
efe rgrg
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
sdsw frfr
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1556 instances - 3 features - classes - 0 missing values
swd dced
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
589 instances - 3 features - classes - 0 missing values
frf r
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2 instances - 3 features - classes - 0 missing values
e3r4vr t4r
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2 instances - 5 features - classes - 0 missing values
e eded
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2 instances - 4 features - classes - 0 missing values
Zurich public transport delay data 2016-10-30 03:30:00 CET - 2016-11-27 01:20:00 CET cleaned and prepared at Open Data Day 2017. For this version, the task was downsampled to 0.5 percent. Some…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
27327 instances - 18 features - 0 classes - 657 missing values
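The description does not say how the 0.5 percent downsample was drawn. Assuming a uniform random sample without replacement, it could look like the sketch below; the function name `downsample`, the fixed seed, and the 200,000-row input are hypothetical illustrations, not details of this dataset.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample(rows: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Return a uniform random subset of round(fraction * len(rows)) rows, without replacement."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = round(fraction * len(rows))
    # A dedicated Random instance keeps the draw reproducible for a given seed.
    return random.Random(seed).sample(list(rows), k)


# Hypothetical input: 200,000 row ids downsampled to 0.5 percent.
subset = downsample(range(200_000), 0.005)
print(len(subset))  # 1000
```

A stratified sample (for example, per day or per line) would be an equally plausible reading of "downsampled"; the uniform version is simply the most basic choice.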
This data represents crime reported to the Seattle Police Department (SPD). Each row contains the record of a unique event where at least one criminal offense was reported by a member of the community…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
52358 instances - 8 features - 0 classes - 650 missing values
Airlines Dataset, inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given information about the scheduled departure. For this…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
26969 instances - 8 features - 2 classes - 0 missing values
b gtrg
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4 instances - 7 features - classes - 0 missing values
sqs efrf
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4 instances - 5 features - classes - 0 missing values
f fr
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2 instances - 5 features - classes - 0 missing values
dd efrg
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1556 instances - 5629 features - classes - 0 missing values
The midwest survey dataset contains individual responses from surveys about regional identification conducted for FiveThirtyEight by SurveyMonkey.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2778 instances - 28 features - 10 classes - 1744 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes information on…
0 runs - 1 like - 1 download - 2 reach - 2 impact
163065 instances - 12 features - 0 classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
150 instances - 5 features - classes - 0 missing values
wind: daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland (Haslett and Raftery, 1989). These data were analyzed in detail in the following article:…
0 runs - 0 likes - 6 downloads - 6 reach - 7 impact
6574 instances - 15 features - 0 classes - 0 missing values
This is the Tecator data set: The task is to predict the fat content of a meat sample on the basis of its near infrared absorbance spectrum. 1. Statement of permission from Tecator (the original data…
0 runs - 0 likes - 4 downloads - 4 reach - 7 impact
240 instances - 125 features - 0 classes - 0 missing values
File README. smoothmeth: a collection of the data sets used in the book "Smoothing Methods in Statistics," by Jeffrey S. Simonoff, Springer-Verlag, New York, 1996. Submitted by Jeff Simonoff…
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2178 instances - 4 features - 0 classes - 0 missing values
A family of datasets synthetically generated from a simulation of how bank-customers choose their banks. Tasks are based on predicting the fraction of bank customers who leave the bank because of full…
0 runs - 0 likes - 2 downloads - 2 reach - 7 impact
8192 instances - 33 features - 0 classes - 0 missing values
The data consist of 2001 observations taken from a balloon about 30 kilometres above the surface of the earth. In the section of the flight shown here the balloon increases in height. As radiation…
0 runs - 1 like - 2 downloads - 3 reach - 7 impact
2001 instances - 3 features - 0 classes - 0 missing values
The data consist of annual observations on the level of strike volume (days lost due to industrial disputes per 1000 wage and salary earners) and their covariates in 18 OECD countries from 1951-1985. The…
0 runs - 0 likes - 2 downloads - 2 reach - 7 impact
625 instances - 7 features - 0 classes - 0 missing values
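Strike volume, as defined in the description, is a simple rate: days lost to industrial disputes scaled to 1000 earners. A minimal sketch of that arithmetic follows; the function name and the example figures are hypothetical and are not taken from the dataset.

```python
def strike_volume(days_lost: float, wage_salary_earners: float) -> float:
    """Days lost to industrial disputes per 1000 wage and salary earners."""
    if wage_salary_earners <= 0:
        raise ValueError("wage_salary_earners must be positive")
    # Multiply before dividing so whole-number inputs divide exactly where possible.
    return days_lost * 1000 / wage_salary_earners


# Hypothetical country-year figures, not taken from the dataset.
print(strike_volume(days_lost=50_000, wage_salary_earners=2_000_000))  # 25.0
```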
S&P Letters Data. We collected information on the variables using all the block groups in California from the 1990 Census. In this sample a block group on average includes 1425.5 individuals living in…
0 runs - 0 likes - 6 downloads - 6 reach - 7 impact
20640 instances - 9 features - 0 classes - 0 missing values
This is an artificial data set used in Friedman (1991) and also described in Breiman (1996, p. 139). The cases are generated using the following method: Generate the values of 10 attributes, X1, ...,…
0 runs - 2 likes - 7 downloads - 9 reach - 7 impact
40768 instances - 11 features - 0 classes - 0 missing values
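The generation method is cut off above. Assuming it is the widely used Friedman #1 benchmark (ten attributes drawn independently and uniformly from [0, 1], of which only the first five affect the target, plus Gaussian noise), which would be consistent with the 11 features listed (10 inputs plus the target), one case could be generated as in the sketch below. The formula and the noise level of 1 are assumptions drawn from that standard benchmark and should be checked against the full description; `friedman1_sample` is a hypothetical helper.

```python
import math
import random
from typing import List, Tuple


def friedman1_sample(rng: random.Random, noise_sd: float = 1.0) -> Tuple[List[float], float]:
    """Draw one (X1..X10, Y) case under the assumed Friedman #1 formula."""
    # Ten attributes, each drawn independently and uniformly from [0, 1).
    x = [rng.random() for _ in range(10)]
    # Only X1..X5 enter the target; X6..X10 are irrelevant inputs.
    y = (10 * math.sin(math.pi * x[0] * x[1])
         + 20 * (x[2] - 0.5) ** 2
         + 10 * x[3]
         + 5 * x[4]
         + rng.gauss(0.0, noise_sd))  # additive Gaussian noise (assumed sd = 1)
    return x, y


rng = random.Random(0)
rows = [friedman1_sample(rng) for _ in range(5)]
for x, y in rows:
    print([round(v, 3) for v in x], round(y, 3))
```

Because X6..X10 carry no signal, this family of data is commonly used to check whether a regression method can ignore irrelevant inputs.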
eevrr der
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1557 instances - 5629 features - classes - 0 missing values
ede wey
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
589 instances - 2909 features - classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: We used binary encoding for each feature (o, b, x), so the number of features is 42 × 3 = 126.
0 runs - 0 likes - 3 downloads - 3 reach - 10 impact
67557 instances - 127 features - 0 classes - 0 missing values
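The preprocessing note describes expanding each of 42 three-valued features (o, b, x) into three binary indicators, giving 42 × 3 = 126 columns; the 127 features listed presumably include the target. A minimal sketch of that encoding follows, assuming the indicators appear in the order o, b, x for each feature; the actual column order in the LIBSVM file is not stated here, and `binary_encode` is a hypothetical helper.

```python
from typing import List, Sequence

STATES = ("o", "b", "x")  # assumed indicator order within each feature


def binary_encode(features: Sequence[str]) -> List[int]:
    """Expand 42 three-valued features into 42 * 3 = 126 binary indicators."""
    if len(features) != 42:
        raise ValueError("expected 42 features")
    encoded: List[int] = []
    for value in features:
        if value not in STATES:
            raise ValueError(f"unexpected value: {value!r}")
        # Exactly one of the three indicators is 1 for each feature.
        encoded.extend(1 if value == s else 0 for s in STATES)
    return encoded


row = ["b"] * 42
row[0] = "x"
vec = binary_encode(row)
print(len(vec), sum(vec))  # 126 42
print(vec[:6])             # [0, 0, 1, 0, 1, 0]
```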
This is a test dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
Touch samples 2
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
265 instances - 11 features - 8 classes - 0 missing values
Uniform (work clothing) issue values together with temperatures, hirings, and dismissals
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
6277 instances - 7 features - 0 classes - 0 missing values
rrvrf 4rr
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4 instances - 49 features - classes - 0 missing values
ef f
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4 instances - 49 features - classes - 0 missing values
Touch Signals
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
265 instances - 11 features - classes - 0 missing values
punch sound
0 runs - 0 likes - 1 download - 1 reach - 2 impact
221 instances - 1 feature - classes - 0 missing values
dsd efe
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
601 instances - 7 features - classes - 0 missing values
fr frf
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1556 instances - 5629 features - classes - 0 missing values
sde c
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1556 instances - 5629 features - classes - 0 missing values
de d
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1556 instances - 5628 features - classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 1 like - 1 download - 2 reach - 7 impact
4450 instances - 203 features - 0 classes - 0 missing values
This analysis describes and summarizes the relationships between 1987 salaries of major league baseball players and the player's performance. The salary data were taken from Sports Illustrated, April…
0 runs - 0 likes - 2 downloads - 2 reach - 7 impact
Multi-label dataset. The birds dataset consists of 327 audio recordings of 12 different vocalizing bird species. Each sound can be assigned to various bird species.
0 runs - 0 likes - 6 downloads - 6 reach - 5 impact
645 instances - 279 features - 2 classes - 0 missing values
1. Title: Wine Quality 2. Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009 3. Past Usage: P. Cortez, A. Cerdeira, F.…
0 runs - 1 like - 13 downloads - 14 reach - 7 impact
6497 instances - 12 features - 0 classes - 0 missing values
This is one of a family of datasets synthetically generated from a realistic simulation of the dynamics of a Unimation Puma 560 robot arm. There are eight datasets in this family. In this repository…
0 runs - 0 likes - 6 downloads - 6 reach - 7 impact
8192 instances - 33 features - 0 classes - 0 missing values
efe def
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4 instances - 49 features - classes - 0 missing values
as cscs
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1557 instances - 5629 features - classes - 0 missing values
This database was designed on the basis of data provided by US Census Bureau [http://www.census.gov] (under Lookup Access [http://www.census.gov/cdrom/lookup]: Summary Tape File 1). The data were…
0 runs - 1 like - 6 downloads - 7 reach - 7 impact
22784 instances - 17 features - 0 classes - 0 missing values
In the early 2000s, Billy Beane and Paul DePodesta worked for the Oakland Athletics. While there, they literally changed the game of baseball. They didn't do it using a bat or glove, and they…
0 runs - 0 likes - 7 downloads - 7 reach - 6 impact
1232 instances - 15 features - 0 classes - 3600 missing values
1. Title: Faults in an urban waste water treatment plant 2. Source Information: -- Creators: Manel Poch (igte2@cc.uab.es) Unitat d'Enginyeria Quimica Universitat Autonoma de Barcelona. Bellaterra.…
0 runs - 0 likes - 1 download - 1 reach - 7 impact
sd vfv
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4 instances - 50 features - 2 classes - 0 missing values
Wikidata with top-474 most frequent types and ingoing/outgoing properties as features
0 runs - 0 likes - 15 downloads - 15 reach - 5 impact
19254100 instances - 2331 features - classes - 0 missing values
This dataset consists of beer reviews from Beeradvocate. The data span a period of more than 10 years, including all ~1.5 million reviews up to November 2011. Each review includes ratings in terms of…
0 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1586614 instances - 13 features - 104 classes - 68148 missing values
mydata
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3892 instances - 36 features - classes - 0 missing values
Dataset showing data from matches played by RB Leipzig prior to 14.06.2020
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
102 instances - 1 feature - classes - 0 missing values
Since the first automobile, the Benz Patent Motor Car in 1886, Mercedes-Benz has stood for important automotive innovations. These include, for example, the passenger safety cell with crumple zone,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4209 instances - 377 features - 0 classes - 0 missing values
as dwd
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1557 instances - 5629 features - classes - 0 missing values
ef r
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1557 instances - 5629 features - classes - 0 missing values
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1460 instances - 80 features - 0 classes - 6965 missing values
BitcoinHeist Ransomware Dataset Akcora, C.G., Li, Y., Gel, Y.R. and Kantarcioglu, M., 2019. BitcoinHeist. Topological Data Analysis for Ransomware Detection on the Bitcoin Blockchain. IJCAI-PRICAI…
0 runs - 1 like - 0 downloads - 1 reach - 0 impact
2916697 instances - 10 features - 29 classes - 0 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
163065 instances - 12 features - 0 classes - 0 missing values
r rg
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4 instances - 50 features - classes - 0 missing values
dd ref
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4 instances - 50 features - classes - 0 missing values
Public procurement data for the European Economic Area, Switzerland, and Macedonia, 2015.
0 runs - 0 likes - 1 download - 1 reach - 2 impact
565163 instances - 75 features - 0 classes - 15247061 missing values
When you've been devastated by a serious car accident, your focus is on the things that matter the most: family, friends, and other loved ones. Pushing paper with your insurance agent is the last…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
188318 instances - 131 features - 0 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository.
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
64700 instances - 301 features - 0 classes - 0 missing values
hydraulic
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2205 instances - 22 features - classes - 0 missing values
Primary Biliary Cirrhosis This data set is a follow-up to the original PBC data set, as discussed in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis, Wiley, 1991. An…
0 runs - 0 likes - 5 downloads - 5 reach - 7 impact
1945 instances - 19 features - 0 classes - 1133 missing values
Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…
0 runs - 0 likes - 1 download - 1 reach - 3 impact
593 instances - 78 features - classes - 0 missing values
Water stress dataset for Indian variety of wheat crop: The data consist of a file system-based data of Raj 3765 variety of wheat. There are twenty-four chlorophyll fluorescence images captured every…
0 runs - 0 likes - 1 download - 1 reach - 1 impact
1188 instances - 23 features - 0 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository.
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
49749 instances - 301 features - 0 classes - 0 missing values
palmerpenguins. Description: The goal of palmerpenguins is to provide a great dataset for data exploration &…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
344 instances - 7 features - 3 classes - 18 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs - 2 likes - 14 downloads - 16 reach - 10 impact
101766 instances - 50 features - 3 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs - 0 likes - 2 downloads - 2 reach - 3 impact
14 instances - 5 features - 2 classes - 0 missing values
student performance 1
0 runs - 0 likes - 1 download - 1 reach - 0 impact
3892 instances - 36 features - classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs - 0 likes - 1 download - 1 reach - 10 impact
32561 instances - 124 features - 0 classes - 0 missing values