OpenML
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs0 likes0 downloads0 reach13 impact
60 instances - 11 features - 0 classes - 14 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs0 likes2 downloads2 reach11 impact
100 instances - 10 features - classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs0 likes1 downloads1 reach11 impact
228 instances - 8 features - classes - 20 missing values
Contains 110 data sets from the book 'The Statistical Sleuth' by Fred Ramsey and Dan Schafer; Duxbury Press, 1997. (schafer@stat.orst.edu) [14/Oct/97] (172k) Note: description taken from this web…
0 runs0 likes0 downloads0 reach13 impact
47 instances - 8 features - 0 classes - 0 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs0 likes0 downloads0 reach13 impact
51 instances - 7 features - 0 classes - 0 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs0 likes1 downloads1 reach13 impact
120 instances - 3 features - 0 classes - 0 missing values
chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs0 likes0 downloads0 reach13 impact
400 instances - 7 features - 0 classes - 0 missing values
chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs0 likes0 downloads0 reach13 impact
chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs0 likes0 downloads0 reach13 impact
400 instances - 8 features - 0 classes - 0 missing values
chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs0 likes0 downloads0 reach13 impact
400 instances - 8 features - 0 classes - 0 missing values
## Guess which points belong to signal track [COMET](http://comet.kek.jp/Introduction.html) is an experiment being constructed at the J-PARC proton beam laboratory in Japan. It will search for…
0 runs0 likes1 downloads1 reach11 impact
7619400 instances - 6 features - 0 classes - 0 missing values
We aggregated screen movements into screen-fixations using a Salvucci & Goldberg (2000) dispersion-threshold algorithm, and defined Perception Action Cycles (PACs) as fixations with at least one…
0 runs0 likes0 downloads0 reach0 impact
3395 instances - 20 features - classes - 168 missing values
arbres-urbains
0 runs0 likes0 downloads0 reach0 impact
699 instances - 57 features - 5 classes - 7889 missing values
arbres-urbains
0 runs0 likes0 downloads0 reach0 impact
699 instances - 57 features - 5 classes - 7889 missing values
The goal of the research is to help auditors by building a classification model that can predict fraudulent firms on the basis of present and historical risk factors. The information about the…
0 runs0 likes0 downloads0 reach0 impact
1552 instances - 37 features - 0 classes - 19402 missing values
The dataset contains 5 types of dataset: QCM3, QCM6, QCM7, QCM10, and QCM12. In each dataset, there is an alcohol classification of five types: 1-octanol, 1-propanol, 2-butanol, 2-propanol, and 1-isobutanol. In…
0 runs0 likes1 downloads1 reach0 impact
125 instances - 15 features - classes - 0 missing values
bases-de-donnees-annuelles-des-accidents-corporels-de-la-circulation-routiere-annees-de-2005-a-2019
0 runs0 likes0 downloads0 reach0 impact
132977 instances - 55 features - 0 classes - 550521 missing values
Sanitized and anonymized Cargo 2000 (C2K) airfreight tracking and tracing events, covering five months of business execution (3,942 process instances, 7,932 transport legs, 56,082 activities). ###…
0 runs0 likes0 downloads0 reach0 impact
3943 instances - 98 features - classes - 210284 missing values
A Vicon motion capture camera system was used to record 12 users performing 5 hand postures with markers attached to a left-handed glove. A rigid pattern of markers on the back of the glove was used…
0 runs0 likes0 downloads0 reach0 impact
78096 instances - 38 features - classes - 974700 missing values
The dataset was collected at 'Hospital Universitario de Caracas' in Caracas, Venezuela. The dataset comprises demographic information, habits, and historic medical records of 858 patients. Several…
0 runs0 likes0 downloads0 reach0 impact
858 instances - 36 features - classes - 3622 missing values
The dataset contains 19 attributes regarding ca cervix behavior risk. The class label is ca_cervix, with values 1 and 0 denoting respondents with and without ca cervix, respectively. ###…
0 runs0 likes0 downloads0 reach0 impact
858 instances - 36 features - classes - 3622 missing values
This data set measures the running time of a matrix-matrix product A x B = C, where all matrices have size 2048 x 2048, using a parameterizable SGEMM GPU kernel with 241600 possible parameter…
0 runs0 likes0 downloads0 reach0 impact
241600 instances - 18 features - classes - 0 missing values
This dataset can be used to predict chronic kidney disease; it was collected from a hospital over a period of nearly 2 months. ### Attribute information We use 24 + class = 25 (11 numeric, 14…
0 runs0 likes0 downloads0 reach0 impact
400 instances - 26 features - classes - 1009 missing values
Energy dispersive X-ray fluorescence (EDXRF) was used to determine the chemical composition of celadon body and glaze from the Longquan kiln (at Dayao County) and the Jingdezhen kiln. Forty typical shards…
0 runs0 likes0 downloads0 reach0 impact
88 instances - 19 features - classes - 0 missing values
This dataset includes data for the estimation of obesity levels in individuals from the countries of Mexico, Peru and Colombia, based on their eating habits and physical condition. The data contains 17…
0 runs0 likes0 downloads0 reach0 impact
2111 instances - 17 features - classes - 0 missing values
The data was collected from car parks in Birmingham that are operated by NCP from Birmingham City Council. It contains the occupancy rates (8:00 to 16:30) from 2016/10/04 to 2016/12/19. ### Attribute…
0 runs0 likes0 downloads0 reach0 impact
35717 instances - 4 features - classes - 0 missing values
In our research, each record (row) is data for a week. Each record also has the percentage of return that the stock has in the following week (percent_change_next_weeks_price). Ideally, you want to…
0 runs0 likes0 downloads0 reach0 impact
750 instances - 16 features - classes - 60 missing values
This data set was collected from the internet traffic records on a university's firewall. There are 12 features in total. The Action feature is used as the class. There are 4 classes in total. These are…
0 runs0 likes0 downloads0 reach0 impact
65532 instances - 12 features - classes - 0 missing values
Author: Francesca Grisoni, Claudia S. Neuhaus, Miyabi Hishinuma, Gisela Gabernet, Jan A. Hiss, Masaaki Kotera, Gisbert Schneider Source:…
0 runs0 likes1 downloads1 reach0 impact
949 instances - 3 features - classes - 0 missing values
This is part of a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.…
0 runs0 likes0 downloads0 reach0 impact
125 instances - 42 features - classes - 362 missing values
This is part of a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.…
0 runs0 likes0 downloads0 reach0 impact
114 instances - 42 features - classes - 562 missing values
This is part of a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.…
0 runs0 likes0 downloads0 reach0 impact
122 instances - 42 features - classes - 906 missing values
This is part of a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.…
0 runs0 likes0 downloads0 reach0 impact
76 instances - 42 features - classes - 574 missing values
This is part of a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.…
0 runs0 likes0 downloads0 reach0 impact
127 instances - 42 features - classes - 722 missing values
This is part of a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.…
0 runs0 likes0 downloads0 reach0 impact
126 instances - 42 features - classes - 978 missing values
The Garment Industry is one of the key examples of the industrial globalization of this modern era. It is a highly labour-intensive industry with lots of manual processes. Satisfying the huge global…
0 runs0 likes0 downloads0 reach0 impact
1197 instances - 15 features - classes - 506 missing values
The classification task of this database is to determine where patients in a postoperative recovery area should be sent to next. Because hypothermia is a significant concern after surgery (Woolery, L.…
0 runs0 likes0 downloads0 reach0 impact
65532 instances - 12 features - classes - 0 missing values
This dataset attributes first names to genders, giving counts and probabilities. It combines open-source government data from the US, UK, Canada, and Australia. This dataset combines raw counts for…
0 runs0 likes0 downloads0 reach0 impact
147269 instances - 4 features - classes - 0 missing values
This is part of a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.…
0 runs0 likes0 downloads0 reach0 impact
126 instances - 42 features - classes - 446 missing values
This is part of a collection of 8 files containing the match statistics for both women and men at the four major tennis tournaments of the year 2013. Each file has 42 columns and a minimum of 76 rows.…
0 runs0 likes0 downloads0 reach0 impact
127 instances - 42 features - classes - 788 missing values
This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. It contains 19 house features plus the price and the id columns,…
0 runs0 likes4 downloads4 reach9 impact
21613 instances - 20 features - 0 classes - 0 missing values
![palmerpenguins](https://github.com/allisonhorst/palmerpenguins/raw/master/man/figures/logo.png) ## Description The goal of palmerpenguins is to provide a great dataset for data exploration &…
0 runs0 likes1 downloads1 reach8 impact
344 instances - 7 features - 3 classes - 18 missing values
Date converted to year/mo/day numerics. This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. It contains 19 house…
0 runs0 likes3 downloads3 reach1 impact
21613 instances - 22 features - 0 classes - 0 missing values
Product listing data submitted to the U.S. FDA for all unfinished, unapproved drugs.
0 runs0 likes1 downloads1 reach0 impact
120215 instances - 20 features - 7 classes - 443305 missing values
Newsweeder: Learning to filter netnews. In Proceedings of the Twelfth International Conference on Machine Learning, pages 331-339, 1995. #Dataset from the LIBSVM data repository. Preprocessing: First…
0 runs1 likes6 downloads7 reach16 impact
19928 instances - 62062 features - 0 classes - 0 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the maker and model information of vehicles involved in the respective accident.…
0 runs0 likes1 downloads1 reach0 impact
363243 instances - 67 features - 3 classes - 2181757 missing values
artificial no anomaly
0 runs0 likes0 downloads0 reach0 impact
4032 instances - 2 features - classes - 0 missing values
leak detection file
0 runs0 likes0 downloads0 reach0 impact
23 instances - 4 features - classes - 0 missing values
artificial with anomaly
0 runs0 likes0 downloads0 reach0 impact
4032 instances - 3 features - 0 classes - 0 missing values
artificial with anomaly
0 runs0 likes0 downloads0 reach0 impact
4032 instances - 3 features - classes - 0 missing values
artificial with anomaly
0 runs0 likes0 downloads0 reach0 impact
4032 instances - 2 features - classes - 0 missing values
Context It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Content The…
0 runs1 likes8 downloads9 reach8 impact
284807 instances - 31 features - 0 classes - 0 missing values
artificial no anomaly
0 runs0 likes0 downloads0 reach0 impact
4032 instances - 2 features - 0 classes - 0 missing values
artificial with anomaly
0 runs0 likes0 downloads0 reach0 impact
4032 instances - 3 features - 2 classes - 0 missing values
This version has feature names based on https://www2.1010data.com/documentationcenter/beta/Tutorials/MachineLearningExamples/CensusIncomeDataSet.html Missing data is also properly encoded in this…
0 runs0 likes1 downloads1 reach0 impact
199523 instances - 42 features - 2 classes - 415717 missing values
Bike sharing systems are a new generation of traditional bike rentals in which the whole process of membership, rental, and return has become automatic. Through these systems, a user is able to easily rent…
0 runs0 likes2 downloads2 reach3 impact
17379 instances - 13 features - 0 classes - 0 missing values
Bike sharing systems are a new generation of traditional bike rentals in which the whole process of membership, rental, and return has become automatic. Through these systems, a user is able to easily rent…
0 runs0 likes1 downloads1 reach3 impact
17379 instances - 13 features - 0 classes - 0 missing values
Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM).
0 runs0 likes0 downloads0 reach0 impact
1643 instances - 3 features - classes - 0 missing values
Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM).
0 runs0 likes0 downloads0 reach0 impact
1624 instances - 3 features - classes - 0 missing values
Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM).
0 runs0 likes0 downloads0 reach0 impact
1538 instances - 3 features - classes - 0 missing values
Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM).
0 runs0 likes0 downloads0 reach0 impact
1643 instances - 2 features - classes - 0 missing values
https://archive.ics.uci.edu/ml/datasets/Diabetes
0 runs0 likes1 downloads1 reach0 impact
768 instances - 9 features - classes - 0 missing values
Wikidata with top-474 most frequent types and ingoing/outgoing properties as features
0 runs0 likes15 downloads15 reach11 impact
19254100 instances - 2331 features - classes - 0 missing values
Data Set Information: The data has been produced using Monte Carlo simulations. The first 21 features (columns 2-22) are kinematic properties measured by the particle detectors in the accelerator. The…
0 runs1 likes5 downloads6 reach16 impact
98050 instances - 29 features - 0 classes - 9 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs2 likes16 downloads18 reach16 impact
101766 instances - 50 features - 3 classes - 0 missing values
DBpedia with top-474 most frequent YAGO types HMC dataset for type prediction. Ingoing and outgoing properties as features
0 runs0 likes3 downloads3 reach11 impact
2886305 instances - 2401 features - classes - 0 missing values
This dataset summarizes a heterogeneous set of features about articles published by Mashable in a period of two years. The goal is to predict the number of shares in social networks (popularity). *…
0 runs0 likes5 downloads5 reach12 impact
39644 instances - 61 features - 0 classes - 0 missing values
Predicting the Geographical Origin of Music, ICDM, 2014 Abstract: Instances in this dataset contain audio features extracted from 1059 wave files. The task associated with the data is to predict the…
0 runs0 likes4 downloads4 reach14 impact
1059 instances - 118 features - 0 classes - 0 missing values
Test dataset
0 runs0 likes2 downloads2 reach13 impact
15547 instances - 61 features - 0 classes - 280 missing values
This is an experimental data set for trying to classify numbers in a lottery as "Highly likely to be picked" or "Not very likely to be picked". It is based on a little more than a…
0 runs0 likes0 downloads0 reach0 impact
12528 instances - 36 features - classes - 0 missing values
ARFF Training Data
0 runs0 likes0 downloads0 reach0 impact
177640 instances - 40 features - classes - 0 missing values
This is the full version of the KDD Cup 2009 dataset Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large…
0 runs0 likes0 downloads0 reach0 impact
50000 instances - 14892 features - 2 classes - 19658569 missing values
source: http://plato.asu.edu/ftp/solvable.html authors: Rolf-David Bergdoll PAR10 performances of modern solvers on the solvable instances of MIPLIB2010. http://miplib.zib.de/ The algorithm runtime…
0 runs0 likes0 downloads0 reach0 impact
1090 instances - 145 features - 0 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach0 impact
1090 instances - 147 features - 0 classes - 0 missing values
Source: Creators : François Kawala (1,2) Ahlame Douzal (1) Eric Gaussier (1) Eustache Diemert (2) Institutions : (1) Université Joseph Fourier (Grenoble I) Laboratoire d'informatique de…
0 runs0 likes1 downloads1 reach11 impact
28179 instances - 97 features - classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes1 downloads1 reach13 impact
80 instances - 113 features - 0 classes - 0 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the maker and model information of vehicles involved in the respective accident
0 runs0 likes0 downloads0 reach0 impact
363206 instances - 66 features - 0 classes - 876555 missing values
Diabetes dataset: Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs0 likes1 downloads1 reach13 impact
442 instances - 11 features - 0 classes - 0 missing values
this is titanic survival prediction
0 runs0 likes5 downloads5 reach7 impact
891 instances - 8 features - 0 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough for each dataset…
0 runs0 likes1 downloads1 reach9 impact
156 instances - 81 features - 2 classes - 0 missing values
This collection includes data sets of one-dimensional ultrasound raw RF data (A-Scans) acquired from the biceps brachii muscles of a single healthy volunteer. The annotation was performed by labeling…
0 runs0 likes0 downloads0 reach0 impact
318 instances - 8 features - classes - 0 missing values
This collection includes data sets of one-dimensional ultrasound raw RF data (A-Scans) acquired from the biceps brachii muscles of 21 healthy volunteers. The annotation was performed by labeling the…
0 runs0 likes0 downloads0 reach0 impact
347 instances - 8 features - classes - 0 missing values
artificial with anomaly
0 runs0 likes0 downloads0 reach0 impact
4032 instances - 3 features - classes - 0 missing values
Source: The dataset was created by Athanasios Tsanas (tsanasthanasis '@' gmail.com) and Max Little (littlem '@' physics.ox.ac.uk) of the University of Oxford, in collaboration with 10 medical centers…
0 runs1 likes2 downloads3 reach11 impact
5875 instances - 22 features - classes - 0 missing values
Abstract: This data-set contains examples of buzz events from two different social networks: Twitter, and Tom's Hardware, a forum network focusing on new technology with more conservative dynamics.…
0 runs0 likes0 downloads0 reach13 impact
583250 instances - 78 features - 0 classes - 0 missing values
This dataset combines records from the MLCQ dataset with metrics extracted using the PMD Tool and the Understand tool, to determine whether a file contains code smells. Please note that the records…
0 runs0 likes0 downloads0 reach0 impact
86467 instances - 67 features - 0 classes - 2852906 missing values
This dataset combines records from the MLCQ dataset with metrics extracted using the PMD Tool and the Understand tool, to determine whether a file contains code smells. Please note that the records…
0 runs0 likes0 downloads0 reach0 impact
83943 instances - 67 features - 0 classes - 2801627 missing values
The dataset consists of 384 features extracted from CT images. The class variable is numeric and denotes the relative location of the CT slice on the axial axis of the human body. The data was…
0 runs0 likes1 downloads1 reach0 impact
53500 instances - 386 features - classes - 0 missing values
Tumor-size treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using…
0 runs0 likes3 downloads3 reach12 impact
286 instances - 10 features - 0 classes - 9 missing values
Michel Lang fRMA-normalized. Only "Kratz-genes"*. \* (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…
0 runs0 likes9 downloads9 reach13 impact
226 instances - 24 features - 2 classes - 0 missing values
Source: Original Owner: U.S. Census Bureau http://www.census.gov/ United States Department of Commerce Donor: Terran Lane and Ronny Kohavi Data Mining and Visualization Silicon Graphics. terran '@'…
0 runs1 likes9 downloads10 reach15 impact
299285 instances - 42 features - classes - 0 missing values
Abstract: CART book's waveform domains Source: Original Owners: Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group:…
0 runs2 likes6 downloads8 reach11 impact
5000 instances - 22 features - classes - 0 missing values
Modified version for the automl benchmark. It groups information about roughly 7800 different US colleges, including geographical information, stats about the population attending and post graduation…
0 runs0 likes0 downloads0 reach0 impact
7063 instances - 45 features - 0 classes - 104249 missing values
kaggle 30day ml
0 runs0 likes0 downloads0 reach0 impact
300000 instances - 25 features - 0 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes1 downloads1 reach17 impact
425240 instances - 79 features - 2 classes - 2734000 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes1 downloads1 reach17 impact
416188 instances - 61 features - 355 classes - 0 missing values
asdfasd
0 runs0 likes0 downloads0 reach0 impact
761 instances - 42 features - classes - 630 missing values
artificial with anomaly
0 runs0 likes0 downloads0 reach0 impact
4032 instances - 3 features - classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 0.1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
0 runs0 likes1 downloads1 reach0 impact
39948 instances - 10 features - 2 classes - 0 missing values
General Description 2015-current: greater than $200.00. The Commission categorizes contributions from individuals using the calendar year-to-date amount for political action committee (PAC) and party…
0 runs1 likes2 downloads3 reach8 impact
3348209 instances - 21 features - 0 classes - 10786577 missing values