OpenML
User profile data for San Francisco OkCupid users published in [Kim, A. Y., & Escobedo-Land, A. (2015). OKCupid data for introductory statistics and data science courses. Journal of Statistics…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
50789 instances - 20 features - 3 classes - 154107 missing values
This is the same data as version 5 (OpenML ID = 1220) with '_id' features coded as nominal factor variables.
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
39948 instances - 12 features - 2 classes - 0 missing values
SK daily COVID19
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
280 instances - 7 features - classes - 0 missing values
According to Epsilon research, 80% of customers are more likely to do business with you if you provide personalized service. Banking is no exception. The digitalization of everyday lives means that…
0 runs - 0 likes - 1 downloads - 1 reach - 7 impact
4459 instances - 4992 features - 0 classes - 0 missing values
Since the first automobile, the Benz Patent Motor Car in 1886, Mercedes-Benz has stood for important automotive innovations. These include, for example, the passenger safety cell with crumple zone,…
0 runs - 0 likes - 0 downloads - 0 reach - 8 impact
4209 instances - 377 features - 0 classes - 0 missing values
AutoML challenge 2014. Original task: regression. Test and validation sets can be obtained on the ChaLearn website: https://automl.chalearn.org/data
0 runs - 0 likes - 0 downloads - 0 reach - 4 impact
400000 instances - 101 features - 0 classes - 0 missing values
Abstract: This data-set contains examples of buzz events from two different social networks: Twitter, and Tom's Hardware, a forum network focusing on new technology with more conservative dynamics.…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
583250 instances - 78 features - 0 classes - 0 missing values
When you've been devastated by a serious car accident, your focus is on the things that matter the most: family, friends, and other loved ones. Pushing paper with your insurance agent is the last…
0 runs - 0 likes - 0 downloads - 0 reach - 8 impact
188318 instances - 131 features - 0 classes - 0 missing values
File README ----------- smoothmeth A collection of the data sets used in the book "Smoothing Methods in Statistics," by Jeffrey S. Simonoff, Springer-Verlag, New York, 1996. Submitted by Jeff Simonoff…
0 runs - 0 likes - 0 downloads - 0 reach - 15 impact
2178 instances - 4 features - 0 classes - 0 missing values
This classic dataset contains the prices and other attributes of almost 54,000 diamonds. It's a great dataset for beginners learning to work with data analysis and visualization. Content price price…
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
53940 instances - 10 features - 0 classes - 0 missing values
source: http://plato.asu.edu/ftp/solvable.html authors: Rolf-David Bergdoll PAR10 performances of modern solvers on the solvable instances of MIPLIB2010. http://miplib.zib.de/ The algorithm runtime…
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
1090 instances - 148 features - 0 classes - 0 missing values
source: http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/ authors: L. Xu, F. Hutter, H. Hoos, K. Leyton-Brown translator in coseal format: M. Lindauer with the help of Alexandre Frechette the data do…
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
4440 instances - 117 features - 0 classes - 27150 missing values
Ignores community name. **Author**: Title: Communities and Crime Abstract: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1994 instances - 127 features - 0 classes - 39202 missing values
Modified version for the automl benchmark. Regroups information for about 7800 different US colleges. Including geographical information, stats about the population attending and post graduation…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7063 instances - 45 features - 0 classes - 104249 missing values
Make target (age) numeric. **Author**: 1. Title of Database: Abalone data 2. Sources: (a) Original owners of database: Marine Resources Division Marine Research Laboratories - Taroona Department of…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
4177 instances - 9 features - 0 classes - 0 missing values
String datetime information extracted to numeric columns. Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC)…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
581835 instances - 19 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 1 likes - 0 downloads - 1 reach - 15 impact
8885 instances - 267 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 1 likes - 0 downloads - 1 reach - 15 impact
8885 instances - 252 features - 0 classes - 0 missing values
This is the Tecator data set: The task is to predict the fat content of a meat sample on the basis of its near infrared absorbance spectrum. 1. Statement of permission from Tecator (the original data…
0 runs - 0 likes - 4 downloads - 4 reach - 14 impact
240 instances - 125 features - 0 classes - 0 missing values
Version with url set as row id, creator data missing due to bad formatting. **Author**: Kelwin Fernandes (INESC TEC, Universidade do Porto), Pedro Vinagre (ALGORITMI Research Centre, Universidade do…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
39644 instances - 60 features - 0 classes - 0 missing values
This dataset contains 10962 houses to rent with 13 different features. Some values in the dataset can be considered as outliers for further analyses. Bear in mind that the Web Crawler was used only to…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
10692 instances - 13 features - 0 classes - 0 missing values
This is the full version of the KDD Cup 2009 dataset. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
50000 instances - 15001 features - 2 classes - 14616450 missing values
This is the full version of the KDD Cup 2009 dataset. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
50000 instances - 15001 features - 2 classes - 14616450 missing values
Multi-label dataset for text-classification. It consists of article titles and partial blurbs. Blurbs can be assigned to several categories (e.g. Science, News, Games) based on word predictors.
0 runs - 1 likes - 15 downloads - 16 reach - 16 impact
3782 instances - 1101 features - 2 classes - 0 missing values
"The sulfur recovery unit (SRU) removes environmental pollutants from acid gas streams before they are released into the atmosphere. Furthermore, elemental sulfur is recovered as a valuable…
0 runs - 0 likes - 2 downloads - 2 reach - 12 impact
10081 instances - 7 features - 0 classes - 0 missing values
This is the full version of the KDD Cup 2009 dataset. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
50000 instances - 15001 features - 2 classes - 14616450 missing values
Data
0 runs - 0 likes - 1 downloads - 1 reach - 10 impact
539383 instances - 8 features - 2 classes - 0 missing values
Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018. from…
0 runs - 0 likes - 1 downloads - 1 reach - 7 impact
538638 instances - 7 features - 2 classes - 0 missing values
In the early 2000s, Billy Beane and Paul DePodesta worked for the Oakland Athletics. While there, they literally changed the game of baseball. They didn't do it using a bat or glove, and they…
0 runs - 0 likes - 8 downloads - 8 reach - 13 impact
1232 instances - 15 features - 0 classes - 3600 missing values
Israeli lottery
0 runs - 0 likes - 1 downloads - 1 reach - 8 impact
1153 instances - 11 features - classes - 0 missing values
50 Danish words with their pronunciation from Dansk Ordbog
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
51 instances - 2 features - classes - 2 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 1 downloads - 1 reach - 16 impact
425240 instances - 79 features - 2 classes - 2734000 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 0 downloads - 0 reach - 16 impact
416188 instances - 61 features - 355 classes - 0 missing values
Abstract: CART book's waveform domains Source: Original Owners: Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group:…
0 runs - 2 likes - 6 downloads - 8 reach - 11 impact
5000 instances - 22 features - classes - 0 missing values
This is Titanic survival prediction.
0 runs - 0 likes - 3 downloads - 3 reach - 7 impact
891 instances - 8 features - 0 classes - 0 missing values
Titanic survival prediction
0 runs - 0 likes - 1 downloads - 1 reach - 8 impact
891 instances - 8 features - 0 classes - 0 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the maker and model information of vehicles involved in the respective accident
0 runs - 0 likes - 2 downloads - 2 reach - 0 impact
Outliers data set extracted from the Illustration (Fig. 3) in "Novelty detection with application to data streams"
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
75 instances - 3 features - 4 classes - 0 missing values
Subset of KITS dataset with 100 images and nominal target
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100 instances - 27649 features - 2 classes - 0 missing values
Airlines Dataset. Inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure. For this…
0 runs - 0 likes - 2 downloads - 2 reach - 6 impact
26969 instances - 8 features - 2 classes - 0 missing values
Data from https://doi.org/10.5281/zenodo.269636
0 runs - 0 likes - 5 downloads - 5 reach - 14 impact
4758 instances - 39 features - classes - 0 missing values
testing
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3279 instances - 1559 features - classes - 0 missing values
service data
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
34 instances - 8 features - classes - 0 missing values
tesl dataset about L
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
150000 instances - 8 features - classes - 0 missing values
Multi-label dataset. The scene dataset is an image classification task where labels like Beach, Mountain, Field, Urban are assigned to each image.
0 runs - 0 likes - 13 downloads - 13 reach - 11 impact
2407 instances - 300 features - 2 classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs - 0 likes - 3 downloads - 3 reach - 9 impact
2000 instances - 140 features - classes - 0 missing values
test
0 runs - 0 likes - 1 downloads - 1 reach - 6 impact
1000 instances - 21 features - classes - 0 missing values
test
0 runs - 0 likes - 1 downloads - 1 reach - 6 impact
1000 instances - 21 features - classes - 0 missing values
test
0 runs - 0 likes - 1 downloads - 1 reach - 6 impact
1000 instances - 21 features - classes - 0 missing values
test
0 runs - 0 likes - 1 downloads - 1 reach - 3 impact
1000 instances - 21 features - classes - 0 missing values
Water stress dataset for Indian variety of wheat crop: The data consist of a file system-based data of Raj 3765 variety of wheat. There are twenty-four chlorophyll fluorescence images captured every…
0 runs - 0 likes - 2 downloads - 2 reach - 7 impact
1188 instances - 23 features - 0 classes - 0 missing values
The original Titanic dataset, describing the survival status of individual passengers on the Titanic. The titanic data does not contain information from the crew, but it does contain actual ages of…
0 runs - 2 likes - 32 downloads - 34 reach - 12 impact
1309 instances - 14 features - 2 classes - 3855 missing values
#test data for mlp
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
200 instances - 12 features - classes - 0 missing values
These weekly averages are ultimately based on measurements of 4 air samples per hour taken atop intake lines on several towers during steady periods of CO2 concentration of not less than 6 hours per…
0 runs - 1 likes - 2 downloads - 3 reach - 10 impact
2225 instances - 7 features - 0 classes - 0 missing values
This is Titanic survival prediction.
0 runs - 0 likes - 4 downloads - 4 reach - 7 impact
891 instances - 8 features - 0 classes - 0 missing values
PM 2.5 dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
43800 instances - 10 features - classes - 0 missing values
Titanic survival prediction
0 runs - 0 likes - 2 downloads - 2 reach - 7 impact
891 instances - 8 features - 0 classes - 0 missing values
Titanic survival prediction
0 runs - 0 likes - 3 downloads - 3 reach - 7 impact
891 instances - 8 features - 0 classes - 0 missing values
The proposed forecasting approach is tested by using the database from UCI machine learning repository. Using a Deep Learning Model Based on 1D Convnets and Bidirectional GRU
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
43800 instances - 10 features - classes - 0 missing values
Source: Original Owner: U.S. Census Bureau http://www.census.gov/ United States Department of Commerce Donor: Terran Lane and Ronny Kohavi Data Mining and Visualization Silicon Graphics. terran '@'…
0 runs - 1 likes - 8 downloads - 9 reach - 15 impact
299285 instances - 42 features - classes - 0 missing values
libSVM","AAD group Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Cell Biology, 96:6745-6750, 1999. #Dataset from…
0 runs - 0 likes - 7 downloads - 7 reach - 16 impact
62 instances - 2001 features - 0 classes - 0 missing values
Survey to know if people self-identify as Midwesterners.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2778 instances - 28 features - 10 classes - 1737 missing values
Survey to know if people self-identify as Midwesterners.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2494 instances - 28 features - 9 classes - 99 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the maker and model information of vehicles involved in the respective accident.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
363243 instances - 67 features - 3 classes - 2181757 missing values
Datasets of Data And Story Library, project illustrating use of basic statistical methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs - 0 likes - 2 downloads - 2 reach - 13 impact
150 instances - 5 features - 0 classes - 0 missing values
Product listing data submitted to the U.S. FDA for all unfinished, unapproved drugs.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
120215 instances - 20 features - 7 classes - 443305 missing values
https://www.kaggle.com/dansbecker/nba-shot-logs
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
128069 instances - 21 features - classes - 5567 missing values
Context It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Content The…
0 runs - 1 likes - 2 downloads - 3 reach - 8 impact
284807 instances - 31 features - 2 classes - 0 missing values
Rows with NaN and inf values removed. Converted file format from CSV to ARFF.
0 runs - 0 likes - 1 downloads - 1 reach - 4 impact
18982 instances - 80 features - 5 classes - 0 missing values
Context It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Content The…
0 runs - 1 likes - 7 downloads - 8 reach - 8 impact
284807 instances - 31 features - 0 classes - 0 missing values
test data test
0 runs - 0 likes - 1 downloads - 1 reach - 2 impact
2 instances - 5 features - classes - 0 missing values
# Data Description This is the historical price data of the FOREX EUR/HUF from Dukascopy. One instance (row) is one candlestick of one hour. The whole dataset has the data range from 1-1-2018 to…
0 runs - 1 likes - 1 downloads - 2 reach - 8 impact
43825 instances - 12 features - 2 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs - 0 likes - 2 downloads - 2 reach - 16 impact
32561 instances - 124 features - 0 classes - 0 missing values
Generated data from a C algorithm to break the composition of primes into a unique 4-lined 2D object.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
26 instances - 5 features - classes - 0 missing values
KITS dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 27649 features - 2 classes - 0 missing values
Data Sets for 'Regression Models for Time Series Analysis' by B. Kedem and K. Fokianos, Wiley 2002. Submitted by Kostas Fokianos (fokianos@ucy.ac.cy) [8/Nov/02] (176k) description taken from this web…
0 runs - 0 likes - 1 downloads - 1 reach - 14 impact
508 instances - 11 features - 0 classes - 0 missing values
File README ----------- ### chscase A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
400 instances - 7 features - 0 classes - 0 missing values
Contains 110 data sets from the book 'The Statistical Sleuth' by Fred Ramsey and Dan Schafer; Duxbury Press, 1997. (schafer@stat.orst.edu) [14/Oct/97] (172k) description taken from this web site:…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
34 instances - 9 features - 0 classes - 0 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs - 0 likes - 1 downloads - 1 reach - 13 impact
120 instances - 3 features - 0 classes - 0 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
66 instances - 6 features - 0 classes - 0 missing values
This S dump contains 22 data sets from the book Visualizing Data published by Hobart Press (books@hobart.com). The dump was created by data.dump() and can be read back into S by data.restore(). The…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
73 instances - 6 features - 0 classes - 0 missing values
This S dump contains 22 data sets from the book Visualizing Data published by Hobart Press (books@hobart.com). The dump was created by data.dump() and can be read back into S by data.restore(). The…
0 runs - 0 likes - 2 downloads - 2 reach - 13 impact
88 instances - 3 features - 0 classes - 0 missing values
This software can be freely used for non-commercial purposes and can be freely distributed. Readme file =========== The data sets in this directory are taken from the above book. The data are…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
42 instances - 16 features - 0 classes - 0 missing values
chscase A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S. Simonoff, John Wiley and…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
52 instances - 10 features - 0 classes - 0 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
46 instances - 4 features - 0 classes - 0 missing values
chscase A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S. Simonoff, John Wiley and…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
468 instances - 3 features - 0 classes - 0 missing values
This file is a text file giving details about the time series analysed in 'The Analysis of Time Series' by Chris Chatfield. The 5th edn was published in 1996 and the 6th edn in 2003. The series are…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
235 instances - 13 features - 0 classes - 0 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
70 instances - 4 features - 0 classes - 0 missing values
UCI
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
41188 instances - 21 features - classes - 0 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs - 2 likes - 15 downloads - 17 reach - 16 impact
101766 instances - 50 features - 3 classes - 0 missing values
Datasets of Data And Story Library, project illustrating use of basic statistical methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs - 0 likes - 1 downloads - 1 reach - 13 impact
40 instances - 7 features - 0 classes - 3 missing values
Cicchetti, D. Data from which conclusions were drawn in the article "Sleep in Mammals: Ecological and Constitutional Correlates" by Allison, T. and Cicchetti, D. (1976), _Science_, November 12, vol.…
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
62 instances - 8 features - 0 classes - 12 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
14 instances - 51 features - 0 classes - 0 missing values
Date converted to year/mo/day numerics. This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. It contains 19 house…
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
21613 instances - 22 features - 0 classes - 0 missing values
NASA
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
45918 instances - 22 features - 0 classes - 0 missing values
NASA
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
45918 instances - 22 features - 0 classes - 0 missing values
Airlines Departure Delay Prediction (Regression). Original data can be found at: http://www.transtats.bts.gov This is a processed version of the original data, designed to predict departure delay (in…
0 runs - 0 likes - 1 downloads - 1 reach - 2 impact
1000000 instances - 10 features - 0 classes - 0 missing values
### Internet Usage Data #### Data Type multivariate #### Abstract This data contains general demographic information on internet users in 1997. ### Data Characteristics This data comes from a survey…
0 runs - 1 likes - 6 downloads - 7 reach - 12 impact
10108 instances - 72 features - 46 classes - 2699 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
500 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
100 instances - 26 features - 0 classes - 0 missing values