OpenML
Filter results by:
wind daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland (Haslett and raftery 1989). These data were analyzed in detail in the following article:…
0 runs0 likes6 downloads6 reach13 impact
6574 instances - 15 features - 0 classes - 0 missing values
No data.
27 runs1 likes3 downloads4 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
29 runs0 likes4 downloads4 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs0 likes2 downloads2 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs1 likes4 downloads5 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
32 runs0 likes1 downloads1 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
28 runs0 likes3 downloads3 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
41 runs0 likes2 downloads2 reach15 impact
1340 instances - 17 features - 3 classes - 20 missing values
Database of baseball players and play statistics, including 'Games_played', 'At_bats', 'Runs', 'Hits', 'Doubles', 'Triples', 'Home_runs', 'RBIs', 'Walks', 'Strikeouts', 'Batting_average',…
795 runs0 likes10 downloads10 reach10 impact
1340 instances - 17 features - 3 classes - 20 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
100 runs0 likes3 downloads3 reach14 impact
31 instances - 16 features - 2 classes - 150 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
790 runs0 likes10 downloads10 reach15 impact
159 instances - 16 features - 2 classes - 0 missing values
No data.
27 runs0 likes5 downloads5 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
0 runs0 likes1 downloads1 reach9 impact
1000000 instances - 16 features - 0 classes - 0 missing values
No data.
33 runs0 likes4 downloads4 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
29 runs0 likes6 downloads6 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, (c) its normalized losses in use as…
3252 runs2 likes26 downloads28 reach10 impact
205 instances - 26 features - 6 classes - 59 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
114 runs0 likes5 downloads5 reach14 impact
42 instances - 16 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
754 runs0 likes10 downloads10 reach14 impact
60 instances - 16 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
727 runs0 likes5 downloads5 reach15 impact
205 instances - 26 features - 2 classes - 59 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
131 runs0 likes6 downloads6 reach15 impact
1340 instances - 17 features - 2 classes - 20 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
754 runs0 likes10 downloads10 reach16 impact
8844 instances - 57 features - 2 classes - 34843 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: scaled to [-1,1]
0 runs0 likes6 downloads6 reach17 impact
270 instances - 14 features - 0 classes - 0 missing values
* Title of Database: Spoken Arabic Digit * Abstract: This dataset contains time series of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 males…
1 runs0 likes8 downloads8 reach15 impact
263256 instances - 15 features - 10 classes - 0 missing values
All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement…
165839 runs3 likes94 downloads97 reach29 impact
14980 instances - 15 features - 2 classes - 0 missing values
No data.
51 runs0 likes2 downloads2 reach9 impact
1000000 instances - 15 features - 2 classes - 0 missing values
Re-upload of the dataset as it is present in the Penn ML Benchmark (https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/fars). It's a dataset on traffic accidents,…
1 runs0 likes3 downloads3 reach22 impact
100968 instances - 30 features - 8 classes - 0 missing values
Data on the homicide rate in Detroit for the years 1961-1973. This is the data set called DETROIT in the book 'Subset selection in regression' by Alan J. Miller published in the Chapman & Hall series…
0 runs0 likes0 downloads0 reach13 impact
13 instances - 14 features - 0 classes - 0 missing values
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service…
216 runs0 likes11 downloads11 reach11 impact
110393 instances - 55 features - 7 classes - 0 missing values
Data from StatLib (ftp stat.cmu.edu/datasets) This is the data set called `DETROIT' in the book `Subset selection in regression' by Alan J. Miller published in the Chapman & Hall series of monographs…
2 runs0 likes0 downloads0 reach10 impact
13 instances - 14 features - 0 classes - 0 missing values
------------------------------------------------------------------------ Primary Biliary Cirrhosis The data set found in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis,…
18 runs1 likes3 downloads4 reach14 impact
418 instances - 20 features - 0 classes - 1033 missing values
Datasets of Data And Story Library, project illustrating use of basic statistic methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs0 likes1 downloads1 reach13 impact
47 instances - 14 features - 0 classes - 0 missing values
One of the biggest challenges of an auto dealership purchasing a used car at an auto auction is the risk of that the vehicle might have serious issues that prevent it from being sold to customers. The…
3 runs0 likes3 downloads3 reach13 impact
72983 instances - 33 features - 2 classes - 149271 missing values
No data.
288 runs0 likes2 downloads2 reach10 impact
1000000 instances - 15 features - 9 classes - 0 missing values
This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers. The data was collected for examining our newly developed classifier for multidimensional curves…
23157 runs0 likes12 downloads12 reach54 impact
9961 instances - 15 features - 9 classes - 0 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
27229 runs0 likes11 downloads11 reach10 impact
736 instances - 20 features - 5 classes - 448 missing values
County data from the 2000 Presidential Election in Florida. Compiled by Brett Presnell Department of Statistics, University of Florida These data are derived from three sources, described below. As…
32 runs0 likes4 downloads4 reach14 impact
67 instances - 16 features - 5 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
769 runs0 likes12 downloads12 reach15 impact
252 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
102 runs0 likes4 downloads4 reach14 impact
67 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
718 runs0 likes6 downloads6 reach15 impact
159 instances - 16 features - 2 classes - 0 missing values
No data.
117 runs0 likes4 downloads4 reach9 impact
1000000 instances - 20 features - 5 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
760 runs0 likes13 downloads13 reach15 impact
6574 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
700 runs0 likes5 downloads5 reach14 impact
67 instances - 16 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
857 runs0 likes13 downloads13 reach17 impact
9961 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
701 runs0 likes3 downloads3 reach15 impact
736 instances - 20 features - 2 classes - 448 missing values
* Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779 * Data Set Information: This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In…
159 runs1 likes4 downloads5 reach13 impact
200 instances - 14 features - 5 classes - 0 missing values
* Dataset: This is a reprocessed version of heart-h (hungarian), the heart disease reprocessed hungarian dataset from UCI.
138 runs0 likes7 downloads7 reach13 impact
294 instances - 14 features - 5 classes - 0 missing values
No data.
253 runs0 likes9 downloads9 reach9 impact
1076790 instances - 30 features - 2 classes - 7275 missing values
No data.
0 runs0 likes0 downloads0 reach9 impact
1000000 instances - 13 features - 0 classes - 0 missing values
Abstract: This dataset contains timeseries of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 male and 44 female native Arabic speakers.…
0 runs0 likes3 downloads3 reach11 impact
178526 instances - 13 features - classes - 57200 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs2 likes14 downloads16 reach16 impact
101766 instances - 50 features - 3 classes - 0 missing values
Source: Original Owner: U.S. Census Bureau http://www.census.gov/ United States Department of Commerce Donor: Terran Lane and Ronny Kohavi Data Mining and Visualization Silicon Graphics. terran '@'…
0 runs1 likes6 downloads7 reach15 impact
299285 instances - 42 features - classes - 0 missing values
Zurich public transport delay data 2016-10-30 03:30:00 CET - 2016-11-27 01:20:00 CET cleaned and prepared at Open Data Day 2017. For this version, the task was downsampled to 0.5 percent. Some…
0 runs0 likes0 downloads0 reach6 impact
27327 instances - 18 features - 0 classes - 657 missing values
hmeq_p,BAD,binary
0 runs0 likes0 downloads0 reach8 impact
5960 instances - 15 features - classes - 5271 missing values
URL dataset 2
0 runs0 likes0 downloads0 reach2 impact
95911 instances - 13 features - 0 classes - 0 missing values
test
0 runs0 likes0 downloads0 reach3 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs0 likes0 downloads0 reach3 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs0 likes0 downloads0 reach3 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs0 likes0 downloads0 reach3 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs0 likes0 downloads0 reach3 impact
270 instances - 14 features - classes - 0 missing values
Primary Biliary Cirrhosis This data set is a follow-up to the original PBC data set, as discussed in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis, Wiley, 1991. An…
0 runs0 likes5 downloads5 reach13 impact
1945 instances - 19 features - 0 classes - 1133 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
5 runs1 likes2 downloads3 reach9 impact
8192 instances - 13 features - 0 classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
12 runs0 likes4 downloads4 reach14 impact
8192 instances - 13 features - 0 classes - 0 missing values
Conventional and Social Media Movies (CSM) - Dataset 2014 and 2015 Data Set 12 features categorized as conventional and social media features. Both conventional features, collected from movies…
0 runs0 likes0 downloads0 reach7 impact
232 instances - 14 features - classes - 60 missing values
Conventional and Social Media Movies (CSM) - Dataset 2014 and 2015 Data Set 12 features categorized as conventional and social media features. Both conventional features, collected from movies…
0 runs0 likes0 downloads0 reach8 impact
231 instances - 13 features - classes - 46 missing values
No data.
326 runs0 likes4 downloads4 reach11 impact
1000000 instances - 14 features - 2 classes - 0 missing values
This file is a text file giving details about the time series analysed in 'The Analysis of Time Series' by Chris Chatfield. The 5th edn was published in 1996 and the 6th edn in 2003. The series are…
0 runs0 likes0 downloads0 reach13 impact
235 instances - 13 features - 0 classes - 0 missing values
No data.
312 runs1 likes5 downloads6 reach12 impact
1000000 instances - 14 features - 3 classes - 0 missing values
This database contains 13 attributes (which have been extracted from a larger set of 75) Attribute Information: ------------------------ -- 1. age -- 2. sex -- 3. chest pain type (4 values) -- 4.…
3214 runs0 likes19 downloads19 reach11 impact
270 instances - 14 features - 2 classes - 0 missing values
1. Title of Database: Wine recognition data Updated Sept 21, 1998 by C.Blake : Added attribute information 2. Sources: (a) Forina, M. et al, PARVUS - An Extendible Package for Data Exploration,…
1187 runs1 likes20 downloads21 reach11 impact
178 instances - 14 features - 3 classes - 0 missing values
The AAUP dataset for the ASA Statistical Graphics Section's 1995 Data Analysis Exposition contains information on faculty salaries for 1161 American colleges and universities. The data may be obtained…
32 runs0 likes3 downloads3 reach14 impact
1161 instances - 15 features - 4 classes - 256 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
101 runs0 likes5 downloads5 reach15 impact
1161 instances - 16 features - 2 classes - 256 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
780 runs0 likes9 downloads9 reach15 impact
178 instances - 14 features - 2 classes - 0 missing values
calendarDOW-pmlb
31 runs0 likes1 downloads1 reach20 impact
399 instances - 33 features - 5 classes - 0 missing values
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch,…
6 runs0 likes5 downloads5 reach18 impact
506 instances - 14 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs0 likes0 downloads0 reach14 impact
66 instances - 12 features - 0 classes - 0 missing values
Training dataset of the 'Porto Seguros Safe Driver Prediction' Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
2 runs0 likes0 downloads0 reach12 impact
595212 instances - 38 features - 2 classes - 846458 missing values
This dataset reflects incidents of crime in the City of Los Angeles dating back to 2010. This data is transcribed from original crime reports that are typed on paper and therefore there may be some…
0 runs0 likes0 downloads0 reach8 impact
1468825 instances - 26 features - 0 classes - 7881776 missing values
Test file for ML training
0 runs0 likes0 downloads0 reach9 impact
1599 instances - 12 features - classes - 0 missing values
1. Title: Wine Quality 2. Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009 3. Past Usage: P. Cortez, A. Cerdeira, F.…
0 runs1 likes13 downloads14 reach14 impact
6497 instances - 12 features - 0 classes - 0 missing values
* Title: Planning Relax Data Set * Abstract: The dataset concerns with the classification of two mental stages from recorded EEG signals: Planning (during imagination of motor act) and Relax state. *…
141 runs0 likes8 downloads8 reach14 impact
182 instances - 13 features - 2 classes - 0 missing values
* Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779 * Data Set Information: This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In…
170 runs0 likes9 downloads9 reach13 impact
123 instances - 13 features - 5 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach9 impact
531441 instances - 12 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs0 likes0 downloads0 reach11 impact
163 instances - 27 features - 5 classes - 9 missing values
Schizophrenic Eye-Tracking Data in Rubin and Wu (1997) Biometrics. Yingnian Wu (wu@hustat.harvard.edu) [14/Oct/97] Information about the dataset CLASSTYPE: nominal CLASSINDEX: last
748 runs0 likes7 downloads7 reach23 impact
340 instances - 15 features - 2 classes - 834 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
753 runs0 likes11 downloads11 reach15 impact
8192 instances - 13 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
810 runs0 likes8 downloads8 reach15 impact
235 instances - 13 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
680 runs0 likes5 downloads5 reach15 impact
1945 instances - 19 features - 2 classes - 1133 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
734 runs0 likes10 downloads10 reach15 impact
506 instances - 14 features - 2 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
250 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
100 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach13 impact
1000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach13 impact
1000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach13 impact
1000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
1 runs0 likes2 downloads2 reach13 impact
1000 instances - 11 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository.
1 runs0 likes2 downloads2 reach16 impact
1025010 instances - 11 features - 0 classes - 0 missing values
Datasets of Data And Story Library, project illustrating use of basic statistic methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
2 runs0 likes2 downloads2 reach15 impact
53 instances - 12 features - 0 classes - 0 missing values
Author: Gregory Gay, Tim Menzies, Misty Davies, Karen Gundy-Burlet Source: [Zenodo](https://zenodo.org/record/322475) Please cite: Misty Davies. (2009). bike [Data set]. Zenodo. DOI:…
0 runs0 likes4 downloads4 reach10 impact
4435 instances - 11 features - classes - 0 missing values