OpenML
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Electrical Discharge Machining dataset (Karalic and Bratko 1997) represents a two-target…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
154 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Jura (Goovaerts 1997) dataset consists of measurements of concentrations of seven heavy…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
359 instances - 18 features - classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
752 runs - 0 likes - 7 downloads - 7 reach - 7 impact
339 instances - 18 features - 2 classes - 225 missing values
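A minimal sketch of the majority-class binarization described in the entry above, written in Python for illustration. The function name `binarize_majority` is hypothetical, and because the description is truncated, labeling every non-majority class as 'N' is an assumption rather than something the listing states.

```python
from collections import Counter


def binarize_majority(labels):
    """Relabel the most frequent class as 'P' and all other classes as 'N'.

    Assumption: 'N' is used for the non-majority classes; the source
    description is cut off before it names the negative label.
    """
    # most_common(1) returns a one-element list of (label, count) pairs.
    majority_label, _ = Counter(labels).most_common(1)[0]
    return ["P" if label == majority_label else "N" for label in labels]


if __name__ == "__main__":
    print(binarize_majority(["a", "b", "a", "c", "a"]))
```

If two classes tie for the highest count, this sketch picks the one that appears first in the input; the listing does not say how ties are handled.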
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
41 runs - 0 likes - 2 downloads - 2 reach - 7 impact
1340 instances - 18 features - 3 classes - 20 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
176 runs - 0 likes - 6 downloads - 6 reach - 6 impact
101 instances - 18 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
131 runs - 0 likes - 6 downloads - 6 reach - 7 impact
1340 instances - 18 features - 2 classes - 20 missing values
Database of baseball players and play statistics, including 'Games_played', 'At_bats', 'Runs', 'Hits', 'Doubles', 'Triples', 'Home_runs', 'RBIs', 'Walks', 'Strikeouts', 'Batting_average',…
795 runs - 0 likes - 10 downloads - 10 reach - 2 impact
1340 instances - 18 features - 3 classes - 20 missing values
Citation Request: This primary tumor domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
1261 runs - 0 likes - 14 downloads - 14 reach - 2 impact
339 instances - 18 features - 21 classes - 225 missing values
Graeme D. Hutcheson and Nick Sofroniou 1999 The Multivariate Social Scientist: Introductory Statistics Using Generalized Linear Models. SAGE Publications. Copyright: Graeme D. Hutcheson & Nick…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
42 instances - 17 features - 0 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1000000 instances - 17 features - classes - 0 missing values
This database was designed on the basis of data provided by US Census Bureau [http://www.census.gov] (under Lookup Access [http://www.census.gov/cdrom/lookup]: Summary Tape File 1). The data were…
0 runs - 1 likes - 6 downloads - 7 reach - 5 impact
22784 instances - 17 features - 0 classes - 0 missing values
Datasets of Data And Story Library, project illustrating use of basic statistical methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
2 runs - 0 likes - 0 downloads - 0 reach - 5 impact
59 instances - 17 features - 0 classes - 0 missing values
No data.
28 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
32 runs - 0 likes - 1 downloads - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
29 runs - 0 likes - 1 downloads - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
29 runs - 0 likes - 1 downloads - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
30 runs - 0 likes - 1 downloads - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
31 runs - 0 likes - 1 downloads - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
30 runs - 0 likes - 1 downloads - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
34 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
68 runs - 0 likes - 3 downloads - 3 reach - 1 impact
20000 instances - 17 features - 3 classes - 10000 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
2 runs - 0 likes - 2 downloads - 2 reach - 5 impact
60 instances - 17 features - 0 classes - 0 missing values
Abstract: The data set is composed of 60 chorales (5665 events) by J.S. Bach (1685-1750). Each event of each chorale is labelled with one of 101 chord labels and described through 14 features.…
31 runs - 0 likes - 2 downloads - 2 reach - 5 impact
5665 instances - 17 features - 102 classes - 0 missing values
The data was collected retrospectively at Wroclaw Thoracic Surgery Centre for patients who underwent major lung resections for primary lung cancer in the years 2007 - 2011. The Centre is associated…
31 runs - 0 likes - 4 downloads - 4 reach - 4 impact
470 instances - 17 features - 2 classes - 0 missing values
No data.
67 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1000000 instances - 17 features - 10 classes - 0 missing values
No data.
60 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
332 runs - 0 likes - 4 downloads - 4 reach - 2 impact
1000000 instances - 17 features - 2 classes - 0 missing values
No data.
311 runs - 0 likes - 3 downloads - 3 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
293 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1000000 instances - 17 features - 10 classes - 0 missing values
uci adult partitioned
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
48844 instances - 17 features - classes - 6495 missing values
No data.
71 runs - 0 likes - 5 downloads - 5 reach - 2 impact
1000000 instances - 17 features - 2 classes - 0 missing values
No data.
356 runs - 0 likes - 7 downloads - 7 reach - 1 impact
131072 instances - 17 features - 2 classes - 0 missing values
The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was…
64653 runs - 2 likes - 28 downloads - 30 reach - 20 impact
45211 instances - 17 features - 2 classes - 0 missing values
#study_1
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
944 instances - 17 features - classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable,…
519 runs - 0 likes - 7 downloads - 7 reach - 6 impact
203 instances - 17 features - 11 classes - 0 missing values
* Dataset: Reduced version (10% of the examples) of the bank-marketing dataset.
104 runs - 1 likes - 16 downloads - 17 reach - 7 impact
4521 instances - 17 features - 2 classes - 0 missing values
* Title: Thoracic Surgery Data Data Set * Abstract: The data is dedicated to classification problem related to the post-operative life expectancy in the lung cancer patients: class 1 - death within…
145 runs - 0 likes - 6 downloads - 6 reach - 6 impact
470 instances - 17 features - 2 classes - 0 missing values
Date: Tue, 15 Nov 88 15:44:08 EST From: stan To: aha@ICS.UCI.EDU 1. Title: Final settlements in labor negotiations in Canadian industry 2. Source Information -- Creators:…
7681 runs - 0 likes - 16 downloads - 16 reach - 2 impact
57 instances - 17 features - 2 classes - 326 missing values
1. TITLE: Letter Image Recognition Data The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The…
67265 runs - 1 likes - 70 downloads - 71 reach - 2 impact
20000 instances - 17 features - 26 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
604 runs - 0 likes - 13 downloads - 13 reach - 7 impact
22784 instances - 17 features - 2 classes - 0 missing values
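A minimal sketch of the mean-threshold binarization described in the entry above, written in Python for illustration. The description is truncated before it says which side of the mean becomes the positive class, so the direction is exposed as a parameter here; the function name `binarize_by_mean`, the `positive_if_lower` parameter, its default, and the 'P'/'N' labels are all assumptions.

```python
from statistics import mean


def binarize_by_mean(values, positive_if_lower=True):
    """Convert a numeric target into 'P'/'N' labels using the mean as threshold.

    Assumption: which side of the mean is labeled 'P' is not recoverable
    from the truncated description, so the caller chooses it explicitly.
    Values exactly equal to the mean are treated as "not lower".
    """
    threshold = mean(values)
    return [
        "P" if (value < threshold) == positive_if_lower else "N"
        for value in values
    ]


if __name__ == "__main__":
    # The mean of these values is 4.0, so only 10.0 is not below it.
    print(binarize_by_mean([1.0, 2.0, 3.0, 10.0]))
```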
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
112 runs - 0 likes - 5 downloads - 5 reach - 6 impact
42 instances - 17 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
676 runs - 0 likes - 13 downloads - 13 reach - 7 impact
10992 instances - 17 features - 2 classes - 0 missing values
We create a digit database by collecting 250 samples from 44 writers. The samples written by 30 writers are used for training, cross-validation and writer dependent testing, and the digits written by…
34685 runs - 0 likes - 19 downloads - 19 reach - 2 impact
10992 instances - 17 features - 10 classes - 0 missing values
The AAUP dataset for the ASA Statistical Graphics Section's 1995 Data Analysis Exposition contains information on faculty salaries for 1161 American colleges and universities. The data may be obtained…
32 runs - 0 likes - 3 downloads - 3 reach - 6 impact
1161 instances - 17 features - 4 classes - 256 missing values
County data from the 2000 Presidential Election in Florida. Compiled by Brett Presnell, Department of Statistics, University of Florida. These data are derived from three sources, described below. As…
32 runs - 0 likes - 4 downloads - 4 reach - 6 impact
67 instances - 17 features - 5 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
100 runs - 0 likes - 3 downloads - 3 reach - 6 impact
31 instances - 17 features - 2 classes - 150 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
101 runs - 0 likes - 5 downloads - 5 reach - 7 impact
1161 instances - 17 features - 2 classes - 256 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
639 runs - 0 likes - 12 downloads - 12 reach - 7 impact
20000 instances - 17 features - 2 classes - 0 missing values
A simple database containing 17 Boolean-valued attributes describing animals. The "type" attribute appears to be the class attribute. Notes: * I find it unusual that there are 2 instances of "frog"…
168 runs - 2 likes - 16 downloads - 18 reach - 1 impact
101 instances - 17 features - 7 classes - 0 missing values
1. Title: 1984 United States Congressional Voting Records Database 2. Source Information: (a) Source: Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional…
2262 runs - 0 likes - 17 downloads - 17 reach - 1 impact
435 instances - 17 features - 2 classes - 392 missing values
This dataset contains all Premier League matches from 2008 to 2016, with player statistics taken from Sofifa.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2961 instances - 17 features - classes - 0 missing values
No data.
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
1000000 instances - 16 features - 0 classes - 0 missing values
This is the pollution data so loved by writers of papers on ridge regression. Source: McDonald, G.C. and Schwing, R.C. (1973) 'Instabilities of regression estimates relating air pollution to…
0 runs - 0 likes - 1 downloads - 1 reach - 5 impact
60 instances - 16 features - 0 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1000000 instances - 16 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs - 0 likes - 0 downloads - 0 reach - 5 impact
67 instances - 16 features - 0 classes - 0 missing values
One of the data sets used in the book "Analyzing Categorical Data" by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. Further details concerning the book, including information on statistical…
0 runs - 0 likes - 1 downloads - 1 reach - 3 impact
31 instances - 16 features - classes - 150 missing values
No data.
73 runs - 0 likes - 5 downloads - 5 reach - 2 impact
1000000 instances - 16 features - 2 classes - 0 missing values
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics; (b) its assigned insurance risk rating; (c) its normalized losses in use as…
7 runs - 1 likes - 4 downloads - 5 reach - 1 impact
159 instances - 16 features - 0 classes - 0 missing values
All nominal attributes and instances with missing values are deleted. Price treated as the class attribute. As used by…
2 runs - 0 likes - 0 downloads - 0 reach - 1 impact
159 instances - 16 features - 0 classes - 0 missing values
No data.
326 runs - 0 likes - 4 downloads - 4 reach - 2 impact
1000000 instances - 16 features - 2 classes - 0 missing values
Abstract: This dataset consists of a collection of shape and texture features extracted from digital images of leaf specimens originating from a total of 40 different plant species. Source: This…
112 runs - 0 likes - 8 downloads - 8 reach - 5 impact
340 instances - 16 features - 30 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
700 runs - 0 likes - 5 downloads - 5 reach - 6 impact
67 instances - 16 features - 2 classes - 0 missing values
Dataset from the MLRR repository: http://axon.cs.byu.edu:5000/
68 runs - 0 likes - 7 downloads - 7 reach - 15 impact
32561 instances - 16 features - 2 classes - 4262 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
718 runs - 0 likes - 6 downloads - 6 reach - 7 impact
159 instances - 16 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
102 runs - 0 likes - 4 downloads - 4 reach - 6 impact
67 instances - 16 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
790 runs - 0 likes - 10 downloads - 10 reach - 7 impact
159 instances - 16 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
754 runs - 0 likes - 10 downloads - 10 reach - 6 impact
60 instances - 16 features - 2 classes - 0 missing values
This file concerns credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. This dataset is interesting because…
24246 runs - 1 likes - 31 downloads - 32 reach - 2 impact
690 instances - 16 features - 2 classes - 67 missing values
Short Summary: Lists estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men. Classroom use of this data set: This data set…
0 runs - 0 likes - 4 downloads - 4 reach - 7 impact
252 instances - 15 features - 0 classes - 0 missing values
No data.
44 runs - 0 likes - 3 downloads - 3 reach - 2 impact
1000000 instances - 15 features - 2 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
778 runs - 0 likes - 8 downloads - 8 reach - 8 impact
4562 instances - 15 features - 2 classes - 88 missing values
* Title of Database: Spoken Arabic Digit * Abstract: This dataset contains time series of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 males…
1 runs - 0 likes - 6 downloads - 6 reach - 6 impact
263256 instances - 15 features - 10 classes - 0 missing values
No data.
288 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1000000 instances - 15 features - 9 classes - 0 missing values
No data.
51 runs - 0 likes - 2 downloads - 2 reach - 1 impact
1000000 instances - 15 features - 2 classes - 0 missing values
wind: daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland (Haslett and Raftery 1989). These data were analyzed in detail in the following article:…
0 runs - 0 likes - 5 downloads - 5 reach - 5 impact
6574 instances - 15 features - 0 classes - 0 missing values
1. Title: Assessing the Reliability of a Human Estimator…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
75 instances - 15 features - 0 classes - 0 missing values
hmeq_p,BAD,binary
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5960 instances - 15 features - classes - 5271 missing values
This dataset was retrieved 2014-11-14 from the UCI site and converted to the ARFF format. __Major changes w.r.t. version 3: dataset from UCI that matches description and data types__ ### Feature…
4196 runs - 0 likes - 4 downloads - 4 reach - 5 impact
690 instances - 15 features - 2 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
13600 runs - 1 likes - 17 downloads - 18 reach - 26 impact
48842 instances - 15 features - 2 classes - 6465 missing values
Zurich public transport delay data 2016-10-30 03:30:00 CET - 2016-11-27 01:20:00 CET cleaned and prepared at Open Data Day 2017.
0 runs - 0 likes - 2 downloads - 2 reach - 4 impact
5465575 instances - 15 features - 0 classes - 132617 missing values
In the early 2000s, Billy Beane and Paul DePodesta worked for the Oakland Athletics. While there, they literally changed the game of baseball. They didn't do it using a bat or glove, and they…
0 runs - 0 likes - 7 downloads - 7 reach - 3 impact
1232 instances - 15 features - 0 classes - 3600 missing values
All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement…
165221 runs - 2 likes - 90 downloads - 92 reach - 18 impact
14980 instances - 15 features - 2 classes - 0 missing values
Schizophrenic Eye-Tracking Data in Rubin and Wu (1997) Biometrics. Yingnian Wu (wu@hustat.harvard.edu) [14/Oct/97] Information about the dataset CLASSTYPE: nominal CLASSINDEX: last
748 runs - 0 likes - 7 downloads - 7 reach - 14 impact
340 instances - 15 features - 2 classes - 834 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
760 runs - 0 likes - 12 downloads - 12 reach - 7 impact
6574 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
769 runs - 0 likes - 12 downloads - 12 reach - 7 impact
252 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
857 runs - 0 likes - 12 downloads - 12 reach - 9 impact
9961 instances - 15 features - 2 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
2671 runs - 1 likes - 30 downloads - 31 reach - 2 impact
48842 instances - 15 features - 2 classes - 6465 missing values
This dataset records 640 time series of 12 LPC cepstrum coefficients taken from nine male speakers. The data was collected for examining our newly developed classifier for multidimensional curves…
23156 runs - 0 likes - 11 downloads - 11 reach - 46 impact
9961 instances - 15 features - 9 classes - 0 missing values
Dataset sales
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10738 instances - 15 features - 0 classes - 0 missing values
Data on the homicide rate in Detroit for the years 1961-1973. This is the data set called DETROIT in the book 'Subset selection in regression' by Alan J. Miller published in the Chapman & Hall series…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
13 instances - 14 features - 0 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: scaled to [-1,1]
0 runs - 0 likes - 5 downloads - 5 reach - 5 impact
270 instances - 14 features - 0 classes - 0 missing values
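The entry above states only that the features were scaled to [-1,1]. One common way to do this is per-feature min-max scaling; the Python sketch below assumes that approach. The function name `scale_column` and the handling of constant columns are illustrative choices, not details taken from the LIBSVM repository.

```python
def scale_column(column, lower=-1.0, upper=1.0):
    """Linearly map one feature column onto the interval [lower, upper].

    Assumption: min-max scaling per feature. A constant column has no
    spread to rescale, so it is mapped to the midpoint of the interval.
    """
    col_min, col_max = min(column), max(column)
    if col_max == col_min:
        midpoint = (lower + upper) / 2.0
        return [midpoint for _ in column]
    span = col_max - col_min
    return [lower + (upper - lower) * (x - col_min) / span for x in column]


if __name__ == "__main__":
    print(scale_column([0, 5, 10]))
```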
No data.
312 runs - 0 likes - 4 downloads - 4 reach - 3 impact
1000000 instances - 14 features - 3 classes - 0 missing values
No data.
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
1000000 instances - 14 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
131 runs - 1 likes - 9 downloads - 10 reach - 7 impact
990 instances - 14 features - 2 classes - 0 missing values
Datasets of Data And Story Library, project illustrating use of basic statistical methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs - 0 likes - 1 downloads - 1 reach - 5 impact
47 instances - 14 features - 0 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1000000 instances - 14 features - 0 classes - 0 missing values
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch,…
6 runs - 0 likes - 5 downloads - 5 reach - 9 impact
506 instances - 14 features - 0 classes - 0 missing values
Determinants of Plasma Retinol and Beta-Carotene Levels Summary: Observational studies have suggested that low dietary intake or low plasma concentrations of retinol, beta-carotene, or other…
15 runs - 0 likes - 0 downloads - 0 reach - 5 impact
315 instances - 14 features - 0 classes - 0 missing values
Donor: David W. Aha (aha@ics.uci.edu) This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one…
37 runs - 0 likes - 5 downloads - 5 reach - 1 impact
303 instances - 14 features - 0 classes - 6 missing values