OpenML
Abstract: The data set is composed of 60 chorales (5665 events) by J.S. Bach (1685-1750). Each event of each chorale is labelled using 1 among 101 chord labels and described through 14 features.…
31 runs - 0 likes - 2 downloads - 2 reach - 3 impact
5665 instances - 17 features - 102 classes - 0 missing values
#study_1
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
944 instances - 17 features - classes - 0 missing values
* Dataset: Reduced version (10% of the examples) of the bank-marketing dataset.
104 runs - 0 likes - 15 downloads - 15 reach - 5 impact
4521 instances - 17 features - 2 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 17 features - classes - 0 missing values
No data.
332 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 17 features - 2 classes - 0 missing values
This database was designed on the basis of data provided by US Census Bureau [http://www.census.gov] (under Lookup Access [http://www.census.gov/cdrom/lookup]: Summary Tape File 1). The data were…
0 runs - 1 likes - 6 downloads - 7 reach - 4 impact
22784 instances - 17 features - 0 classes - 0 missing values
Graeme D. Hutcheson and Nick Sofroniou (1999). The Multivariate Social Scientist: Introductory Statistics Using Generalized Linear Models. SAGE Publications. Copyright: Graeme D. Hutcheson & Nick…
0 runs - 0 likes - 0 downloads - 0 reach - 4 impact
42 instances - 17 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
100 runs - 0 likes - 3 downloads - 3 reach - 5 impact
31 instances - 17 features - 2 classes - 150 missing values
County data from the 2000 Presidential Election in Florida. Compiled by Brett Presnell, Department of Statistics, University of Florida. These data are derived from three sources, described below. As…
32 runs - 0 likes - 4 downloads - 4 reach - 5 impact
67 instances - 17 features - 5 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
604 runs - 0 likes - 13 downloads - 13 reach - 6 impact
22784 instances - 17 features - 2 classes - 0 missing values
The AAUP dataset for the ASA Statistical Graphics Section's 1995 Data Analysis Exposition contains information on faculty salaries for 1161 American colleges and universities. The data may be obtained…
32 runs - 0 likes - 3 downloads - 3 reach - 5 impact
1161 instances - 17 features - 4 classes - 256 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
639 runs - 0 likes - 12 downloads - 12 reach - 6 impact
20000 instances - 17 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
101 runs - 0 likes - 5 downloads - 5 reach - 6 impact
1161 instances - 17 features - 2 classes - 256 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
2 runs - 0 likes - 2 downloads - 2 reach - 4 impact
60 instances - 17 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
676 runs - 0 likes - 13 downloads - 13 reach - 6 impact
10992 instances - 17 features - 2 classes - 0 missing values
* Title: Thoracic Surgery Data Set * Abstract: The data address a classification problem concerning post-operative life expectancy in lung cancer patients: class 1 - death within…
145 runs - 0 likes - 6 downloads - 6 reach - 5 impact
470 instances - 17 features - 2 classes - 0 missing values
Datasets from the Data and Story Library, a project illustrating the use of basic statistical methods, converted to ARFF format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
2 runs - 0 likes - 0 downloads - 0 reach - 4 impact
59 instances - 17 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
112 runs - 0 likes - 5 downloads - 5 reach - 5 impact
42 instances - 17 features - 2 classes - 0 missing values
We create a digit database by collecting 250 samples from 44 writers. The samples written by 30 writers are used for training, cross-validation and writer dependent testing, and the digits written by…
26338 runs - 0 likes - 18 downloads - 18 reach - 0 impact
10992 instances - 17 features - 10 classes - 0 missing values
No data.
68 runs - 0 likes - 3 downloads - 3 reach - 0 impact
20000 instances - 17 features - 3 classes - 10000 missing values
Date: Tue, 15 Nov 88 15:44:08 EST From: stan To: aha@ICS.UCI.EDU 1. Title: Final settlements in labor negotiations in Canadian industry 2. Source Information -- Creators:…
7630 runs - 0 likes - 15 downloads - 15 reach - 0 impact
57 instances - 17 features - 2 classes - 326 missing values
The data was collected retrospectively at Wroclaw Thoracic Surgery Centre for patients who underwent major lung resections for primary lung cancer in the years 2007 - 2011. The Centre is associated…
31 runs - 0 likes - 3 downloads - 3 reach - 3 impact
470 instances - 17 features - 2 classes - 0 missing values
1. TITLE: Letter Image Recognition Data. The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The…
64269 runs - 1 likes - 68 downloads - 69 reach - 0 impact
20000 instances - 17 features - 26 classes - 0 missing values
A simple database containing 17 Boolean-valued attributes describing animals. The "type" attribute appears to be the class attribute. Notes: * I find it unusual that there are 2 instances of "frog"…
168 runs - 2 likes - 14 downloads - 16 reach - 0 impact
101 instances - 18 features - 7 classes - 0 missing values
No data.
65 runs - 1 likes - 2 downloads - 3 reach - 0 impact
1000000 instances - 18 features - 7 classes - 0 missing values
Citation Request: This primary tumor domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
1261 runs - 0 likes - 13 downloads - 13 reach - 0 impact
339 instances - 18 features - 21 classes - 225 missing values
No data.
291 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 18 features - 7 classes - 0 missing values
analcatdata: A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
41 runs - 0 likes - 2 downloads - 2 reach - 6 impact
1340 instances - 18 features - 3 classes - 20 missing values
Database of baseball players and play statistics, including 'Games_played', 'At_bats', 'Runs', 'Hits', 'Doubles', 'Triples', 'Home_runs', 'RBIs', 'Walks', 'Strikeouts', 'Batting_average',…
789 runs - 0 likes - 9 downloads - 9 reach - 0 impact
1340 instances - 18 features - 3 classes - 20 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
131 runs - 0 likes - 6 downloads - 6 reach - 6 impact
1340 instances - 18 features - 2 classes - 20 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
176 runs - 0 likes - 6 downloads - 6 reach - 5 impact
101 instances - 18 features - 2 classes - 0 missing values
No data.
50 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 18 features - 22 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
752 runs - 0 likes - 7 downloads - 7 reach - 6 impact
339 instances - 18 features - 2 classes - 225 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Jura (Goovaerts 1997) dataset consists of measurements of concentrations of seven heavy…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
359 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Electrical Discharge Machining dataset (Karalic and Bratko 1997) represents a two-target…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
154 instances - 18 features - classes - 0 missing values
No data.
63 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
NAME: vehicle silhouettes. PURPOSE: to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many…
20914 runs - 1 likes - 23 downloads - 24 reach - 0 impact
846 instances - 19 features - 4 classes - 0 missing values
No data.
68 runs - 0 likes - 2 downloads - 2 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
Case number deleted. X treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric…
10 runs - 0 likes - 1 downloads - 1 reach - 0 impact
418 instances - 19 features - 0 classes - 1239 missing values
This data set is also obtained from the task of controlling an F16 aircraft, although the target variable and attributes are different from the ailerons domain. In this case the goal variable is…
2 runs - 0 likes - 5 downloads - 5 reach - 0 impact
16599 instances - 19 features - 0 classes - 0 missing values
No data.
304 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
310 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
Dataset from 'Pattern Recognition and Neural Networks' by B.D. Ripley, Cambridge University Press (1996), ISBN 0-521-46086-7. The background to the datasets is described in section 1.4; this file…
587 runs - 0 likes - 5 downloads - 5 reach - 5 impact
61 instances - 19 features - 4 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 19 features - 0 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 19 features - 0 classes - 0 missing values
No data.
29 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
30 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
27 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
27 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
27 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
Primary Biliary Cirrhosis. This data set is a follow-up to the original PBC data set, as discussed in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis, Wiley, 1991. An…
0 runs - 0 likes - 3 downloads - 3 reach - 4 impact
1945 instances - 19 features - 0 classes - 1133 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
723 runs - 0 likes - 4 downloads - 4 reach - 6 impact
418 instances - 19 features - 2 classes - 1239 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
680 runs - 0 likes - 5 downloads - 5 reach - 6 impact
1945 instances - 19 features - 2 classes - 1133 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
616 runs - 0 likes - 11 downloads - 11 reach - 6 impact
16599 instances - 19 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
748 runs - 0 likes - 8 downloads - 8 reach - 5 impact
148 instances - 19 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
810 runs - 0 likes - 7 downloads - 7 reach - 6 impact
846 instances - 19 features - 2 classes - 0 missing values
Citation Request: This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
1963 runs - 0 likes - 30 downloads - 30 reach - 0 impact
148 instances - 19 features - 4 classes - 0 missing values
No data.
211 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 20 features - 7 classes - 0 missing values
1. Title: Hepatitis Domain 2. Sources: (a) unknown (b) Donor: G.Gong (Carnegie-Mellon University) via Bojan Cestnik Jozef Stefan Institute Jamova 39 61000 Ljubljana Yugoslavia (tel.: (38)(+61) 214-399…
2090 runs - 0 likes - 10 downloads - 10 reach - 0 impact
155 instances - 20 features - 2 classes - 167 missing values
The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. ### Attribute…
19234 runs - 0 likes - 22 downloads - 22 reach - 0 impact
2310 instances - 20 features - 7 classes - 0 missing values
No data.
69 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 20 features - 2 classes - 0 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
21607 runs - 0 likes - 9 downloads - 9 reach - 0 impact
736 instances - 20 features - 5 classes - 448 missing values
No data.
331 runs - 0 likes - 7 downloads - 7 reach - 0 impact
1000000 instances - 20 features - 2 classes - 0 missing values
The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. Major changes w.r.t.…
4250 runs - 0 likes - 2 downloads - 2 reach - 7 impact
2310 instances - 20 features - 7 classes - 0 missing values
analcatdata: A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs - 0 likes - 0 downloads - 0 reach - 4 impact
120 instances - 20 features - 0 classes - 0 missing values
Automated file upload of BNG(segment)
99 runs - 0 likes - 1 downloads - 1 reach - 0 impact
1000000 instances - 20 features - 7 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
701 runs - 0 likes - 3 downloads - 3 reach - 6 impact
736 instances - 20 features - 2 classes - 448 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
772 runs - 0 likes - 14 downloads - 14 reach - 6 impact
2310 instances - 20 features - 2 classes - 0 missing values
Primary Biliary Cirrhosis. The data set found in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis,…
18 runs - 0 likes - 2 downloads - 2 reach - 4 impact
418 instances - 20 features - 0 classes - 1033 missing values
No data.
117 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 20 features - 5 classes - 0 missing values
No data.
68 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 21 features - 2 classes - 0 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au1-1000 * Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity of…
3255 runs - 0 likes - 8 downloads - 8 reach - 14 impact
1000 instances - 21 features - 2 classes - 0 missing values
* Twonorm dataset This is an implementation of Leo Breiman's twonorm example[1]. It is a 20 dimensional, 2 class classification example. Each class is drawn from a multivariate normal distribution…
118 runs - 0 likes - 5 downloads - 5 reach - 5 impact
7400 instances - 21 features - 2 classes - 0 missing values
1: Abstract: This is a 20 dimensional, 2 class classification problem. Each class is drawn from a multivariate normal distribution. Class 1 has mean zero and covariance 4 times the identity. Class 2…
120 runs - 0 likes - 8 downloads - 8 reach - 5 impact
7400 instances - 21 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
5124 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1600 instances - 21 features - 2 classes - 0 missing values
Datasets from the Data and Story Library, a project illustrating the use of basic statistical methods, converted to ARFF format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs - 0 likes - 1 downloads - 1 reach - 4 impact
200 instances - 21 features - 0 classes - 0 missing values
No data.
2 runs - 0 likes - 0 downloads - 0 reach - 4 impact
506 instances - 21 features - 0 classes - 0 missing values
No data.
225 runs - 0 likes - 6 downloads - 6 reach - 0 impact
1000000 instances - 21 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
720 runs - 0 likes - 8 downloads - 8 reach - 6 impact
506 instances - 21 features - 2 classes - 0 missing values
Automated file upload of BNG(credit-g)
99 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 21 features - 2 classes - 0 missing values
Lucas, D. D., Klein, R., Tannahill, J., Ivanova, D., Brandon, S., Domyancic, D., and Zhang, Y.: Failure analysis of parameter-induced simulation crashes in climate models, Geosci. Model Dev. Discuss.,…
161313 runs - 0 likes - 18 downloads - 18 reach - 16 impact
540 instances - 21 features - 2 classes - 0 missing values
A dataset relating characteristics of telephony account features and usage and whether or not the customer churned. Originally used in [Discovering Knowledge in Data: An Introduction to Data…
4854 runs - 1 likes - 3 downloads - 4 reach - 11 impact
5000 instances - 21 features - 2 classes - 0 missing values
This dataset classifies people described by a set of attributes as good or bad credit risks. This dataset comes with a cost matrix:
```
      Good  Bad  (predicted)
Good     0    1  (actual)
Bad      5    0
```
It is worse…
501830 runs - 9 likes - 127 downloads - 136 reach - 2 impact
1000 instances - 21 features - 2 classes - 0 missing values
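As a rough illustration of how the cost matrix in the description above might be applied, the following Python sketch sums the cost of a set of predictions. The lowercase labels `good` and `bad`, the layout of the `COST` table, and the `total_cost` helper are illustrative assumptions; they are not part of OpenML or of the dataset files.

```python
# Cost matrix from the dataset description: keys are (actual, predicted).
# The label spellings used here are assumed for illustration only.
COST = {
    ("good", "good"): 0,
    ("good", "bad"): 1,   # good customer predicted as bad
    ("bad", "good"): 5,   # bad customer predicted as good
    ("bad", "bad"): 0,
}


def total_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))


if __name__ == "__main__":
    actual = ["good", "bad", "bad", "good"]
    predicted = ["bad", "good", "bad", "good"]
    print(total_cost(actual, predicted))  # 1 + 5 + 0 + 0 = 6
```

Under this matrix, predicting "good" for a customer who is actually "bad" costs five times as much as the opposite error, so a classifier evaluated this way may reasonably prefer cautious predictions over raw accuracy.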
Major changes w.r.t. version 1: deactivated the first two variables, as they describe the batch of the experiments and should not be used for prediction. Also transformed the target from numeric to…
4299 runs - 0 likes - 3 downloads - 3 reach - 4 impact
540 instances - 21 features - 2 classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
2 runs - 1 likes - 1 downloads - 2 reach - 0 impact
8192 instances - 22 features - 0 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for storage management for receiving and processing ground data. The data come from McCabe and Halstead feature extractors applied to…
150513 runs - 1 likes - 20 downloads - 21 reach - 18 impact
2109 instances - 22 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for an earth-orbiting satellite. The data come from McCabe and Halstead feature extractors applied to source code. These features…
140568 runs - 0 likes - 23 downloads - 23 reach - 18 impact
1109 instances - 22 features - 2 classes - 0 missing values
car-evaluation-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 8 impact
1728 instances - 22 features - 4 classes - 0 missing values
Source: The dataset was created by Athanasios Tsanas (tsanasthanasis '@' gmail.com) and Max Little (littlem '@' physics.ox.ac.uk) of the University of Oxford, in collaboration with 10 medical centers…
0 runs - 1 likes - 1 downloads - 2 reach - 2 impact
5875 instances - 22 features - classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
691 runs - 0 likes - 6 downloads - 6 reach - 6 impact
528 instances - 22 features - 2 classes - 504 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 22 features - 0 classes - 0 missing values