Data
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs, 0 likes, 0 downloads, 0 reach, 5 impact
31406 instances - 23 features - 2 classes - 29756 missing values
source: An Algorithm Selection Benchmark for the Container Pre-Marshalling Problem (CPMP) authors: K. Tierney and Y. Malitsky (features) / K. Tierney and D. Pacino and S. Voss (algorithms) translator…
13 runs, 0 likes, 0 downloads, 0 reach, 0 impact
527 instances - 23 features - 4 classes - 0 missing values
Pasture Production Data source: Dave Barker AgResearch Grasslands, Palmerston North, New Zealand The objective was to predict pasture production from a variety of biophysical factors. Vegetation and…
878 runs, 0 likes, 6 downloads, 6 reach, 7 impact
36 instances - 23 features - 3 classes - 0 missing values
SPECT heart data This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Sources: --…
1296 runs, 1 likes, 12 downloads, 13 reach, 8 impact
267 instances - 23 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
698 runs, 0 likes, 5 downloads, 5 reach, 6 impact
36 instances - 23 features - 2 classes - 0 missing values
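The majority-class relabeling described above can be sketched in a few lines of Python. This is a minimal sketch, not OpenML's own conversion code: the function name `binarize_majority` is hypothetical, and because the description is cut off after "positive ('P') and", the label given to all remaining classes is taken as a parameter rather than quoted (the usage line assumes 'N'). Breaking ties by first appearance is also an assumption.

```python
from collections import Counter


def binarize_majority(labels, neg_label, pos_label="P"):
    """Relabel the most frequent class as pos_label and every other class as neg_label.

    neg_label is a parameter because the listing's description is truncated;
    'N' in the usage below is an assumption. On a tie, Counter.most_common
    returns the class encountered first.
    """
    if not labels:
        raise ValueError("labels must be non-empty")
    majority, _count = Counter(labels).most_common(1)[0]
    return [pos_label if y == majority else neg_label for y in labels]


# Hypothetical usage: 'a' is the majority class (3 of 5 labels).
print(binarize_majority(["a", "b", "a", "c", "a"], neg_label="N"))
# ['P', 'N', 'P', 'N', 'P']
```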
Donor: Will Taylor (taylor@pluto.arc.nasa.gov) In this version (version 2), some features were removed. It is unclear why or how this was done.
1883 runs, 0 likes, 9 downloads, 9 reach, 1 impact
368 instances - 23 features - 2 classes - 1927 missing values
### Description This dataset describes mushrooms in terms of their physical characteristics. They are classified into: poisonous or edible. ### Source ``` (a) Origin: Mushroom records are drawn from…
16392 runs, 1 likes, 40 downloads, 41 reach, 3 impact
8124 instances - 23 features - 2 classes - 2480 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
730 runs, 0 likes, 5 downloads, 5 reach, 6 impact
93 instances - 23 features - 2 classes - 14 missing values
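The mean-threshold conversion can be sketched the same way. The listing truncates the sentence before saying which side of the mean becomes the positive class, so this hypothetical `binarize_at_mean` takes both labels as arguments; sending values exactly equal to the mean to the upper label is likewise an assumption, not something the description states.

```python
from statistics import mean


def binarize_at_mean(values, below_label, at_or_above_label):
    """Convert a numeric target into two nominal classes split at its mean.

    Which label counts as positive ('P') is not recoverable from the listing,
    so both labels are supplied by the caller. Values equal to the mean go to
    at_or_above_label (an assumed tie rule).
    """
    threshold = mean(values)
    return [below_label if v < threshold else at_or_above_label for v in values]


# The mean of these targets is 4.0, so only 10.0 lands above it.
print(binarize_at_mean([1.0, 2.0, 3.0, 10.0], "low", "high"))
# ['low', 'low', 'low', 'high']
```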
1. Title: meta-data 2. Sources: (a) Creator: LIACC - University of Porto R.Campo Alegre 823 4150 PORTO (b) Donor: P.B.Brazdil or J.Gama Tel.: +351 600 1672 LIACC, University of Porto Fax.: +351 600…
32 runs, 0 likes, 2 downloads, 2 reach, 7 impact
528 instances - 22 features - 0 classes - 504 missing values
No data.
0 runs, 0 likes, 0 downloads, 0 reach, 1 impact
1000000 instances - 22 features - 0 classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
0 runs, 0 likes, 6 downloads, 6 reach, 5 impact
8192 instances - 22 features - 0 classes - 0 missing values
Abstract: CART book's waveform domains Source: Original Owners: Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group:…
0 runs, 1 likes, 3 downloads, 4 reach, 3 impact
5000 instances - 22 features - classes - 0 missing values
Source: The dataset was created by Athanasios Tsanas (tsanasthanasis '@' gmail.com) and Max Little (littlem '@' physics.ox.ac.uk) of the University of Oxford, in collaboration with 10 medical centers…
0 runs, 1 likes, 2 downloads, 3 reach, 3 impact
5875 instances - 22 features - classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
2 runs, 1 likes, 1 downloads, 2 reach, 1 impact
8192 instances - 22 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs, 0 likes, 0 downloads, 0 reach, 5 impact
13 instances - 22 features - 0 classes - 0 missing values
This directory contains Thyroid datasets. "ann-train.data" contains 3772 learning examples and "ann-test.data" contains 3428 testing examples. I have obtained this data from…
31 runs, 0 likes, 2 downloads, 2 reach, 6 impact
3772 instances - 22 features - 3 classes - 0 missing values
car-evaluation-pmlb
31 runs, 0 likes, 1 downloads, 1 reach, 11 impact
1728 instances - 22 features - 4 classes - 0 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
858 runs, 1 likes, 1 downloads, 2 reach, 6 impact
96320 instances - 22 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for storage management for receiving and processing ground data. Data comes from McCabe and Halstead features extractors of…
159209 runs, 2 likes, 22 downloads, 24 reach, 19 impact
2109 instances - 22 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
147871 runs, 0 likes, 24 downloads, 24 reach, 19 impact
1109 instances - 22 features - 2 classes - 0 missing values
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering. If you publish material based…
19125 runs, 0 likes, 18 downloads, 18 reach, 19 impact
10885 instances - 22 features - 2 classes - 25 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for science data processing. Data comes from McCabe and Halstead features extractors of source code. These features were…
174308 runs, 0 likes, 21 downloads, 21 reach, 18 impact
522 instances - 22 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
760 runs, 0 likes, 10 downloads, 10 reach, 7 impact
8192 instances - 22 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
691 runs, 0 likes, 6 downloads, 6 reach, 7 impact
528 instances - 22 features - 2 classes - 504 missing values
Lucas, D. D., Klein, R., Tannahill, J., Ivanova, D., Brandon, S., Domyancic, D., and Zhang, Y.: Failure analysis of parameter-induced simulation crashes in climate models, Geosci. Model Dev. Discuss.,…
162436 runs, 0 likes, 19 downloads, 19 reach, 17 impact
540 instances - 21 features - 2 classes - 0 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au1-1000 * Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity of…
3255 runs, 0 likes, 8 downloads, 8 reach, 15 impact
1000 instances - 21 features - 2 classes - 0 missing values
Datasets of Data And Story Library, project illustrating use of basic statistic methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs, 0 likes, 1 downloads, 1 reach, 5 impact
200 instances - 21 features - 0 classes - 0 missing values
No data.
2 runs, 0 likes, 0 downloads, 0 reach, 5 impact
506 instances - 21 features - 0 classes - 0 missing values
No data.
68 runs, 0 likes, 4 downloads, 4 reach, 2 impact
1000000 instances - 21 features - 2 classes - 0 missing values
No data.
225 runs, 0 likes, 7 downloads, 7 reach, 2 impact
1000000 instances - 21 features - 2 classes - 0 missing values
Automated file upload of BNG(credit-g)
99 runs, 0 likes, 3 downloads, 3 reach, 2 impact
1000000 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs, 0 likes, 0 downloads, 0 reach, 12 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs, 0 likes, 0 downloads, 0 reach, 12 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs, 0 likes, 0 downloads, 0 reach, 12 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs, 0 likes, 0 downloads, 0 reach, 12 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs, 0 likes, 0 downloads, 0 reach, 12 impact
1600 instances - 21 features - 2 classes - 0 missing values
A dataset relating characteristics of telephony account features and usage and whether or not the customer churned. Originally used in [Discovering Knowledge in Data: An Introduction to Data…
6658 runs, 1 likes, 5 downloads, 6 reach, 15 impact
5000 instances - 21 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs, 0 likes, 0 downloads, 0 reach, 7 impact
5124 instances - 21 features - 2 classes - 0 missing values
__Major changes w.r.t. version 1: deactivated first two variables as they describe the batch of the experiments and should not be used for prediction. Also transformed the target from numeric to…
6501 runs, 0 likes, 3 downloads, 3 reach, 5 impact
540 instances - 21 features - 2 classes - 0 missing values
microaggregation2_nominal
0 runs, 0 likes, 0 downloads, 0 reach, 3 impact
20000 instances - 21 features - 5 classes - 0 missing values
General Description 2015-current: greater than $200.00. The Commission categorizes contributions from individuals using the calendar year-to-date amount for political action committee (PAC) and party…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
3348209 instances - 21 features - 0 classes - 10786577 missing values
1: Abstract: This is a 20 dimensional, 2 class classification problem. Each class is drawn from a multivariate normal distribution. Class 1 has mean zero and covariance 4 times the identity. Class 2…
120 runs, 0 likes, 8 downloads, 8 reach, 6 impact
7400 instances - 21 features - 2 classes - 0 missing values
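A covariance of 4 times the identity means the 20 coordinates are independent, each with variance 4 (standard deviation 2), so class 1 of this problem can be sampled coordinate by coordinate. A minimal sketch using only the Python standard library; the function name `sample_class1` is hypothetical, and class 2 is left out because its parameters are cut off in the description above.

```python
import math
import random

DIM = 20  # "a 20 dimensional, 2 class classification problem"


def sample_class1(n, rng=None):
    """Draw n points from N(0, 4*I) in 20 dimensions.

    A diagonal covariance 4*I means independent coordinates, each with
    variance 4, i.e. standard deviation sqrt(4) = 2.
    """
    if rng is None:
        rng = random.Random()
    sd = math.sqrt(4.0)
    return [[rng.gauss(0.0, sd) for _ in range(DIM)] for _ in range(n)]


points = sample_class1(5, random.Random(0))
print(len(points), len(points[0]))  # 5 20
```

Seeding a private `random.Random` keeps draws reproducible without touching the module-level generator.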
* Twonorm dataset This is an implementation of Leo Breiman's twonorm example[1]. It is a 20 dimensional, 2 class classification example. Each class is drawn from a multivariate normal distribution…
118 runs, 0 likes, 5 downloads, 5 reach, 6 impact
7400 instances - 21 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
720 runs, 0 likes, 8 downloads, 8 reach, 7 impact
506 instances - 21 features - 2 classes - 0 missing values
This dataset classifies people described by a set of attributes as good or bad credit risks. This dataset comes with a cost matrix:
```
       Good  Bad   (predicted)
Good   0     1     (actual)
Bad    5     0
```
It is worse…
504982 runs, 15 likes, 171 downloads, 186 reach, 9 impact
1000 instances - 21 features - 2 classes - 0 missing values
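To see how the asymmetric costs add up when scoring a classifier on this dataset, here is a minimal Python sketch that sums the cost-matrix entries over paired actual/predicted labels. The function name `total_cost` and the use of the strings 'Good' and 'Bad' as label values are illustrative assumptions, not part of the dataset's distribution.

```python
# Cost matrix from the listing: keys are (actual, predicted).
COST = {
    ("Good", "Good"): 0,
    ("Good", "Bad"): 1,  # a good customer classed as bad
    ("Bad", "Good"): 5,  # a bad customer classed as good
    ("Bad", "Bad"): 0,
}


def total_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))


actual = ["Good", "Bad", "Bad", "Good"]
predicted = ["Good", "Good", "Bad", "Bad"]
print(total_cost(actual, predicted))  # 0 + 5 + 0 + 1 = 6
```

Both predictions in the example are wrong once each, but the bad-as-good error contributes five times as much to the total.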
No data.
117 runs, 0 likes, 4 downloads, 4 reach, 1 impact
1000000 instances - 20 features - 5 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs, 0 likes, 0 downloads, 0 reach, 5 impact
120 instances - 20 features - 0 classes - 0 missing values
Primary Biliary Cirrhosis The data set found in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis,…
18 runs, 0 likes, 2 downloads, 2 reach, 5 impact
418 instances - 20 features - 0 classes - 1033 missing values
No data.
211 runs, 0 likes, 3 downloads, 3 reach, 2 impact
1000000 instances - 20 features - 7 classes - 0 missing values
No data.
331 runs, 0 likes, 7 downloads, 7 reach, 1 impact
1000000 instances - 20 features - 2 classes - 0 missing values
Automated file upload of BNG(segment)
99 runs, 0 likes, 1 downloads, 1 reach, 2 impact
1000000 instances - 20 features - 7 classes - 0 missing values
The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. __Major changes w.r.t.…
7671 runs, 0 likes, 2 downloads, 2 reach, 13 impact
2310 instances - 20 features - 7 classes - 0 missing values
No data.
69 runs, 0 likes, 4 downloads, 4 reach, 1 impact
1000000 instances - 20 features - 2 classes - 0 missing values
Context "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets] Content Each row represents a…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
7043 instances - 20 features - 2 classes - 0 missing values
This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. It contains 19 house features plus the price and the id columns,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
21613 instances - 20 features - 0 classes - 0 missing values
This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. It contains 19 house features plus the price and the id columns,…
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
21613 instances - 20 features - classes - 0 missing values
#modelage
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
224 instances - 20 features - 6 classes - 205 missing values
#modelage
0 runs, 0 likes, 0 downloads, 0 reach, 0 impact
202 instances - 20 features - 2 classes - 17 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
26402 runs, 0 likes, 10 downloads, 10 reach, 1 impact
736 instances - 20 features - 5 classes - 448 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
772 runs, 0 likes, 14 downloads, 14 reach, 7 impact
2310 instances - 20 features - 2 classes - 0 missing values
1. Title: Hepatitis Domain 2. Sources: (a) unknown (b) Donor: G.Gong (Carnegie-Mellon University) via Bojan Cestnik Jozef Stefan Institute Jamova 39 61000 Ljubljana Yugoslavia (tel.: (38)(+61) 214-399…
2134 runs, 1 likes, 12 downloads, 13 reach, 1 impact
155 instances - 20 features - 2 classes - 167 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
701 runs, 0 likes, 3 downloads, 3 reach, 7 impact
736 instances - 20 features - 2 classes - 448 missing values
The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. ### Attribute…
23138 runs, 0 likes, 22 downloads, 22 reach, 2 impact
2310 instances - 20 features - 7 classes - 0 missing values
No data.
29 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
30 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
27 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
27 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
27 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
28 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
0 runs, 0 likes, 0 downloads, 0 reach, 1 impact
1000000 instances - 19 features - 0 classes - 0 missing values
No data.
0 runs, 0 likes, 0 downloads, 0 reach, 1 impact
1000000 instances - 19 features - 0 classes - 0 missing values
Primary Biliary Cirrhosis This data set is a follow-up to the original PBC data set, as discussed in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis, Wiley, 1991. An…
0 runs, 0 likes, 4 downloads, 4 reach, 5 impact
1945 instances - 19 features - 0 classes - 1133 missing values
No data.
63 runs, 0 likes, 4 downloads, 4 reach, 2 impact
1000000 instances - 19 features - 4 classes - 0 missing values
Case number deleted. X treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric…
10 runs, 0 likes, 1 downloads, 1 reach, 1 impact
418 instances - 19 features - 0 classes - 1239 missing values
This data set is also obtained from the task of controlling an F16 aircraft, although the target variable and attributes are different from the ailerons domain. In this case the goal variable is…
2 runs, 0 likes, 6 downloads, 6 reach, 1 impact
16599 instances - 19 features - 0 classes - 0 missing values
No data.
68 runs, 0 likes, 2 downloads, 2 reach, 1 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
310 runs, 0 likes, 4 downloads, 4 reach, 2 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
304 runs, 0 likes, 3 downloads, 3 reach, 1 impact
1000000 instances - 19 features - 4 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
748 runs, 0 likes, 8 downloads, 8 reach, 6 impact
148 instances - 19 features - 2 classes - 0 missing values
Citation Request: This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
1972 runs, 0 likes, 30 downloads, 30 reach, 2 impact
148 instances - 19 features - 4 classes - 0 missing values
NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many…
28647 runs, 2 likes, 26 downloads, 28 reach, 2 impact
846 instances - 19 features - 4 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
810 runs, 0 likes, 7 downloads, 7 reach, 7 impact
846 instances - 19 features - 2 classes - 0 missing values
Dataset from "Pattern Recognition and Neural Networks" by B.D. Ripley, Cambridge University Press (1996), ISBN 0-521-46086-7. The background to the datasets is described in section 1.4; this file…
587 runs, 0 likes, 5 downloads, 5 reach, 6 impact
61 instances - 19 features - 4 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
616 runs, 0 likes, 11 downloads, 11 reach, 7 impact
16599 instances - 19 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
723 runs, 0 likes, 5 downloads, 5 reach, 7 impact
418 instances - 19 features - 2 classes - 1239 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
680 runs, 0 likes, 5 downloads, 5 reach, 7 impact
1945 instances - 19 features - 2 classes - 1133 missing values
No data.
50 runs, 0 likes, 1 downloads, 1 reach, 3 impact
1000000 instances - 18 features - 22 classes - 0 missing values
No data.
65 runs, 1 likes, 2 downloads, 3 reach, 1 impact
1000000 instances - 18 features - 7 classes - 0 missing values
No data.
291 runs, 0 likes, 4 downloads, 4 reach, 1 impact
1000000 instances - 18 features - 7 classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Electrical Discharge Machining dataset (Karalic and Bratko 1997) represents a two-target…
0 runs, 0 likes, 0 downloads, 0 reach, 1 impact
154 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Jura (Goovaerts 1997) dataset consists of measurements of concentrations of seven heavy…
0 runs, 0 likes, 0 downloads, 0 reach, 1 impact
359 instances - 18 features - classes - 0 missing values
Testing this platform
0 runs, 0 likes, 0 downloads, 0 reach, 3 impact
36203 instances - 18 features - 0 classes - 8971 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Electrical Discharge Machining dataset (Karalic and Bratko 1997) represents a two-target…
0 runs, 0 likes, 0 downloads, 0 reach, 1 impact
154 instances - 18 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Jura (Goovaerts 1997) dataset consists of measurements of concentrations of seven heavy…
0 runs, 0 likes, 0 downloads, 0 reach, 1 impact
359 instances - 18 features - classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
752 runs, 0 likes, 7 downloads, 7 reach, 7 impact
339 instances - 18 features - 2 classes - 225 missing values
Citation Request: This primary tumor domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
1261 runs, 0 likes, 14 downloads, 14 reach, 2 impact
339 instances - 18 features - 21 classes - 225 missing values
Database of baseball players and play statistics, including 'Games_played', 'At_bats', 'Runs', 'Hits', 'Doubles', 'Triples', 'Home_runs', 'RBIs', 'Walks', 'Strikeouts', 'Batting_average',…
795 runs, 0 likes, 10 downloads, 10 reach, 2 impact
1340 instances - 18 features - 3 classes - 20 missing values