OpenML
Filter results by:
Automated file upload of BNG(segment)
99 runs0 likes1 downloads1 reach0 impact
1000000 instances - 20 features - 7 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
701 runs0 likes3 downloads3 reach6 impact
736 instances - 20 features - 2 classes - 448 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
772 runs0 likes14 downloads14 reach6 impact
2310 instances - 20 features - 2 classes - 0 missing values
------------------------------------------------------------------------ Primary Biliary Cirrhosis The data set found in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis,…
18 runs0 likes2 downloads2 reach4 impact
418 instances - 20 features - 0 classes - 1033 missing values
No data.
117 runs0 likes4 downloads4 reach0 impact
1000000 instances - 20 features - 5 classes - 0 missing values
1. Title: Hepatitis Domain 2. Sources: (a) unknown (b) Donor: G.Gong (Carnegie-Mellon University) via Bojan Cestnik Jozef Stefan Institute Jamova 39 61000 Ljubljana Yugoslavia (tel.: (38)(+61) 214-399…
2131 runs0 likes11 downloads11 reach0 impact
155 instances - 20 features - 2 classes - 167 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
25250 runs0 likes10 downloads10 reach1 impact
736 instances - 20 features - 5 classes - 448 missing values
The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. __Major changes w.r.t.…
4250 runs0 likes2 downloads2 reach7 impact
2310 instances - 20 features - 7 classes - 0 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au1-1000 * Abstract: AutoUniv is an advanced data generator for classifications tasks. The aim is to reflect the nuances and heterogeneity of…
3255 runs0 likes8 downloads8 reach14 impact
1000 instances - 21 features - 2 classes - 0 missing values
* Twonorm dataset This is an implementation of Leo Breiman's twonorm example[1]. It is a 20 dimensional, 2 class classification example. Each class is drawn from a multivariate normal distribution…
118 runs0 likes5 downloads5 reach5 impact
7400 instances - 21 features - 2 classes - 0 missing values
1: Abstract: This is a 20 dimensional, 2 class classification problem. Each class is drawn from a multivariate normal distribution. Class 1 has mean zero and covariance 4 times the identity. Class 2…
120 runs0 likes8 downloads8 reach5 impact
7400 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach9 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach9 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach9 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs0 likes0 downloads0 reach9 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs0 likes0 downloads0 reach9 impact
1600 instances - 21 features - 2 classes - 0 missing values
Datasets of Data And Story Library, project illustrating use of basic statistic methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs0 likes1 downloads1 reach4 impact
200 instances - 21 features - 0 classes - 0 missing values
No data.
2 runs0 likes0 downloads0 reach4 impact
506 instances - 21 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
720 runs0 likes8 downloads8 reach6 impact
506 instances - 21 features - 2 classes - 0 missing values
Automated file upload of BNG(credit-g)
99 runs0 likes3 downloads3 reach0 impact
1000000 instances - 21 features - 2 classes - 0 missing values
A dataset relating characteristics of telephony account features and usage and whether or not the customer churned. Originally used in [Discovering Knowledge in Data: An Introduction to Data…
4854 runs1 likes3 downloads4 reach11 impact
5000 instances - 21 features - 2 classes - 0 missing values
__Major changes w.r.t. version 1: deactivated first two variables as they describe the batch of the experiments and should not be used for prediction. Also transformed the target from numeric to…
4299 runs0 likes3 downloads3 reach4 impact
540 instances - 21 features - 2 classes - 0 missing values
microaggregation2_nominal
0 runs0 likes0 downloads0 reach0 impact
20000 instances - 21 features - 5 classes - 0 missing values
Lucas, D. D., Klein, R., Tannahill, J., Ivanova, D., Brandon, S., Domyancic, D., and Zhang, Y.: Failure analysis of parameter-induced simulation crashes in climate models, Geosci. Model Dev. Discuss.,…
162116 runs0 likes19 downloads19 reach17 impact
540 instances - 21 features - 2 classes - 0 missing values
No data.
225 runs0 likes7 downloads7 reach2 impact
1000000 instances - 21 features - 2 classes - 0 missing values
No data.
68 runs0 likes4 downloads4 reach2 impact
1000000 instances - 21 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes0 downloads0 reach1 impact
5124 instances - 21 features - 2 classes - 0 missing values
This dataset classifies people described by a set of attributes as good or bad credit risks. This dataset comes with a cost matrix: ``` Good Bad (predicted) Good 0 1 (actual) Bad 5 0 ``` It is worse…
502674 runs11 likes144 downloads155 reach8 impact
1000 instances - 21 features - 2 classes - 0 missing values
car-evaluation-pmlb
31 runs0 likes1 downloads1 reach8 impact
1728 instances - 22 features - 4 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
691 runs0 likes6 downloads6 reach6 impact
528 instances - 22 features - 2 classes - 504 missing values
No data.
0 runs0 likes0 downloads0 reach0 impact
1000000 instances - 22 features - 0 classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
0 runs0 likes6 downloads6 reach4 impact
8192 instances - 22 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach4 impact
13 instances - 22 features - 0 classes - 0 missing values
1. Title: meta-data 2. Sources: (a) Creator: LIACC - University of Porto R.Campo Alegre 823 4150 PORTO (b) Donor: P.B.Brazdil or J.Gama Tel.: +351 600 1672 LIACC, University of Porto Fax.: +351 600…
32 runs0 likes2 downloads2 reach6 impact
528 instances - 22 features - 0 classes - 504 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
760 runs0 likes10 downloads10 reach6 impact
8192 instances - 22 features - 2 classes - 0 missing values
This directory contains Thyroid datasets. "ann-train.data" contains 3772 learning examples and "ann-test.data" contains 3428 testing examples. I have obtained this data from…
31 runs0 likes2 downloads2 reach4 impact
3772 instances - 22 features - 3 classes - 0 missing values
Abstract: CART book's waveform domains Source: Original Owners: Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group:…
0 runs1 likes3 downloads4 reach2 impact
5000 instances - 22 features - classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for science data processing. Data comes from McCabe and Halstead features extractors of source code. These features were…
170331 runs0 likes19 downloads19 reach17 impact
522 instances - 22 features - 2 classes - 0 missing values
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering. If you publish material based…
17060 runs0 likes18 downloads18 reach17 impact
10885 instances - 22 features - 2 classes - 25 missing values
Source: The dataset was created by Athanasios Tsanas (tsanasthanasis '@' gmail.com) and Max Little (littlem '@' physics.ox.ac.uk) of the University of Oxford, in collaboration with 10 medical centers…
0 runs1 likes2 downloads3 reach2 impact
5875 instances - 22 features - classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
2 runs1 likes1 downloads2 reach0 impact
8192 instances - 22 features - 0 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
143523 runs0 likes24 downloads24 reach18 impact
1109 instances - 22 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for storage management for receiving and processing ground data. Data comes from McCabe and Halstead features extractors of…
154300 runs2 likes22 downloads24 reach19 impact
2109 instances - 22 features - 2 classes - 0 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
284 runs1 likes1 downloads2 reach4 impact
96320 instances - 22 features - 2 classes - 0 missing values
No data.
326 runs1 likes4 downloads5 reach0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
Donor: Will Taylor (taylor@pluto.arc.nasa.gov) In this version (version 2), some features were removed. It is unclear why of how this was done.
1883 runs0 likes9 downloads9 reach0 impact
368 instances - 23 features - 2 classes - 1927 missing values
No data.
68 runs0 likes4 downloads4 reach0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
SPECT heart data This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Sources: --…
1296 runs1 likes12 downloads13 reach7 impact
267 instances - 23 features - 2 classes - 0 missing values
Pasture Production Data source: Dave Barker AgResearch Grasslands, Palmerston North, New Zealand The objective was to predict pasture production from a variety of biophysical factors. Vegetation and…
878 runs0 likes6 downloads6 reach6 impact
36 instances - 23 features - 3 classes - 0 missing values
No data.
45 runs0 likes2 downloads2 reach0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
313 runs0 likes3 downloads3 reach0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
libSVM","AAD group IJCNN 2001 neural network competition. Slide presentation in IJCNN'01, Ford Research Laboratory, 2001. http://www.geocities.com/ijcnn/nnc_ijcnn01.pdf . #Dataset from the LIBSVM data…
0 runs0 likes8 downloads8 reach5 impact
191681 instances - 23 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: Original data: someone from Germany working with the car industry.
0 runs0 likes1 downloads1 reach4 impact
1243 instances - 23 features - 0 classes - 0 missing values
* Abstract: Oxford Parkinson's Disease Detection Dataset * Source: The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech,…
179 runs1 likes14 downloads15 reach5 impact
195 instances - 23 features - 2 classes - 0 missing values
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Attributes 2,4, and 6 deleted. Midrange price treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M.…
0 runs0 likes0 downloads0 reach6 impact
93 instances - 23 features - 0 classes - 14 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
730 runs0 likes5 downloads5 reach5 impact
93 instances - 23 features - 2 classes - 14 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
698 runs0 likes5 downloads5 reach5 impact
36 instances - 23 features - 2 classes - 0 missing values
No data.
72 runs0 likes3 downloads3 reach0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes0 downloads0 reach1 impact
31406 instances - 23 features - 2 classes - 29756 missing values
### Description This dataset describes mushrooms in terms of their physical characteristics. They are classified into: poisonous or edible. ### Source ``` (a) Origin: Mushroom records are drawn from…
16160 runs1 likes37 downloads38 reach1 impact
8124 instances - 23 features - 2 classes - 2480 missing values
Squash Harvest Unstored Data source: Winna Harvey Crop and Food Research, Christchurch, New Zealand The purpose of the research was to determine the changes taking place in squash fruit during the…
876 runs0 likes4 downloads4 reach6 impact
52 instances - 24 features - 3 classes - 39 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach4 impact
16 instances - 24 features - 0 classes - 0 missing values
%-*- text -*- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
2 runs0 likes2 downloads2 reach4 impact
93 instances - 24 features - 0 classes - 0 missing values
Data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled ``Variations in Written English: Characterizing Authors' Rhetorical Language Choices Across Corpora of…
2046 runs0 likes0 downloads0 reach2 impact
1000 instances - 24 features - 30 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
687 runs0 likes5 downloads5 reach5 impact
52 instances - 24 features - 2 classes - 39 missing values
Michel Lang fRMA-normalized. Only "Kratz-genes"*. \* (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…
3 runs0 likes3 downloads3 reach3 impact
442 instances - 24 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
141 runs0 likes7 downloads7 reach5 impact
500 instances - 24 features - 2 classes - 0 missing values
Michel Lang fRMA-normalized. Only "Kratz-genes"*. \* (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…
0 runs0 likes7 downloads7 reach4 impact
226 instances - 24 features - 2 classes - 0 missing values
No data.
304 runs0 likes6 downloads6 reach0 impact
1000000 instances - 25 features - 10 classes - 0 missing values
Squash Harvest Stored Data source: Winna Harvey Crop and Food Research, Christchurch, New Zealand The purpose of the research was to determine the changes taking place in squash fruit during the…
867 runs0 likes4 downloads4 reach6 impact
52 instances - 25 features - 3 classes - 7 missing values
No data.
0 runs0 likes0 downloads0 reach4 impact
1000 instances - 25 features - 0 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach4 impact
1000 instances - 25 features - 0 classes - 0 missing values
The data were collected as the SCITOS G5 robot navigates through the room following the wall in a clockwise direction, for 4 rounds, using 24 ultrasound sensors arranged circularly around its 'waist'.…
16428 runs0 likes19 downloads19 reach24 impact
5456 instances - 25 features - 4 classes - 0 missing values
led24-pmlb
31 runs0 likes1 downloads1 reach8 impact
3200 instances - 25 features - 10 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
707 runs0 likes5 downloads5 reach5 impact
52 instances - 25 features - 2 classes - 7 missing values
No data.
63 runs0 likes3 downloads3 reach0 impact
1000000 instances - 26 features - 7 classes - 0 missing values
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, (c) its normalized losses in use as…
2493 runs1 likes24 downloads25 reach0 impact
205 instances - 26 features - 6 classes - 59 missing values
No data.
65 runs0 likes8 downloads8 reach0 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach0 impact
1000000 instances - 26 features - 0 classes - 0 missing values
No data.
33 runs0 likes4 downloads4 reach0 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
29 runs0 likes4 downloads4 reach0 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs1 likes4 downloads5 reach0 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs1 likes3 downloads4 reach0 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs0 likes5 downloads5 reach0 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
28 runs0 likes3 downloads3 reach0 impact
1000000 instances - 26 features - 7 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
1 runs0 likes1 downloads1 reach4 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach4 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach4 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach4 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach4 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach4 impact
250 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach4 impact
250 instances - 26 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
770 runs0 likes8 downloads8 reach5 impact
100 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
764 runs0 likes7 downloads7 reach6 impact
250 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
746 runs0 likes6 downloads6 reach6 impact
250 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
812 runs0 likes7 downloads7 reach6 impact
250 instances - 26 features - 2 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach4 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach4 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach4 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach4 impact
250 instances - 26 features - 0 classes - 0 missing values