OpenML
Filter results by:
* Twonorm dataset This is an implementation of Leo Breiman's twonorm example[1]. It is a 20 dimensional, 2 class classification example. Each class is drawn from a multivariate normal distribution…
118 runs0 likes5 downloads5 reach6 impact
7400 instances - 21 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
720 runs0 likes8 downloads8 reach7 impact
506 instances - 21 features - 2 classes - 0 missing values
This dataset classifies people described by a set of attributes as good or bad credit risks. This dataset comes with a cost matrix: ``` Good Bad (predicted) Good 0 1 (actual) Bad 5 0 ``` It is worse…
504982 runs15 likes171 downloads186 reach9 impact
1000 instances - 21 features - 2 classes - 0 missing values
1. Title: meta-data 2. Sources: (a) Creator: LIACC - University of Porto R.Campo Alegre 823 4150 PORTO (b) Donor: P.B.Brazdil or J.Gama Tel.: +351 600 1672 LIACC, University of Porto Fax.: +351 600…
32 runs0 likes2 downloads2 reach7 impact
528 instances - 22 features - 0 classes - 504 missing values
No data.
0 runs0 likes0 downloads0 reach1 impact
1000000 instances - 22 features - 0 classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
0 runs0 likes6 downloads6 reach5 impact
8192 instances - 22 features - 0 classes - 0 missing values
Abstract: CART book's waveform domains Source: Original Owners: Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group:…
0 runs1 likes3 downloads4 reach3 impact
5000 instances - 22 features - classes - 0 missing values
Source: The dataset was created by Athanasios Tsanas (tsanasthanasis '@' gmail.com) and Max Little (littlem '@' physics.ox.ac.uk) of the University of Oxford, in collaboration with 10 medical centers…
0 runs1 likes2 downloads3 reach3 impact
5875 instances - 22 features - classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
2 runs1 likes1 downloads2 reach1 impact
8192 instances - 22 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach5 impact
13 instances - 22 features - 0 classes - 0 missing values
This directory contains Thyroid datasets. "ann-train.data" contains 3772 learning examples and "ann-test.data" contains 3428 testing examples. I have obtained this data from…
31 runs0 likes2 downloads2 reach6 impact
3772 instances - 22 features - 3 classes - 0 missing values
car-evaluation-pmlb
31 runs0 likes1 downloads1 reach11 impact
1728 instances - 22 features - 4 classes - 0 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
858 runs1 likes1 downloads2 reach6 impact
96320 instances - 22 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for storage management for receiving and processing ground data. Data comes from McCabe and Halstead features extractors of…
159209 runs2 likes22 downloads24 reach19 impact
2109 instances - 22 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
147871 runs0 likes24 downloads24 reach19 impact
1109 instances - 22 features - 2 classes - 0 missing values
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering. If you publish material based…
19125 runs0 likes18 downloads18 reach19 impact
10885 instances - 22 features - 2 classes - 25 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for science data processing. Data comes from McCabe and Halstead features extractors of source code. These features were…
174308 runs0 likes21 downloads21 reach18 impact
522 instances - 22 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
760 runs0 likes10 downloads10 reach7 impact
8192 instances - 22 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
691 runs0 likes6 downloads6 reach7 impact
528 instances - 22 features - 2 classes - 504 missing values
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Attributes 2,4, and 6 deleted. Midrange price treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M.…
0 runs0 likes0 downloads0 reach7 impact
93 instances - 23 features - 0 classes - 14 missing values
* Abstract: Oxford Parkinson's Disease Detection Dataset * Source: The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech,…
179 runs1 likes14 downloads15 reach6 impact
195 instances - 23 features - 2 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: Original data: someone from Germany working with the car industry.
0 runs0 likes1 downloads1 reach5 impact
1243 instances - 23 features - 0 classes - 0 missing values
libSVM","AAD group IJCNN 2001 neural network competition. Slide presentation in IJCNN'01, Ford Research Laboratory, 2001. http://www.geocities.com/ijcnn/nnc_ijcnn01.pdf . #Dataset from the LIBSVM data…
0 runs0 likes8 downloads8 reach6 impact
191681 instances - 23 features - 0 classes - 0 missing values
No data.
45 runs0 likes2 downloads2 reach1 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
313 runs0 likes3 downloads3 reach1 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
72 runs0 likes3 downloads3 reach1 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
326 runs1 likes4 downloads5 reach2 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
68 runs0 likes4 downloads4 reach1 impact
1000000 instances - 23 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes0 downloads0 reach5 impact
31406 instances - 23 features - 2 classes - 29756 missing values
source: An Algorithm Selection Benchmark for the Container Pre-Marshalling Problem (CPMP) authors: K. Tierney and Y. Malitsky (features) / K. Tierney and D. Pacino and S. Voss (algorithms) translator…
13 runs0 likes0 downloads0 reach0 impact
527 instances - 23 features - 4 classes - 0 missing values
Pasture Production Data source: Dave Barker AgResearch Grasslands, Palmerston North, New Zealand The objective was to predict pasture production from a variety of biophysical factors. Vegetation and…
878 runs0 likes6 downloads6 reach7 impact
36 instances - 23 features - 3 classes - 0 missing values
SPECT heart data This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Sources: --…
1296 runs1 likes12 downloads13 reach8 impact
267 instances - 23 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
698 runs0 likes5 downloads5 reach6 impact
36 instances - 23 features - 2 classes - 0 missing values
Donor: Will Taylor (taylor@pluto.arc.nasa.gov) In this version (version 2), some features were removed. It is unclear why of how this was done.
1883 runs0 likes9 downloads9 reach1 impact
368 instances - 23 features - 2 classes - 1927 missing values
### Description This dataset describes mushrooms in terms of their physical characteristics. They are classified into: poisonous or edible. ### Source ``` (a) Origin: Mushroom records are drawn from…
16392 runs1 likes40 downloads41 reach3 impact
8124 instances - 23 features - 2 classes - 2480 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
730 runs0 likes5 downloads5 reach6 impact
93 instances - 23 features - 2 classes - 14 missing values
%-*- text -*- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
2 runs0 likes2 downloads2 reach5 impact
93 instances - 24 features - 0 classes - 0 missing values
Michel Lang fRMA-normalized. Only "Kratz-genes"*. \* (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…
3 runs0 likes3 downloads3 reach4 impact
442 instances - 24 features - 0 classes - 0 missing values
Michel Lang fRMA-normalized. Only "Kratz-genes"*. \* (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…
0 runs0 likes7 downloads7 reach5 impact
226 instances - 24 features - 2 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs0 likes0 downloads0 reach5 impact
16 instances - 24 features - 0 classes - 0 missing values
Data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled ``Variations in Written English: Characterizing Authors' Rhetorical Language Choices Across Corpora of…
2046 runs0 likes0 downloads0 reach4 impact
1000 instances - 24 features - 30 classes - 0 missing values
source: An Algorithm Selection Benchmark for the Container Pre-Marshalling Problem (CPMP) authors: K. Tierney and Y. Malitsky (features) / K. Tierney and D. Pacino and S. Voss (algorithms) translator…
0 runs0 likes1 downloads1 reach0 impact
2108 instances - 24 features - 0 classes - 0 missing values
Squash Harvest Unstored Data source: Winna Harvey Crop and Food Research, Christchurch, New Zealand The purpose of the research was to determine the changes taking place in squash fruit during the…
876 runs0 likes4 downloads4 reach7 impact
52 instances - 24 features - 3 classes - 39 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
141 runs0 likes7 downloads7 reach6 impact
500 instances - 24 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
687 runs0 likes5 downloads5 reach6 impact
52 instances - 24 features - 2 classes - 39 missing values
No data.
0 runs0 likes0 downloads0 reach5 impact
1000 instances - 25 features - 0 classes - 0 missing values
The data were collected as the SCITOS G5 robot navigates through the room following the wall in a clockwise direction, for 4 rounds, using 24 ultrasound sensors arranged circularly around its 'waist'.…
22301 runs0 likes20 downloads20 reach26 impact
5456 instances - 25 features - 4 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach5 impact
1000 instances - 25 features - 0 classes - 0 missing values
No data.
304 runs0 likes6 downloads6 reach2 impact
1000000 instances - 25 features - 10 classes - 0 missing values
led24-pmlb
31 runs0 likes1 downloads1 reach12 impact
3200 instances - 25 features - 10 classes - 0 missing values
Squash Harvest Stored Data source: Winna Harvey Crop and Food Research, Christchurch, New Zealand The purpose of the research was to determine the changes taking place in squash fruit during the…
867 runs0 likes4 downloads4 reach7 impact
52 instances - 25 features - 3 classes - 7 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
707 runs0 likes5 downloads5 reach6 impact
52 instances - 25 features - 2 classes - 7 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
1 runs0 likes1 downloads1 reach5 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach5 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach5 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
250 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach5 impact
250 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
250 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach5 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach5 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
21 runs0 likes0 downloads0 reach5 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
250 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach5 impact
1000 instances - 26 features - 0 classes - 0 missing values
No data.
32 runs0 likes1 downloads1 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
33 runs0 likes4 downloads4 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
29 runs0 likes6 downloads6 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
29 runs0 likes4 downloads4 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs1 likes4 downloads5 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs0 likes5 downloads5 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
28 runs0 likes3 downloads3 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs0 likes2 downloads2 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach5 impact
250 instances - 26 features - 0 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach1 impact
1000000 instances - 26 features - 0 classes - 0 missing values
No data.
63 runs0 likes3 downloads3 reach1 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
65 runs0 likes8 downloads8 reach1 impact
1000000 instances - 26 features - 7 classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : This is a pre-processed version of the dataset used in Kaggles See Click Predict Fix competition…
0 runs0 likes0 downloads0 reach1 impact
1137 instances - 26 features - classes - 9255 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : This is a pre-processed version of the dataset used in Kaggles See Click Predict Fix competition…
0 runs0 likes0 downloads0 reach1 impact
1137 instances - 26 features - classes - 9255 missing values
This dataset reflects incidents of crime in the City of Los Angeles dating back to 2010. This data is transcribed from original crime reports that are typed on paper and therefore there may be some…
0 runs0 likes0 downloads0 reach0 impact
1468825 instances - 26 features - 0 classes - 7881776 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
638 runs0 likes9 downloads9 reach7 impact
1000 instances - 26 features - 2 classes - 0 missing values
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, (c) its normalized losses in use as…
3252 runs2 likes25 downloads27 reach1 impact
205 instances - 26 features - 6 classes - 59 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
775 runs0 likes6 downloads6 reach7 impact
500 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
617 runs0 likes11 downloads11 reach7 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
806 runs0 likes6 downloads6 reach7 impact
250 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
727 runs0 likes5 downloads5 reach7 impact
205 instances - 26 features - 2 classes - 59 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
764 runs0 likes7 downloads7 reach7 impact
250 instances - 26 features - 2 classes - 0 missing values
No data.
27 runs1 likes3 downloads4 reach2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
604 runs0 likes9 downloads9 reach7 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
771 runs0 likes8 downloads8 reach7 impact
500 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
707 runs0 likes6 downloads6 reach7 impact
205 instances - 26 features - 2 classes - 57 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
608 runs1 likes9 downloads10 reach7 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
608 runs0 likes9 downloads9 reach7 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
770 runs0 likes8 downloads8 reach6 impact
100 instances - 26 features - 2 classes - 0 missing values