Data
This is a PROMISE data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering. If you publish material based…
19693 runs - 0 likes - 19 downloads - 19 reach - 21 impact
10885 instances - 22 features - 2 classes - 25 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for an Earth-orbiting satellite. Data comes from McCabe and Halstead feature extractors of source code. These features…
148464 runs - 0 likes - 25 downloads - 25 reach - 21 impact
1109 instances - 22 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for storage management for receiving and processing ground data. Data comes from McCabe and Halstead feature extractors of…
159838 runs - 2 likes - 24 downloads - 26 reach - 21 impact
2109 instances - 22 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for science data processing. Data comes from McCabe and Halstead feature extractors of source code. These features were…
174854 runs - 0 likes - 22 downloads - 22 reach - 20 impact
522 instances - 22 features - 2 classes - 0 missing values
No data.
45 runs - 0 likes - 2 downloads - 2 reach - 1 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
313 runs - 0 likes - 3 downloads - 3 reach - 1 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
72 runs - 0 likes - 3 downloads - 3 reach - 1 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
68 runs - 0 likes - 4 downloads - 4 reach - 1 impact
1000000 instances - 23 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
31406 instances - 23 features - 2 classes - 29756 missing values
No data.
326 runs - 1 likes - 5 downloads - 6 reach - 3 impact
1000000 instances - 23 features - 2 classes - 0 missing values
source: An Algorithm Selection Benchmark for the Container Pre-Marshalling Problem (CPMP) authors: K. Tierney and Y. Malitsky (features) / K. Tierney and D. Pacino and S. Voss (algorithms) translator…
14 runs - 0 likes - 0 downloads - 0 reach - 1 impact
527 instances - 23 features - 4 classes - 0 missing values
libSVM, AAD group. IJCNN 2001 neural network competition. Slide presentation in IJCNN'01, Ford Research Laboratory, 2001. http://www.geocities.com/ijcnn/nnc_ijcnn01.pdf . Dataset from the LIBSVM data…
0 runs - 0 likes - 8 downloads - 8 reach - 7 impact
191681 instances - 23 features - 0 classes - 0 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: Original data: someone from Germany working with the car industry.
0 runs - 0 likes - 1 downloads - 1 reach - 6 impact
1243 instances - 23 features - 0 classes - 0 missing values
Attributes 2, 4, and 6 deleted. Midrange price treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M.…
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
93 instances - 23 features - 0 classes - 14 missing values
Abstract: Oxford Parkinson's Disease Detection Dataset. Source: The dataset was created by Max Little of the University of Oxford, in collaboration with the National Centre for Voice and Speech,…
179 runs - 1 likes - 14 downloads - 15 reach - 8 impact
195 instances - 23 features - 2 classes - 0 missing values
Water stress dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1188 instances - 23 features - 0 classes - 0 missing values
Description: This dataset describes mushrooms in terms of their physical characteristics. They are classified as poisonous or edible. Source: (a) Origin: Mushroom records are drawn from…
16392 runs - 1 likes - 41 downloads - 42 reach - 6 impact
8124 instances - 23 features - 2 classes - 2480 missing values
Donor: Will Taylor (taylor@pluto.arc.nasa.gov). In this version (version 2), some features were removed. It is unclear why or how this was done.
1883 runs - 0 likes - 9 downloads - 9 reach - 3 impact
368 instances - 23 features - 2 classes - 1927 missing values
Pasture Production Data source: Dave Barker AgResearch Grasslands, Palmerston North, New Zealand The objective was to predict pasture production from a variety of biophysical factors. Vegetation and…
878 runs - 0 likes - 6 downloads - 6 reach - 9 impact
36 instances - 23 features - 3 classes - 0 missing values
SPECT heart data This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Sources: --…
1296 runs - 1 likes - 12 downloads - 13 reach - 10 impact
267 instances - 23 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
730 runs - 0 likes - 5 downloads - 5 reach - 8 impact
93 instances - 23 features - 2 classes - 14 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
698 runs - 0 likes - 5 downloads - 5 reach - 8 impact
36 instances - 23 features - 2 classes - 0 missing values
Michel Lang fRMA-normalized. Only "Kratz-genes"*. * (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…
3 runs - 0 likes - 3 downloads - 3 reach - 4 impact
442 instances - 24 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
16 instances - 24 features - 0 classes - 0 missing values
Data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled ``Variations in Written English: Characterizing Authors' Rhetorical Language Choices Across Corpora of…
2046 runs - 0 likes - 0 downloads - 0 reach - 4 impact
1000 instances - 24 features - 30 classes - 0 missing values
source: An Algorithm Selection Benchmark for the Container Pre-Marshalling Problem (CPMP) authors: K. Tierney and Y. Malitsky (features) / K. Tierney and D. Pacino and S. Voss (algorithms) translator…
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
2108 instances - 24 features - 0 classes - 0 missing values
Michel Lang fRMA-normalized. Only "Kratz-genes"*. * (see: A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international…
0 runs - 0 likes - 8 downloads - 8 reach - 6 impact
226 instances - 24 features - 2 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage…
2 runs - 0 likes - 2 downloads - 2 reach - 6 impact
93 instances - 24 features - 0 classes - 0 missing values
Squash Harvest Unstored Data source: Winna Harvey Crop and Food Research, Christchurch, New Zealand The purpose of the research was to determine the changes taking place in squash fruit during the…
876 runs - 0 likes - 4 downloads - 4 reach - 9 impact
52 instances - 24 features - 3 classes - 39 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
141 runs - 0 likes - 7 downloads - 7 reach - 8 impact
500 instances - 24 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
687 runs - 0 likes - 5 downloads - 5 reach - 8 impact
52 instances - 24 features - 2 classes - 39 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
1000 instances - 25 features - 0 classes - 0 missing values
led24-pmlb
31 runs - 0 likes - 2 downloads - 2 reach - 14 impact
3200 instances - 25 features - 10 classes - 0 missing values
No data.
304 runs - 0 likes - 7 downloads - 7 reach - 4 impact
1000000 instances - 25 features - 10 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1000 instances - 25 features - 0 classes - 0 missing values
The data were collected as the SCITOS G5 robot navigated through the room following the wall in a clockwise direction, for 4 rounds, using 24 ultrasound sensors arranged circularly around its 'waist'.…
22807 runs - 0 likes - 20 downloads - 20 reach - 27 impact
5456 instances - 25 features - 4 classes - 0 missing values
Squash Harvest Stored Data source: Winna Harvey Crop and Food Research, Christchurch, New Zealand The purpose of the research was to determine the changes taking place in squash fruit during the…
867 runs - 0 likes - 4 downloads - 4 reach - 9 impact
52 instances - 25 features - 3 classes - 7 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
707 runs - 0 likes - 5 downloads - 5 reach - 8 impact
52 instances - 25 features - 2 classes - 7 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
250 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 5 impact
1000 instances - 26 features - 0 classes - 0 missing values
No data.
32 runs - 0 likes - 1 downloads - 1 reach - 2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
33 runs - 0 likes - 4 downloads - 4 reach - 2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
29 runs - 0 likes - 6 downloads - 6 reach - 2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
29 runs - 0 likes - 4 downloads - 4 reach - 2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs - 1 likes - 4 downloads - 5 reach - 2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs - 0 likes - 5 downloads - 5 reach - 2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
28 runs - 0 likes - 3 downloads - 3 reach - 2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1000000 instances - 26 features - 0 classes - 0 missing values
No data.
63 runs - 0 likes - 3 downloads - 3 reach - 1 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
65 runs - 0 likes - 8 downloads - 8 reach - 1 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs - 1 likes - 3 downloads - 4 reach - 2 impact
1000000 instances - 26 features - 7 classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : This is a pre-processed version of the dataset used in Kaggle's See Click Predict Fix competition…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1137 instances - 26 features - classes - 9255 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : This is a pre-processed version of the dataset used in Kaggle's See Click Predict Fix competition…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1137 instances - 26 features - classes - 9255 missing values
This dataset is an artificial simulation of the Duffing system with random changes from the chaotic to the non-chaotic regime at different noise levels.
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
2493200 instances - 26 features - classes - 0 missing values
This dataset reflects incidents of crime in the City of Los Angeles dating back to 2010. This data is transcribed from original crime reports that are typed on paper and therefore there may be some…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1468825 instances - 26 features - 0 classes - 7881776 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
1 runs - 0 likes - 1 downloads - 1 reach - 6 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 2 downloads - 2 reach - 6 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 6 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 6 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
250 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 6 impact
250 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
250 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
21 runs - 0 likes - 0 downloads - 0 reach - 7 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 6 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
250 instances - 26 features - 0 classes - 0 missing values
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, (c) its normalized losses in use as…
3252 runs - 2 likes - 26 downloads - 28 reach - 4 impact
205 instances - 26 features - 6 classes - 59 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
770 runs - 0 likes - 8 downloads - 8 reach - 8 impact
100 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
746 runs - 0 likes - 6 downloads - 6 reach - 9 impact
250 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
789 runs - 0 likes - 7 downloads - 7 reach - 8 impact
100 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
812 runs - 0 likes - 7 downloads - 7 reach - 9 impact
250 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
617 runs - 0 likes - 11 downloads - 11 reach - 9 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
775 runs - 0 likes - 6 downloads - 6 reach - 9 impact
500 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
759 runs - 0 likes - 6 downloads - 6 reach - 9 impact
250 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
638 runs - 0 likes - 9 downloads - 9 reach - 9 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
707 runs - 0 likes - 6 downloads - 6 reach - 9 impact
205 instances - 26 features - 2 classes - 57 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
771 runs - 0 likes - 9 downloads - 9 reach - 9 impact
500 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
608 runs - 1 likes - 9 downloads - 10 reach - 9 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
806 runs - 0 likes - 6 downloads - 6 reach - 9 impact
250 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
608 runs - 0 likes - 9 downloads - 9 reach - 9 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
808 runs - 1 likes - 9 downloads - 10 reach - 8 impact
100 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
813 runs - 0 likes - 7 downloads - 7 reach - 9 impact
500 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
727 runs - 0 likes - 5 downloads - 5 reach - 9 impact
205 instances - 26 features - 2 classes - 59 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
604 runs - 0 likes - 9 downloads - 9 reach - 9 impact
1000 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
791 runs - 0 likes - 7 downloads - 7 reach - 9 impact
500 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
764 runs - 0 likes - 7 downloads - 7 reach - 9 impact
250 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
776 runs - 0 likes - 7 downloads - 7 reach - 8 impact
100 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
816 runs - 0 likes - 7 downloads - 7 reach - 9 impact
500 instances - 26 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
792 runs - 0 likes - 7 downloads - 7 reach - 8 impact
100 instances - 26 features - 2 classes - 0 missing values
This is sensor data for testing; it is not complete.
0 runs - 0 likes - 4 downloads - 4 reach - 3 impact
127591 instances - 27 features - classes - 0 missing values
General Description of Thyroid Disease Databases and Related Files This directory contains 6 databases, corresponding test set, and corresponding documentation. They were left at the University of…
32 runs - 0 likes - 7 downloads - 7 reach - 5 impact
2800 instances - 27 features - 5 classes - 0 missing values
General Description of Thyroid Disease Databases and Related Files This directory contains 6 databases, corresponding test set, and corresponding documentation. They were left at the University of…
31 runs - 1 likes - 8 downloads - 9 reach - 5 impact
2800 instances - 27 features - 5 classes - 0 missing values