OpenML
Filter results by:
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: Regenerate features by the authors' matlab scripts (see Sec. C of Appendix A), then randomly select 10% instances from the…
0 runs0 likes2 downloads2 reach16 impact
98528 instances - 101 features - 0 classes - 0 missing values
Normalized version of vehicle dataset (http://www.openml.org/d/54) NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted…
372 runs0 likes10 downloads10 reach11 impact
98528 instances - 101 features - 2 classes - 0 missing values
* Dataset: Hill valley dataset. A noiseless version of the data set.
117 runs0 likes8 downloads8 reach15 impact
1212 instances - 101 features - 2 classes - 0 missing values
AutoML challenge 2014. Original task: regression. Test and validation sets can be obtained on the Cha Learn website: https://automl.chalearn.org/data
0 runs0 likes0 downloads0 reach4 impact
400000 instances - 101 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
100 instances - 101 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach13 impact
1000 instances - 101 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 101 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
250 instances - 101 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
600 runs0 likes12 downloads12 reach15 impact
1000 instances - 101 features - 2 classes - 0 missing values
Vehicle classification in distributed sensor networks. Journal of Parallel and Distributed Computing, 64(7):826-838, July 2004. This is the SensIT Vehicle (combined) dataset, retrieved 2013-11-14 from…
403 runs0 likes22 downloads22 reach16 impact
98528 instances - 101 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
734 runs0 likes7 downloads7 reach14 impact
100 instances - 101 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
763 runs0 likes8 downloads8 reach15 impact
250 instances - 101 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
739 runs0 likes7 downloads7 reach15 impact
500 instances - 101 features - 2 classes - 0 missing values
This data was collected from combine primary and secondary sources, through questionnaire, verbal interview and some part of the hospital’s record department’s data, from the selected…
0 runs0 likes0 downloads0 reach9 impact
281 instances - 98 features - 2 classes - 2 missing values
Source: Creators : François Kawala (1,2) Ahlame Douzal (1) Eric Gaussier (1) Eustache Diemert (2) Institutions : (1) Université Joseph Fourier (Grenoble I) Laboratoire d'informatique de…
0 runs0 likes1 downloads1 reach11 impact
28179 instances - 97 features - classes - 0 missing values
1. Title: Class-level data for KC1 This one includes a numeric attribute (NUMDEFECTS) to indicate defectiveness. 2. Sources (a) Creator: A. Gunes Koru (b) Date: February 21, 2005 (c) Contact: gkoru AT…
0 runs0 likes1 downloads1 reach13 impact
145 instances - 95 features - 0 classes - 0 missing values
1. Title: Class-level data for KC1 This one includes a {_TRUE,FALSE} attribute (DL) to indicate defectiveness. 2. Sources (a) Creator: A. Gunes Koru (b) Date: February 21, 2005 (c) Contact: gkoru AT…
765 runs0 likes7 downloads7 reach14 impact
145 instances - 95 features - 2 classes - 0 missing values
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable,…
747 runs0 likes7 downloads7 reach14 impact
145 instances - 95 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough to each dataset…
0 runs0 likes0 downloads0 reach8 impact
156 instances - 91 features - 2 classes - 0 missing values
* Dataset Title: Robot Execution Failures Data Set * Abstract: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque…
71 runs0 likes2 downloads2 reach13 impact
88 instances - 91 features - 4 classes - 0 missing values
* Dataset Title: Robot Execution Failures Data Set * Abstract: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque…
71 runs0 likes2 downloads2 reach13 impact
47 instances - 91 features - 5 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach12 impact
1000000 instances - 91 features - 0 classes - 0 missing values
* Dataset Title: Robot Execution Failures Data Set * Abstract: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque…
130 runs0 likes6 downloads6 reach13 impact
164 instances - 91 features - 5 classes - 0 missing values
* Dataset Title: Robot Execution Failures Data Set * Abstract: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque…
129 runs0 likes4 downloads4 reach13 impact
117 instances - 91 features - 3 classes - 0 missing values
* Dataset Title: Robot Execution Failures Data Set * Abstract: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque…
71 runs0 likes1 downloads1 reach13 impact
47 instances - 91 features - 4 classes - 0 missing values
University of Sao Paulo, School of Art, Sciences and Humanities, Sao Paulo, SP, Brazil ### LIBRAS Movement Database LIBRAS, acronym of the Portuguese name "LIngua BRAsileira de Sinais", is the…
0 runs0 likes4 downloads4 reach19 impact
360 instances - 91 features - 0 classes - 0 missing values
Information about customers consists of 86 variables and includes product usage data and socio-demographic data derived from zip area codes. The data was supplied by the Dutch data mining company…
0 runs0 likes3 downloads3 reach13 impact
9822 instances - 86 features - 0 classes - 0 missing values
Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning. The…
9537 runs0 likes0 downloads0 reach20 impact
1080 instances - 82 features - 8 classes - 1396 missing values
Two colour spotted cDNA array data set of a series of experiments to identify which genes in Yeast are cell cycle regulated.
0 runs0 likes0 downloads0 reach9 impact
6178 instances - 82 features - classes - 59017 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough to each dataset…
0 runs0 likes0 downloads0 reach8 impact
156 instances - 81 features - 2 classes - 0 missing values
This is a meta-dataset which describes the SVM hyperparameter tuning problem. The target attribute indicates whether tuning is required or default hyperparameter values are enough to each dataset…
0 runs0 likes0 downloads0 reach8 impact
156 instances - 81 features - 2 classes - 0 missing values
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's…
0 runs0 likes4 downloads4 reach8 impact
1460 instances - 81 features - 0 classes - 6965 missing values
URL dataset 3
0 runs0 likes0 downloads0 reach2 impact
18982 instances - 80 features - 5 classes - 0 missing values
Rows with NaN and inf values removed. Data converted from CSV to ARFF.
0 runs0 likes0 downloads0 reach4 impact
18982 instances - 80 features - classes - 0 missing values
Rows with NaN and inf values removed. Converted file format from CSV to ARFF.
0 runs0 likes1 downloads1 reach4 impact
18982 instances - 80 features - 5 classes - 0 missing values
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's…
0 runs0 likes2 downloads2 reach8 impact
1460 instances - 80 features - 0 classes - 6965 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes1 downloads1 reach16 impact
425240 instances - 79 features - 2 classes - 2734000 missing values
Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…
0 runs0 likes1 downloads1 reach10 impact
593 instances - 78 features - classes - 0 missing values
Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…
0 runs2 likes5 downloads7 reach11 impact
593 instances - 78 features - 2 classes - 0 missing values
Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…
0 runs0 likes0 downloads0 reach9 impact
593 instances - 78 features - classes - 0 missing values
Abstract: This data-set contains examples of buzz events from two different social networks: Twitter, and Tom's Hardware, a forum network focusing on new technology with more conservative dynamics.…
0 runs0 likes0 downloads0 reach13 impact
583250 instances - 78 features - 0 classes - 0 missing values
No data.
290 runs0 likes5 downloads5 reach11 impact
1000000 instances - 77 features - 10 classes - 0 missing values
No data.
0 runs0 likes1 downloads1 reach9 impact
144 instances - 77 features - 0 classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Supply Chain Management datasets are derived from the Trading Agent Competition in Supply…
0 runs0 likes0 downloads0 reach9 impact
8966 instances - 77 features - classes - 0 missing values
No data.
48 runs1 likes4 downloads5 reach12 impact
1000000 instances - 77 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
38026 runs0 likes12 downloads12 reach12 impact
2000 instances - 77 features - 10 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
758 runs0 likes11 downloads11 reach15 impact
2000 instances - 77 features - 2 classes - 0 missing values
Public procurement data for the European Economic Area, Switzerland, and the Macedonia. 2015
0 runs0 likes1 downloads1 reach8 impact
565163 instances - 75 features - 0 classes - 15247061 missing values
No data.
405 runs0 likes7 downloads7 reach12 impact
45164 instances - 75 features - 11 classes - 0 missing values
Dataset created to study concept drift in stream mining. It is constructed by combining the Covertype, Poker-Hand, and Electricity datasets. More details can be found in: Albert Bifet, Geoff Holmes,…
332 runs0 likes27 downloads27 reach12 impact
1455525 instances - 73 features - 10 classes - 0 missing values
1. Title: Ozone Level Detection 2. Source: Kun Zhang zhang.kun05 '@' gmail.com Department of Computer Science, Xavier University of Lousiana Wei Fan wei.fan '@' gmail.com IBM T.J.Watson Research…
0 runs0 likes1 downloads1 reach13 impact
2536 instances - 73 features - 0 classes - 0 missing values
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond, Knowledge and Information Systems, Vol. 14, No. 3, 2008. 1 . Abstract: Two ground ozone level data sets are included in…
187955 runs1 likes19 downloads20 reach28 impact
2534 instances - 73 features - 2 classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The river flow datasets concern the prediction of river network flows for 48 h in the future at…
0 runs0 likes0 downloads0 reach9 impact
9125 instances - 72 features - classes - 3264 missing values
### Internet Usage Data #### Data Type multivariate #### Abstract This data contains general demographic information on internet users in 1997. ### Data Characteristics This data comes from a survey…
0 runs1 likes6 downloads7 reach12 impact
10108 instances - 72 features - 46 classes - 2699 missing values
A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. Further details concerning the book, including information on…
28846 runs0 likes8 downloads8 reach34 impact
841 instances - 71 features - 4 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
801 runs0 likes8 downloads8 reach15 impact
841 instances - 71 features - 2 classes - 0 missing values
No data.
33 runs0 likes4 downloads4 reach12 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
30 runs0 likes1 downloads1 reach12 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
30 runs0 likes2 downloads2 reach12 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
31 runs0 likes1 downloads1 reach12 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
28 runs0 likes1 downloads1 reach12 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
37 runs0 likes2 downloads2 reach12 impact
1000000 instances - 70 features - 24 classes - 0 missing values
This database is a standardized version of the original audiology database (see audiology.* in this directory). The non-standard set of attributes have been converted to a standard set of attributes…
7303 runs0 likes12 downloads12 reach12 impact
226 instances - 70 features - 24 classes - 317 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
721 runs0 likes5 downloads5 reach15 impact
226 instances - 70 features - 2 classes - 317 missing values
Fixed dataset for autoHorse.csv I suggest...
0 runs0 likes0 downloads0 reach11 impact
201 instances - 69 features - 186 classes - 0 missing values
price col is int now. autoHorse dataset
15 runs0 likes0 downloads0 reach12 impact
201 instances - 69 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
622 runs0 likes6 downloads6 reach17 impact
10108 instances - 69 features - 2 classes - 2699 missing values
The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. They performed a protocol of activities composed of six basic activities: three static postures…
83 runs0 likes9 downloads9 reach12 impact
180 instances - 68 features - 6 classes - 0 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the maker and model information of vehicles involved in the respective accident.…
0 runs0 likes0 downloads0 reach0 impact
363243 instances - 67 features - 3 classes - 2181757 missing values
No data.
194 runs0 likes3 downloads3 reach11 impact
1000000 instances - 65 features - 10 classes - 0 missing values
No data.
52 runs0 likes2 downloads2 reach10 impact
1000000 instances - 65 features - 10 classes - 0 missing values
No data.
50 runs0 likes1 downloads1 reach11 impact
1000000 instances - 65 features - 10 classes - 0 missing values
Automated file upload of BNG(optdigits)
100 runs1 likes1 downloads2 reach11 impact
1000000 instances - 65 features - 10 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Shape). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143288 runs1 likes39 downloads40 reach416 impact
1600 instances - 65 features - 100 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Margin). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143050 runs1 likes17 downloads18 reach418 impact
1600 instances - 65 features - 100 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Texture). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143077 runs2 likes66 downloads68 reach418 impact
1599 instances - 65 features - 100 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
38639 runs0 likes20 downloads20 reach12 impact
2000 instances - 65 features - 10 classes - 0 missing values
1. Title of Database: Optical Recognition of Handwritten Digits 2. Source: E. Alpaydin, C. Kaynak Department of Computer Engineering Bogazici University, 80815 Istanbul Turkey alpaydin@boun.edu.tr…
35799 runs3 likes22 downloads25 reach12 impact
5620 instances - 65 features - 10 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
765 runs0 likes12 downloads12 reach15 impact
5620 instances - 65 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
794 runs0 likes9 downloads9 reach15 impact
2000 instances - 65 features - 2 classes - 0 missing values
No data.
996 runs0 likes4 downloads4 reach12 impact
74 instances - 63 features - 4 classes - 0 missing values
No data.
948 runs0 likes5 downloads5 reach12 impact
74 instances - 63 features - 4 classes - 0 missing values
No data.
949 runs0 likes4 downloads4 reach12 impact
74 instances - 63 features - 4 classes - 0 missing values
No data.
882 runs0 likes6 downloads6 reach12 impact
71 instances - 63 features - 6 classes - 0 missing values
CD4 count prediction date
0 runs0 likes0 downloads0 reach10 impact
16484 instances - 62 features - classes - 0 missing values
This work was partially supported by national funds through FCT and IST through the UID/EEA/50009/2013 project", "BL89/2017-IST-ID grant. In this dataset, we present usability (SUS), workload…
0 runs0 likes0 downloads0 reach7 impact
31 instances - 62 features - classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: scaled to [-1,1]
0 runs0 likes0 downloads0 reach16 impact
3175 instances - 61 features - 0 classes - 0 missing values
This dataset summarizes a heterogeneous set of features about articles published by Mashable in a period of two years. The goal is to predict the number of shares in social networks (popularity). *…
0 runs0 likes5 downloads5 reach11 impact
39644 instances - 61 features - 0 classes - 0 missing values
Test dataset
0 runs0 likes1 downloads1 reach13 impact
15547 instances - 61 features - 0 classes - 280 missing values
Test dataset
0 runs0 likes1 downloads1 reach13 impact
15547 instances - 61 features - 0 classes - 280 missing values
Test dataset
0 runs0 likes0 downloads0 reach13 impact
15547 instances - 61 features - 0 classes - 280 missing values
Test dataset
3 runs0 likes0 downloads0 reach15 impact
15547 instances - 61 features - 2 classes - 280 missing values
No data.
296 runs0 likes7 downloads7 reach9 impact
1000000 instances - 61 features - 2 classes - 0 missing values
No data.
50 runs0 likes3 downloads3 reach9 impact
1000000 instances - 61 features - 2 classes - 0 missing values
The problem is to learn a regression equation/rule/tree to predict the activity from the descriptive structural attributes. The data and methodology is described in detail in: - King, Ross .D., Hurst,…
5 runs0 likes1 downloads1 reach9 impact
186 instances - 61 features - 0 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs0 likes0 downloads0 reach16 impact
416188 instances - 61 features - 355 classes - 0 missing values
Primate splice-junction gene sequences (DNA) with associated imperfect domain theory. Splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein…
24188 runs1 likes17 downloads18 reach9 impact
3190 instances - 61 features - 3 classes - 0 missing values
NAME: Sonar, Mines vs. Rocks SUMMARY: This is the data set used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network [1]. The task is to train a network…
2366 runs1 likes25 downloads26 reach9 impact
208 instances - 61 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
744 runs0 likes8 downloads8 reach16 impact
7019 instances - 61 features - 2 classes - 43814 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
806 runs0 likes8 downloads8 reach15 impact
186 instances - 61 features - 2 classes - 0 missing values