Data
The goal of this challenge is to expose the research community to real-world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
7 runs - 0 likes - 1 download - 1 reach - 7 impact
58310 instances - 181 features - 10 classes - 0 missing values
Automated file upload of BNG(optdigits)
100 runs - 1 like - 1 download - 2 reach - 2 impact
1000000 instances - 65 features - 10 classes - 0 missing values
led7-pmlb
31 runs - 0 likes - 0 downloads - 0 reach - 12 impact
3200 instances - 8 features - 10 classes - 0 missing values
This is a 20,000-instance sample of the original CIFAR-10 dataset, sampled randomly and stratified, with 2000 examples per class. The training and test sets are merged. Find the corresponding task for the…
380 runs - 0 likes - 4 downloads - 4 reach - 12 impact
20000 instances - 3073 features - 10 classes - 0 missing values
The dataset and this description are made available at http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html. Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal…
56 runs - 0 likes - 1 download - 1 reach - 3 impact
9298 instances - 257 features - 10 classes - 0 missing values
CIFAR-10 dataset but with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image…
40 runs - 0 likes - 0 downloads - 0 reach - 5 impact
13000 instances - 27649 features - 10 classes - 0 missing values
SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor…
52 runs - 0 likes - 1 download - 1 reach - 6 impact
99289 instances - 3073 features - 10 classes - 0 missing values
* Dataset Title: MicroMass - Mixed (mixed spectra version) * Abstract: A dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. * Source:…
64 runs - 1 like - 4 downloads - 5 reach - 5 impact
360 instances - 1301 features - 10 classes - 0 missing values
Classes: 0. airplane, 1. automobile, 2. bird, 3. cat, 4. deer, 5. dog, 6. frog, 7. horse, 8. ship, 9. truck. CIFAR-10 contains 6000 images per class. The original train-test split randomly divided these into 5000 train…
143 runs - 0 likes - 4 downloads - 4 reach - 12 impact
60000 instances - 3073 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. The maps were scanned in 8-bit grey values at a density of 400 dpi,…
9398 runs - 1 like - 2 downloads - 3 reach - 13 impact
2000 instances - 241 features - 10 classes - 0 missing values
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a…
439 runs - 0 likes - 11 downloads - 11 reach - 15 impact
70000 instances - 785 features - 10 classes - 0 missing values
* Title of Database: Spoken Arabic Digit * Abstract: This dataset contains time series of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 males…
1 run - 0 likes - 7 downloads - 7 reach - 7 impact
263256 instances - 15 features - 10 classes - 0 missing values
Normalized version of the pokerhand data set. Automated file upload of pokerhand-normalized.arff
314 runs - 0 likes - 12 downloads - 12 reach - 3 impact
829201 instances - 11 features - 10 classes - 0 missing values
Dataset created to study concept drift in stream mining. It is constructed by combining the Covertype, Poker-Hand, and Electricity datasets. More details can be found in: Albert Bifet, Geoff Holmes,…
332 runs - 0 likes - 27 downloads - 27 reach - 4 impact
1455525 instances - 73 features - 10 classes - 0 missing values
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
70000 instances - 785 features - 10 classes - 0 missing values
10% stratified subsample of the original SVHN data
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
9927 instances - 3073 features - 10 classes - 0 missing values
50% stratified subsample of the original SVHN data
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
49644 instances - 3073 features - 10 classes - 0 missing values
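The two SVHN entries above are described only as stratified subsamples. As a rough illustration of the idea (not the procedure actually used to build these uploads, which the listing does not describe), the Python sketch below draws the same fraction from each class so that class proportions are approximately preserved. The function name `stratified_subsample`, the fixed seed, and the per-class rounding rule are assumptions, so the sketch will not necessarily reproduce the exact instance counts listed.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted row indices of a stratified random subsample.

    Assumption: each class keeps round(fraction * class_size) rows,
    sampled without replacement; fraction must lie in [0, 1].
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    chosen = []
    for rows in rows_by_class.values():
        k = round(fraction * len(rows))  # per-class quota (rounding rule is an assumption)
        chosen.extend(rng.sample(rows, k))
    return sorted(chosen)


# Toy labels: 50, 30 and 20 examples in three classes.
labels = [0] * 50 + [1] * 30 + [2] * 20
idx = stratified_subsample(labels, 0.1)
print(len(idx))  # 10
```

With `fraction=0.1` the toy example keeps 5, 3 and 2 rows from the three classes, mirroring their 50/30/20 proportions.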
Tattile, Via Gaetano Donizetti, 1-3-5, 25030 Mairano (Brescia), Italy. Dataset description: Semeion Handwritten Digit Data Set, in which 1593 handwritten digits from around 80 persons were scanned and…
31160 runs - 0 likes - 22 downloads - 22 reach - 51 impact
1593 instances - 257 features - 10 classes - 0 missing values
* Abstract: The purpose is to predict poker hands. * Source: Creators: Robert Cattral (cattral '@' gmail.com), Franz Oppacher (oppacher '@' scs.carleton.ca), Carleton University, Department of Computer…
1 run - 0 likes - 5 downloads - 5 reach - 7 impact
1025009 instances - 11 features - 10 classes - 0 missing values
2126 fetal cardiotocograms (CTGs) were automatically processed and the respective diagnostic features measured. The CTGs were also classified by three expert obstetricians and a consensus…
24176 runs - 4 likes - 29 downloads - 33 reach - 49 impact
2126 instances - 36 features - 10 classes - 0 missing values
This simple domain contains 7 Boolean attributes and 10 classes, the set of decimal digits. Recall that LED displays contain 7 light-emitting diodes, hence the 7 attributes. The class…
13006 runs - 0 likes - 9 downloads - 9 reach - 11 impact
500 instances - 8 features - 10 classes - 0 missing values
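The LED entry explains the 7 attributes as the 7 segments of a display. A minimal Python sketch of that encoding follows. The segment order (a to g) and the digit-to-segment table follow a common seven-segment convention and are assumptions, so the columns may not line up with the dataset's own attribute order; the names `LIT`, `encode` and `decode` are hypothetical. It covers only the clean patterns and does not model how the actual instances were generated.

```python
# Assumed segment order (common convention, may differ from the dataset):
# a=top, b=upper right, c=lower right, d=bottom, e=lower left,
# f=upper left, g=middle.
SEGMENTS = "abcdefg"

# Hypothetical table: which segments are lit for each decimal digit.
LIT = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}


def encode(digit):
    """Return the 7 Boolean attributes (as 0/1) for a decimal digit."""
    return [1 if s in LIT[digit] else 0 for s in SEGMENTS]


def decode(bits):
    """Return the digit whose clean pattern matches bits exactly, else None."""
    for digit in LIT:
        if encode(digit) == list(bits):
            return digit
    return None


print(encode(1))  # [0, 1, 1, 0, 0, 0, 0]
```

Because all ten patterns are distinct, `decode(encode(d))` returns `d` for every digit, while a pattern that matches no digit returns `None`.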
No data.
414 runs - 0 likes - 8 downloads - 8 reach - 52 impact
690 instances - 8262 features - 10 classes - 0 missing values
No data.
2193 runs - 1 like - 16 downloads - 17 reach - 2 impact
1484 instances - 9 features - 10 classes - 0 missing values
No data.
203 runs - 0 likes - 5 downloads - 5 reach - 11 impact
878 instances - 7455 features - 10 classes - 0 missing values
No data.
416 runs - 1 like - 13 downloads - 14 reach - 53 impact
1050 instances - 3239 features - 10 classes - 0 missing values
No data.
428 runs - 0 likes - 12 downloads - 12 reach - 53 impact
1003 instances - 3183 features - 10 classes - 0 missing values
The MNIST database of handwritten digits with 784 features, raw data available at: http://yann.lecun.com/exdb/mnist/. It can be split in a training set of the first 60,000 examples, and a test set of…
13225 runs - 3 likes - 62 downloads - 65 reach - 27 impact
70000 instances - 785 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
33425 runs - 0 likes - 17 downloads - 17 reach - 4 impact
2000 instances - 7 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. The maps were scanned in 8-bit grey values at a density of 400 dpi,…
26228 runs - 0 likes - 17 downloads - 17 reach - 5 impact
2000 instances - 241 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
35874 runs - 0 likes - 17 downloads - 17 reach - 5 impact
2000 instances - 217 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
35644 runs - 0 likes - 10 downloads - 10 reach - 4 impact
2000 instances - 77 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
36581 runs - 0 likes - 19 downloads - 19 reach - 4 impact
2000 instances - 65 features - 10 classes - 0 missing values
We created a digit database by collecting 250 samples from 44 writers. The samples written by 30 writers are used for training, cross-validation and writer-dependent testing, and the digits written by…
34967 runs - 0 likes - 20 downloads - 20 reach - 4 impact
10992 instances - 17 features - 10 classes - 0 missing values
1. Title of Database: Optical Recognition of Handwritten Digits 2. Source: E. Alpaydin, C. Kaynak Department of Computer Engineering Bogazici University, 80815 Istanbul Turkey alpaydin@boun.edu.tr…
34174 runs - 3 likes - 22 downloads - 25 reach - 4 impact
5620 instances - 65 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
32909 runs - 0 likes - 21 downloads - 21 reach - 4 impact
2000 instances - 48 features - 10 classes - 0 missing values
led24-pmlb
31 runs - 0 likes - 2 downloads - 2 reach - 14 impact
3200 instances - 25 features - 10 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
396 runs - 0 likes - 16 downloads - 16 reach - 8 impact
3468 instances - 785 features - 10 classes - 0 missing values
No data.
44 runs - 0 likes - 1 download - 1 reach - 1 impact
1000000 instances - 13 features - 11 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable,…
519 runs - 0 likes - 7 downloads - 7 reach - 6 impact
203 instances - 17 features - 11 classes - 0 missing values
Dataset Title: Localization Data for Person Activity Data Set Abstract: Data contains recordings of five people performing different activities. Each person wore four sensors (tags) while performing…
6 runs - 0 likes - 4 downloads - 4 reach - 6 impact
164860 instances - 8 features - 11 classes - 0 missing values
1. Summary: This database was generated by the Laboratory of Image Processing and Pattern Recognition (INPG-LTIRF) in the development of the Esprit project ELENA No. 6891 and the Esprit working…
18592 runs - 0 likes - 11 downloads - 11 reach - 11 impact
5500 instances - 41 features - 11 classes - 0 missing values
Author: Marius Lindauer. Date: 27.02.2014. This data set was generated for a publication about claspfolio 2.0, i.e., an algorithm selector for ASP. The algorithm portfolio of clasp (2.1.4)…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1294 instances - 143 features - 11 classes - 18258 missing values
No data.
400 runs - 0 likes - 6 downloads - 6 reach - 4 impact
45164 instances - 75 features - 11 classes - 0 missing values
Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of lpc derived log area ratios. Collected by David Deterding (data and…
25934 runs - 0 likes - 14 downloads - 14 reach - 36 impact
990 instances - 13 features - 11 classes - 0 missing values
No data.
283 runs - 0 likes - 5 downloads - 5 reach - 15 impact
96 instances - 4027 features - 11 classes - 19667 missing values
No data.
222 runs - 0 likes - 10 downloads - 10 reach - 8 impact
1504 instances - 2887 features - 13 classes - 0 missing values
The aim is to determine the type of arrhythmia from the ECG recordings. This database contains 279 attributes, 206 of which are linear valued and the rest are nominal. Concerning the study of H. Altay…
4428 runs - 0 likes - 48 downloads - 48 reach - 4 impact
452 instances - 280 features - 13 classes - 408 missing values
Multiclass cancer diagnosis using 16063 tumor gene expression signatures. PNAS, VOL 98, no 26, pp. 15149-15154, December 18, 2001. S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.-H. Yeang, M.…
116 runs - 0 likes - 7 downloads - 7 reach - 13 impact
190 instances - 16064 features - 14 classes - 0 missing values
source: http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/ authors: L. Xu, F. Hutter, H. Hoos, K. Leyton-Brown translator in coseal format: M. Lindauer with the help of Alexandre Frechette the data do…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
296 instances - 116 features - 14 classes - 1810 missing values
No data.
426 runs - 0 likes - 15 downloads - 15 reach - 77 impact
2463 instances - 2001 features - 17 classes - 0 missing values
Abstract: A chess endgame data set representing the positions on the board of the white king, the white rook, and the black king. The task is to determine the optimum number of turns required for white…
25 runs - 0 likes - 5 downloads - 5 reach - 6 impact
28056 instances - 7 features - 18 classes - 0 missing values
No data.
1777 runs - 0 likes - 15 downloads - 15 reach - 2 impact
28056 instances - 7 features - 18 classes - 0 missing values
No data.
314 runs - 1 like - 8 downloads - 9 reach - 2 impact
1000000 instances - 36 features - 19 classes - 0 missing values
This is the large soybean database from the UCI repository, with its training and test database combined into a single file. There are 19 classes, only the first 15 of which have been used in prior…
40719 runs - 1 like - 51 downloads - 52 reach - 4 impact
683 instances - 36 features - 19 classes - 2337 missing values
No data.
163 runs - 0 likes - 13 downloads - 13 reach - 11 impact
1560 instances - 8461 features - 20 classes - 0 missing values
Major changes w.r.t. version 2: variable 3 was ignored in this upload, as it appears to be a perfect predictor. Tamilnadu Electricity Board Hourly Readings dataset. Real-time readings were collected…
0 runs - 0 likes - 1 download - 1 reach - 4 impact
45781 instances - 4 features - 20 classes - 0 missing values
The Sheffield (previously UMIST) Face Database consists of 564 images of 20 individuals (mixed race/gender/appearance). Each individual is shown in a range of poses from profile to frontal views -…
53 runs - 0 likes - 1 download - 1 reach - 6 impact
575 instances - 10305 features - 20 classes - 0 missing values
Description: MicroMass (pure spectra version) is a dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. Source: Pierre Mahé,…
39628 runs - 1 like - 15 downloads - 16 reach - 89 impact
571 instances - 1301 features - 20 classes - 0 missing values
Citation Request: This primary tumor domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
1261 runs - 0 likes - 16 downloads - 16 reach - 4 impact
339 instances - 18 features - 21 classes - 225 missing values
No data.
50 runs - 0 likes - 1 download - 1 reach - 3 impact
1000000 instances - 18 features - 22 classes - 0 missing values
The dataset collects data from an Android smartphone positioned in the chest pocket. Accelerometer data were collected from 22 participants walking in the wild over a predefined path. The dataset is…
80 runs - 0 likes - 8 downloads - 8 reach - 7 impact
149332 instances - 5 features - 22 classes - 0 missing values
This is a 10% stratified subsample of the data from the 1999 ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php). Modified by TunedIT (converted to ARFF format)…
25 runs - 1 like - 35 downloads - 36 reach - 7 impact
494020 instances - 42 features - 23 classes - 0 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php) Data set for KDD Cup 1999 Modified by TunedIT (converted to ARFF format)…
4 runs - 1 like - 19 downloads - 20 reach - 7 impact
4898431 instances - 42 features - 23 classes - 0 missing values
No data.
37 runs - 0 likes - 2 downloads - 2 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
28 runs - 0 likes - 1 download - 1 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
31 runs - 0 likes - 1 download - 1 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
30 runs - 0 likes - 2 downloads - 2 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
30 runs - 0 likes - 1 download - 1 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
33 runs - 0 likes - 4 downloads - 4 reach - 3 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
7303 runs - 0 likes - 12 downloads - 12 reach - 4 impact
226 instances - 70 features - 24 classes - 317 missing values
No data.
159 runs - 0 likes - 11 downloads - 11 reach - 11 impact
1657 instances - 3759 features - 25 classes - 0 missing values
No data.
28 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
32 runs - 0 likes - 1 download - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
28 runs - 0 likes - 1 download - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
29 runs - 0 likes - 1 download - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
30 runs - 0 likes - 1 download - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
31 runs - 0 likes - 1 download - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
30 runs - 0 likes - 1 download - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
60 runs - 0 likes - 2 downloads - 2 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
311 runs - 0 likes - 3 downloads - 3 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
29 runs - 0 likes - 1 download - 1 reach - 2 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
34 runs - 0 likes - 2 downloads - 2 reach - 4 impact
1000000 instances - 17 features - 26 classes - 0 missing values
1. Title: Letter Image Recognition Data. The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The…
67577 runs - 1 like - 70 downloads - 71 reach - 4 impact
20000 instances - 17 features - 26 classes - 0 missing values
Description: The ISOLET (Isolated Letter Speech Recognition) dataset was generated as follows: 150 subjects spoke the name of each letter of the alphabet twice. Hence, there are 52 training examples…
48423 runs - 0 likes - 68 downloads - 68 reach - 124 impact
7797 instances - 618 features - 26 classes - 0 missing values
1. Title of Database: Abalone data 2. Sources: (a) Original owners of database: Marine Resources Division Marine Research Laboratories - Taroona Department of Primary Industry and Fisheries, Tasmania…
34894 runs - 0 likes - 18 downloads - 18 reach - 2 impact
4177 instances - 9 features - 28 classes - 0 missing values
Data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled "Variations in Written English: Characterizing Authors' Rhetorical Language Choices Across Corpora of…
2046 runs - 0 likes - 0 downloads - 0 reach - 4 impact
1000 instances - 24 features - 30 classes - 0 missing values
Abstract: This dataset consists of a collection of shape and texture features extracted from digital images of leaf specimens originating from a total of 40 different plant species. Source: This…
112 runs - 0 likes - 9 downloads - 9 reach - 6 impact
340 instances - 16 features - 30 classes - 0 missing values
Source: The dataset was created by Angeliki Xifara (angxifara @ gmail.com, Civil/Structural Engineer) and was processed by Athanasios Tsanas (tsanasthanasis @ gmail.com, Oxford Centre for Industrial…
103 runs - 1 like - 4 downloads - 5 reach - 5 impact
768 instances - 10 features - 37 classes - 0 missing values
This dataset contains a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge. As described on the original website: There are ten different images of each of 40…
53 runs - 0 likes - 0 downloads - 0 reach - 5 impact
400 instances - 4097 features - 40 classes - 0 missing values
Chocolate Bar Ratings. Expert ratings of over 1,700 chocolate bars. Each chocolate is evaluated from a combination of both objective qualities and subjective interpretation. A rating here only…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1794 instances - 9 features - 41 classes - 0 missing values
Chocolate Bar Ratings. Expert ratings of over 1,700 chocolate bars. Each chocolate is evaluated from a combination of both objective qualities and subjective interpretation. A rating here only…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1795 instances - 9 features - 42 classes - 1 missing value
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
51839 instances - 1569 features - 43 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
51839 instances - 1569 features - 43 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
51839 instances - 2917 features - 43 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
1 run - 0 likes - 0 downloads - 0 reach - 2 impact
51839 instances - 257 features - 43 classes - 0 missing values
No data.
67 runs - 0 likes - 11 downloads - 11 reach - 11 impact
9558 instances - 26833 features - 44 classes - 0 missing values
Internet Usage Data. Data type: multivariate. Abstract: This data contains general demographic information on internet users in 1997. Sources: Original owner: Graphics, Visualization, & Usability Center…
0 runs - 1 like - 5 downloads - 6 reach - 3 impact
10108 instances - 72 features - 46 classes - 2699 missing values
Over 92 thousand images (32x32 pixels) of 46 characters from Devanagari script. Includes the alphabet as well as the numbers. Devanagari is an Indic script and forms a basis for over 100 languages…
42 runs - 1 like - 6 downloads - 7 reach - 6 impact
92000 instances - 1025 features - 46 classes - 0 missing values
1. Title: Part of the IRAS Low Resolution Spectrometer Database 2. Sources: (a) Originator: Infra-Red Astronomy Satellite Project Database (b) Donor: John Stutz (c) Date:…
1243 runs - 0 likes - 44 downloads - 44 reach - 8 impact
531 instances - 103 features - 48 classes - 0 missing values
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
270912 instances - 785 features - 49 classes - 0 missing values