OpenML
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 11 impact
1000000 instances - 17 features - 26 classes - 0 missing values
SPAM E-mail Database The "spam" concept is diverse: advertisements for products/websites, make money fast schemes, chain letters, pornography... Our collection of spam e-mails came from our postmaster…
161528 runs - 4 likes - 89 downloads - 93 reach - 11 impact
4601 instances - 58 features - 2 classes - 0 missing values
Generator generating 3 classes of waves. Each class is generated from a combination of 2 of 3 "base" waves. For details, see Breiman,L., Friedman,J.H., Olshen,R.A., and Stone,C.J. (1984).…
19675 runs - 1 likes - 53 downloads - 54 reach - 11 impact
5000 instances - 41 features - 3 classes - 0 missing values
This radar data was collected by a system in Goose Bay, Labrador. This system consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. See…
2484 runs - 3 likes - 27 downloads - 30 reach - 11 impact
351 instances - 35 features - 2 classes - 0 missing values
This database contains 13 attributes (which have been extracted from a larger set of 75) Attribute Information: ------------------------ -- 1. age -- 2. sex -- 3. chest pain type (4 values) -- 4.…
3214 runs - 0 likes - 19 downloads - 19 reach - 11 impact
270 instances - 14 features - 2 classes - 0 missing values
Compilation of promoters with known transcriptional start points for E. coli genes. The task is to recognize promoters in strings that represent nucleotides (one of A, G, T, or C). A promoter is a…
138 runs - 1 likes - 9 downloads - 10 reach - 11 impact
106 instances - 58 features - 2 classes - 0 missing values
Citation Request: This primary tumor domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
1261 runs - 0 likes - 16 downloads - 16 reach - 11 impact
339 instances - 18 features - 21 classes - 225 missing values
The dataset (originally named ELEC2) contains 45,312 instances dated from 7 May 1996 to 5 December 1998. Each example of the dataset refers to a period of 30 minutes, i.e. there are 48 instances for…
106854 runs - 3 likes - 38 downloads - 41 reach - 11 impact
45312 instances - 9 features - 2 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
163 instances - 27 features - 5 classes - 9 missing values
No data.
405 runs - 0 likes - 7 downloads - 7 reach - 11 impact
45164 instances - 75 features - 11 classes - 0 missing values
No data.
948 runs - 0 likes - 5 downloads - 5 reach - 11 impact
74 instances - 63 features - 4 classes - 0 missing values
No data.
949 runs - 0 likes - 4 downloads - 4 reach - 11 impact
74 instances - 63 features - 4 classes - 0 missing values
No data.
996 runs - 0 likes - 4 downloads - 4 reach - 11 impact
74 instances - 63 features - 4 classes - 0 missing values
No data.
882 runs - 0 likes - 6 downloads - 6 reach - 11 impact
71 instances - 63 features - 6 classes - 0 missing values
1. Title of Database: Wine recognition data Updated Sept 21, 1998 by C.Blake : Added attribute information 2. Sources: (a) Forina, M. et al, PARVUS - An Extendible Package for Data Exploration,…
1187 runs - 1 likes - 20 downloads - 21 reach - 11 impact
178 instances - 14 features - 3 classes - 0 missing values
NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many…
31491 runs - 2 likes - 30 downloads - 32 reach - 11 impact
846 instances - 19 features - 4 classes - 0 missing values
The database consists of the multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. The aim is to…
29713 runs - 2 likes - 24 downloads - 26 reach - 11 impact
6430 instances - 37 features - 6 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
2671 runs - 1 likes - 32 downloads - 33 reach - 11 impact
48842 instances - 15 features - 2 classes - 6465 missing values
Normalized version of vehicle dataset (http://www.openml.org/d/54) NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted…
372 runs - 0 likes - 10 downloads - 10 reach - 11 impact
98528 instances - 101 features - 2 classes - 0 missing values
No data.
29 runs - 0 likes - 1 downloads - 1 reach - 11 impact
1000000 instances - 37 features - 2 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
24 instances - 5 features - classes - 0 missing values
No data.
306 runs - 0 likes - 4 downloads - 4 reach - 11 impact
1000000 instances - 4 features - 2 classes - 0 missing values
1. Title: Dermatology Database 2. Source Information: (a) Original owners: -- 1. Nilsel Ilter, M.D., Ph.D., Gazi University, School of Medicine 06510 Ankara, Turkey Phone: +90 (312) 214 1080 -- 2. H.…
1756 runs - 0 likes - 14 downloads - 14 reach - 11 impact
366 instances - 35 features - 6 classes - 8 missing values
We create a digit database by collecting 250 samples from 44 writers. The samples written by 30 writers are used for training, cross-validation and writer dependent testing, and the digits written by…
37193 runs - 0 likes - 21 downloads - 21 reach - 11 impact
10992 instances - 17 features - 10 classes - 0 missing values
The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. ### Attribute…
23124 runs - 0 likes - 23 downloads - 23 reach - 11 impact
2310 instances - 20 features - 7 classes - 0 missing values
1. Title: Protein Localization Sites 2. Creator and Maintainer: Kenta Nakai Institute of Molecular and Cellular Biology Osaka University 1-3 Yamada-oka, Suita 565 Japan nakai@imcb.osaka-u.ac.jp…
1806 runs - 0 likes - 13 downloads - 13 reach - 11 impact
336 instances - 8 features - 8 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
38639 runs - 0 likes - 19 downloads - 19 reach - 11 impact
2000 instances - 65 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
38026 runs - 0 likes - 11 downloads - 11 reach - 11 impact
2000 instances - 77 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
35343 runs - 0 likes - 17 downloads - 17 reach - 11 impact
2000 instances - 7 features - 10 classes - 0 missing values
1. Title of Database: Optical Recognition of Handwritten Digits 2. Source: E. Alpaydin, C. Kaynak Department of Computer Engineering Bogazici University, 80815 Istanbul Turkey alpaydin@boun.edu.tr…
35798 runs - 3 likes - 22 downloads - 25 reach - 11 impact
5620 instances - 65 features - 10 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
34558 runs - 0 likes - 21 downloads - 21 reach - 11 impact
2000 instances - 48 features - 10 classes - 0 missing values
This file concerns credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. This dataset is interesting because…
25075 runs - 1 likes - 33 downloads - 34 reach - 11 impact
690 instances - 16 features - 2 classes - 67 missing values
1. Title of Database: Blocks Classification 2. Sources: (a) Donato Malerba Dipartimento di Informatica University of Bari via Orabona 4 70126 Bari - Italy phone: +39 - 80 - 5443269 fax: +39 - 80 -…
2719 runs - 0 likes - 18 downloads - 18 reach - 11 impact
5473 instances - 11 features - 5 classes - 0 missing values
1. Title: Nursery Database 2. Sources: (a) Creator: Vladislav Rajkovic et al. (13 experts) (b) Donors: Marko Bohanec (marko.bohanec@ijs.si) Blaz Zupan (blaz.zupan@ijs.si) (c) Date: June, 1997 3. Past…
2210 runs - 0 likes - 18 downloads - 18 reach - 11 impact
12960 instances - 9 features - 5 classes - 0 missing values
No data.
7303 runs - 0 likes - 12 downloads - 12 reach - 11 impact
226 instances - 70 features - 24 classes - 317 missing values
Citation Request: This lymphography domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
1972 runs - 0 likes - 30 downloads - 30 reach - 11 impact
148 instances - 19 features - 4 classes - 0 missing values
Date: Tue, 15 Nov 88 15:44:08 EST From: stan To: aha@ICS.UCI.EDU 1. Title: Final settlements in labor negotiations in Canadian industry 2. Source Information -- Creators:…
7681 runs - 0 likes - 16 downloads - 16 reach - 11 impact
57 instances - 17 features - 2 classes - 326 missing values
1. TITLE: Letter Image Recognition Data The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The…
69254 runs - 1 likes - 72 downloads - 73 reach - 11 impact
20000 instances - 17 features - 26 classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs - 1 likes - 11 downloads - 12 reach - 11 impact
2000 instances - 140 features - 2 classes - 0 missing values
This is a corrected version of the previous data file in version 1, which contained a dataset (349 instances) incorrectly merged from the original training and test sets available on UCI (there are…
0 runs - 0 likes - 3 downloads - 3 reach - 12 impact
267 instances - 45 features - 2 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11140, and it has 3429 rows and 1026 features…
1 runs - 0 likes - 1 downloads - 1 reach - 12 impact
3429 instances - 1026 features - 0 classes - 0 missing values
Kung chi
1 runs - 0 likes - 4 downloads - 4 reach - 12 impact
123 instances - 40 features - 2 classes - 0 missing values
knugget chase 3
0 runs - 0 likes - 2 downloads - 2 reach - 12 impact
194 instances - 40 features - 2 classes - 0 missing values
Modified version of the training dataset of the Bike Sharing Demand challenge running on Kaggle (http://www.kaggle.com/c/bike-sharing-demand/) If you use the problem in publication, please cite:…
0 runs - 0 likes - 3 downloads - 3 reach - 12 impact
10886 instances - 12 features - 0 classes - 0 missing values
No data.
33 runs - 0 likes - 4 downloads - 4 reach - 12 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
9 runs - 0 likes - 2 downloads - 2 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
10 runs - 0 likes - 2 downloads - 2 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
6 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
29 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
30 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 19 features - 4 classes - 0 missing values
Juan J. Rodriguez, Ludmila I. Kuncheva, Carlos J. Alonso (2006). Rotation Forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence. 28(10):1619-1630.…
0 runs - 0 likes - 0 downloads - 0 reach - 12 impact
1000000 instances - 12 features - 0 classes - 0 missing values
Dataset KDD98 challenge: https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html The goal is to estimate the return from a direct mailing in order to maximize donation profits. This dataset…
0 runs - 0 likes - 5 downloads - 5 reach - 12 impact
191260 instances - 479 features - 0 classes - 5587563 missing values
Another sample of COMET_MC
0 runs - 0 likes - 0 downloads - 0 reach - 12 impact
89640 instances - 6 features - 0 classes - 0 missing values
Sample with OpenML metadata.
0 runs - 0 likes - 0 downloads - 0 reach - 12 impact
761940 instances - 6 features - 0 classes - 0 missing values
Abstract: This data-set contains examples of buzz events from two different social networks: Twitter, and Tom's Hardware, a forum network focusing on new technology with more conservative dynamics.…
0 runs - 0 likes - 0 downloads - 0 reach - 12 impact
583250 instances - 78 features - 0 classes - 0 missing values
The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. They performed a protocol of activities composed of six basic activities: three static postures…
83 runs - 0 likes - 9 downloads - 9 reach - 12 impact
180 instances - 68 features - 6 classes - 0 missing values
The data was collected retrospectively at Wroclaw Thoracic Surgery Centre for patients who underwent major lung resections for primary lung cancer in the years 2007 - 2011. The Centre is associated…
31 runs - 0 likes - 5 downloads - 5 reach - 12 impact
470 instances - 17 features - 2 classes - 0 missing values
simple engine data
52 runs - 0 likes - 6 downloads - 6 reach - 12 impact
383 instances - 6 features - 3 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10294, and it has 342 rows and 1026 features (including…
1 runs - 0 likes - 1 downloads - 1 reach - 12 impact
342 instances - 1026 features - 0 classes - 0 missing values
## Guess which points belong to signal track [COMET](http://comet.kek.jp/Introduction.html) is an experiment being constructed at the J-PARC proton beam laboratory in Japan. It will search for…
0 runs - 0 likes - 0 downloads - 0 reach - 12 impact
7619400 instances - 6 features - 0 classes - 0 missing values
Data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled ``Variations in Written English: Characterizing Authors' Rhetorical Language Choices Across Corpora of…
2048 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000 instances - 24 features - 30 classes - 0 missing values
In the early 2000s, Billy Beane and Paul DePodesta worked for the Oakland Athletics. While there, they literally changed the game of baseball. They didn't do it using a bat or glove, and they…
0 runs - 0 likes - 7 downloads - 7 reach - 12 impact
1232 instances - 15 features - 0 classes - 3600 missing values
### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
11 runs - 0 likes - 1 downloads - 1 reach - 12 impact
44819 instances - 47 features - 3 classes - 10584 missing values
Zurich public transport delay data 2016-10-30 03:30:00 CET - 2016-11-27 01:20:00 CET cleaned and prepared at Open Data Day 2017.
0 runs - 0 likes - 2 downloads - 2 reach - 12 impact
5465575 instances - 15 features - 0 classes - 132617 missing values
Multi-label dataset. A subset of the reuters dataset includes 2000 observations for text classification.
0 runs - 0 likes - 8 downloads - 8 reach - 12 impact
2000 instances - 250 features - 2 classes - 0 missing values
The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril…
2 runs - 0 likes - 1 downloads - 1 reach - 12 impact
158 instances - 12 features - 0 classes - 0 missing values
The original Titanic dataset, describing the survival status of individual passengers on the Titanic. The titanic data does not contain information from the crew, but it does contain actual ages of…
0 runs - 2 likes - 29 downloads - 31 reach - 12 impact
1309 instances - 14 features - 2 classes - 3855 missing values
microaggregation2_nominal
1 runs - 0 likes - 1 downloads - 1 reach - 12 impact
20000 instances - 21 features - 5 classes - 0 missing values
price col is int now. autoHorse dataset
15 runs - 0 likes - 0 downloads - 0 reach - 12 impact
201 instances - 69 features - 0 classes - 0 missing values
Case number deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning…
10 runs - 1 likes - 2 downloads - 3 reach - 12 impact
195 instances - 11 features - 0 classes - 2 missing values
No data.
66 runs - 0 likes - 3 downloads - 3 reach - 12 impact
1000000 instances - 35 features - 6 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 12 impact
1000000 instances - 16 features - 0 classes - 0 missing values
Identifier attribute deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based…
2 runs - 0 likes - 2 downloads - 2 reach - 12 impact
398 instances - 8 features - 0 classes - 6 missing values
Case number deleted. X treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric…
10 runs - 0 likes - 1 downloads - 1 reach - 12 impact
418 instances - 19 features - 0 classes - 1239 missing values
As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning with encoding length selection. In Progress in Connectionist-Based Information Systems.…
2 runs - 0 likes - 1 downloads - 1 reach - 12 impact
200 instances - 11 features - 0 classes - 0 missing values
Training dataset of the "Porto Seguro's Safe Driver Prediction" Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
2 runs - 0 likes - 0 downloads - 0 reach - 12 impact
595212 instances - 38 features - 2 classes - 846458 missing values
Source: C. Okan Sakar a, Gorkem Serbes b, Aysegul Gunduz c, Hunkar C. Tunc a, Hatice Nizam d, Betul Erdogdu Sakar e, Melih Tutuncu c, Tarkan Aydin a, M. Erdem Isenkul d, Hulya Apaydin c a Department…
0 runs - 0 likes - 0 downloads - 0 reach - 12 impact
756 instances - 754 features - 0 classes - 0 missing values
Identification code deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based…
4 runs - 1 likes - 0 downloads - 1 reach - 12 impact
189 instances - 10 features - 0 classes - 0 missing values
Tumor-size treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using…
0 runs - 0 likes - 3 downloads - 3 reach - 12 impact
286 instances - 10 features - 0 classes - 9 missing values
The problem concerns Relative CPU Performance Data. More information can be obtained in the UCI Machine Learning repository (http://www.ics.uci.edu/~mlearn/MLSummary.html). The attributes used are:…
2 runs - 0 likes - 2 downloads - 2 reach - 12 impact
209 instances - 7 features - 0 classes - 0 missing values
All nominal attributes and instances with missing values are deleted. Price treated as the class attribute. As used by…
2 runs - 0 likes - 0 downloads - 0 reach - 12 impact
159 instances - 16 features - 0 classes - 0 missing values
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the…
1 runs - 0 likes - 0 downloads - 0 reach - 12 impact
270912 instances - 785 features - 49 classes - 0 missing values
Identifier attribute deleted. NAME: Sexual activity and the lifespan of male fruitflies TYPE: Designed (almost factorial)…
4 runs - 0 likes - 2 downloads - 2 reach - 12 impact
125 instances - 5 features - 0 classes - 0 missing values
Internet Usage Data Data Type multivariate Abstract This data contains general demographic information on internet users in 1997. Sources Original Owner Graphics, Visualization, & Usability Center…
0 runs - 1 likes - 5 downloads - 6 reach - 12 impact
10108 instances - 72 features - 46 classes - 2699 missing values
Weight treated as the class attribute. Identifier deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric…
10 runs - 0 likes - 2 downloads - 2 reach - 12 impact
158 instances - 8 features - 0 classes - 87 missing values
Survival treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using…
12 runs - 0 likes - 2 downloads - 2 reach - 12 impact
130 instances - 10 features - 0 classes - 97 missing values
No data.
50 runs - 0 likes - 2 downloads - 2 reach - 12 impact
1000000 instances - 18 features - 22 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
1 runs - 0 likes - 0 downloads - 0 reach - 12 impact
51839 instances - 2917 features - 43 classes - 0 missing values
Dataset created to study concept drift in stream mining. It is constructed by combining the Covertype, Poker-Hand, and Electricity datasets. More details can be found in: Albert Bifet, Geoff Holmes,…
332 runs - 0 likes - 27 downloads - 27 reach - 12 impact
1455525 instances - 73 features - 10 classes - 0 missing values
Automated file upload of BNG(ionosphere)
99 runs - 1 likes - 4 downloads - 5 reach - 12 impact
1000000 instances - 35 features - 2 classes - 0 missing values
Automated file upload of BNG(anneal)
100 runs - 0 likes - 3 downloads - 3 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
30 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 70 features - 24 classes - 0 missing values
No data.
7 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
28 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
0 runs - 0 likes - 1 downloads - 1 reach - 12 impact
177147 instances - 11 features - 0 classes - 0 missing values
No data.
7 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
29 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000000 instances - 39 features - 6 classes - 0 missing values