Data
Filter results by:
Multi-label dataset. The UC Berkeley enron4 dataset represents a subset of the original enron5 dataset and consists of 1684 cases of emails with 21 labels and 1001 predictor variables.
1 runs0 likes4 downloads4 reach14 impact
1702 instances - 1054 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2834 runs0 likes4 downloads4 reach24 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2855 runs0 likes4 downloads4 reach24 impact
1545 instances - 10936 features - 2 classes - 0 missing values
No data.
27 runs1 likes3 downloads4 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
* Dataset Title: Volcanoes on Venus - JARtool experiment Data Set Experiment: D1 * Source: Michael C. Burl MS 126-347, JPL 4800 Oak Grove Drive Pasadena, CA 91109 (818) 393-5345 Michael.C.Burl '@'…
125 runs0 likes4 downloads4 reach14 impact
8753 instances - 4 features - 5 classes - 0 missing values
* Dataset Title: Volcanoes on Venus - JARtool experiment Data Set Experiment: B2 * Source: Michael C. Burl MS 126-347, JPL 4800 Oak Grove Drive Pasadena, CA 91109 (818) 393-5345 Michael.C.Burl '@'…
120 runs0 likes4 downloads4 reach14 impact
10668 instances - 4 features - 5 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
78 runs0 likes4 downloads4 reach15 impact
421 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes4 downloads4 reach15 impact
484 instances - 10936 features - 2 classes - 0 missing values
No data.
29 runs0 likes4 downloads4 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
Data from the RSCTC 2010 Discovery Challenge. Example datasets for 6 different problems of DNA microarray data analysis and classification. All datasets contain gene expression data characterized by…
9 runs0 likes4 downloads4 reach14 impact
92 instances - 59005 features - 5 classes - 0 missing values
Even smaller sample of version 1
0 runs0 likes4 downloads4 reach12 impact
149639 instances - 12 features - 2 classes - 0 missing values
No data.
33 runs0 likes4 downloads4 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
117 runs0 likes4 downloads4 reach9 impact
1000000 instances - 20 features - 5 classes - 0 missing values
No data.
306 runs0 likes4 downloads4 reach11 impact
1000000 instances - 4 features - 2 classes - 0 missing values
__Changes w.r.t. version 1: included one target factor with 7 levels as target variable for the classification. Also deleted the previous 7 binary target variables.__ A dataset of steel plates'…
9007 runs1 likes3 downloads4 reach15 impact
1941 instances - 28 features - 7 classes - 0 missing values
This is a commercial application described in Weiss & Indurkhya (1995). The data describes a telecommunication problem. No further information is available. Characteristics: (10000+5000) cases, 49…
2 runs0 likes4 downloads4 reach11 impact
15000 instances - 49 features - 0 classes - 0 missing values
This is the Tecator data set: The task is to predict the fat content of a meat sample on the basis of its near infrared absorbance spectrum. 1. Statement of permission from Tecator (the original data…
0 runs0 likes4 downloads4 reach14 impact
240 instances - 125 features - 0 classes - 0 missing values
shuttle-pmlb
10 runs0 likes4 downloads4 reach23 impact
58000 instances - 10 features - 7 classes - 0 missing values
### Description ### This dataset is part of a collection datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
6890 runs0 likes4 downloads4 reach17 impact
44819 instances - 7 features - 3 classes - 0 missing values
This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos (signal) from muon neutrinos (background). This dataset is ordered. It first contains all signal…
12 runs0 likes4 downloads4 reach13 impact
130064 instances - 51 features - 2 classes - 0 missing values
The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. __Major changes w.r.t.…
9931 runs0 likes4 downloads4 reach24 impact
2310 instances - 20 features - 7 classes - 0 missing values
### Description __Changes to version 1:__ all categorical features transformed as such. This dataset represents a set of possible advertisements on Internet pages. ### Sources (a) Creator and donor:…
1430 runs0 likes4 downloads4 reach22 impact
3279 instances - 1559 features - 2 classes - 0 missing values
Data from the RSCTC 2010 Discovery Challenge. Example datasets for 6 different problems of DNA microarray data analysis and classification. All datasets contain gene expression data characterized by…
9 runs1 likes3 downloads4 reach14 impact
95 instances - 22278 features - 5 classes - 0 missing values
* Dataset Title: Robot Execution Failures Data Set * Abstract: This dataset contains force and torque measurements on a robot after failure detection. Each failure is characterized by 15 force/torque…
129 runs0 likes4 downloads4 reach13 impact
117 instances - 91 features - 3 classes - 0 missing values
this is titanic survival prediction
0 runs0 likes4 downloads4 reach7 impact
891 instances - 8 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
537 runs0 likes4 downloads4 reach14 impact
285 instances - 8 features - 7 classes - 27 missing values
Squash Harvest Stored Data source: Winna Harvey Crop and Food Research, Christchurch, New Zealand The purpose of the research was to determine the changes taking place in squash fruit during the…
867 runs0 likes4 downloads4 reach15 impact
52 instances - 25 features - 3 classes - 7 missing values
Squash Harvest Unstored Data source: Winna Harvey Crop and Food Research, Christchurch, New Zealand The purpose of the research was to determine the changes taking place in squash fruit during the…
876 runs0 likes4 downloads4 reach15 impact
52 instances - 24 features - 3 classes - 39 missing values
No data.
211 runs0 likes4 downloads4 reach21 impact
313 instances - 5805 features - 8 classes - 0 missing values
Hayes-Roth Database This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Source…
384 runs0 likes4 downloads4 reach25 impact
160 instances - 5 features - 3 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
700 runs0 likes4 downloads4 reach15 impact
294 instances - 14 features - 2 classes - 782 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
672 runs0 likes4 downloads4 reach15 impact
158 instances - 8 features - 2 classes - 87 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
733 runs0 likes4 downloads4 reach14 impact
87 instances - 11 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
714 runs0 likes4 downloads4 reach15 impact
303 instances - 14 features - 2 classes - 6 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
117 runs0 likes4 downloads4 reach14 impact
50 instances - 5 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
708 runs0 likes4 downloads4 reach16 impact
286 instances - 10 features - 2 classes - 9 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
104 runs0 likes4 downloads4 reach14 impact
52 instances - 9 features - 2 classes - 0 missing values
CODING: ITEM 1 = BUSINESS CONDIDIONS 6 MONTHS FROM NOW (CONFERENCE BOARD) ITEM 2 = JOBS 6 MONTHS FROM NOW (CONFERENCE BOARD) ITEM 3 = FAMILY INCOME 6 MONTHS FROM NOW (CONFERENCE BOARD) ITEM 4 =…
560 runs0 likes4 downloads4 reach14 impact
72 instances - 4 features - 6 classes - 0 missing values
County data from the 2000 Presidential Election in Florida. Compiled by Brett Presnell Department of Statistics, University of Florida These data are derived from three sources, described below. As…
32 runs0 likes4 downloads4 reach14 impact
67 instances - 16 features - 5 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
103 runs0 likes4 downloads4 reach14 impact
92 instances - 10 features - 2 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
490 runs0 likes4 downloads4 reach13 impact
364 instances - 33 features - 6 classes - 101 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
670 runs0 likes4 downloads4 reach14 impact
62 instances - 8 features - 2 classes - 8 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
102 runs0 likes4 downloads4 reach14 impact
67 instances - 15 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
107 runs0 likes4 downloads4 reach15 impact
66 instances - 12 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
755 runs0 likes4 downloads4 reach14 impact
54 instances - 8 features - 2 classes - 120 missing values
No data.
996 runs0 likes4 downloads4 reach11 impact
74 instances - 63 features - 4 classes - 0 missing values
No data.
949 runs0 likes4 downloads4 reach11 impact
74 instances - 63 features - 4 classes - 0 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
119 runs0 likes4 downloads4 reach14 impact
95 instances - 9 features - 2 classes - 9 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
688 runs0 likes4 downloads4 reach14 impact
294 instances - 14 features - 2 classes - 782 missing values
* Abstract: Purpose is to predict poker hands * Source - Creators: Robert Cattral (cattral '@' gmail.com) Franz Oppacher (oppacher '@' scs.carleton.ca) Carleton University, Department of Computer…
1 runs0 likes5 downloads5 reach15 impact
1025009 instances - 11 features - 10 classes - 0 missing values
* Title: seeds Data Set * Abstract: Measurements of geometrical properties of kernels belonging to three different varieties of wheat. A soft X-ray technique and GRAINS package were used to construct…
190 runs0 likes5 downloads5 reach13 impact
210 instances - 8 features - 3 classes - 0 missing values
* Dataset Title: Vertebra Column - 3 classes * Abstract: Data set containing values for six biomechanical features used to classify orthopaedic patients into 3 classes (normal, disk hernia or…
154 runs0 likes5 downloads5 reach13 impact
310 instances - 7 features - 3 classes - 0 missing values
* Dataset Title: Vertebra Column - 2 classes * Abstract: Data set containing values for six biomechanical features used to classify orthopaedic patients into 3 classes (normal, disk hernia or…
124 runs0 likes5 downloads5 reach14 impact
310 instances - 7 features - 2 classes - 0 missing values
* Dataset Title: Volcanoes on Venus - JARtool experiment Data Set Experiment: A4 * Source: Michael C. Burl MS 126-347, JPL 4800 Oak Grove Drive Pasadena, CA 91109 (818) 393-5345 Michael.C.Burl '@'…
136 runs0 likes5 downloads5 reach14 impact
1515 instances - 4 features - 5 classes - 0 missing values
pie chart 1
102 runs0 likes5 downloads5 reach13 impact
705 instances - 38 features - 2 classes - 0 missing values
Dataset KDD98 challenge: https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html The goal is to estimate the return from a direct mailing in order to maximize donation profits. This dataset…
0 runs0 likes5 downloads5 reach12 impact
191260 instances - 479 features - 0 classes - 5587563 missing values
This dataset summarizes a heterogeneous set of features about articles published by Mashable in a period of two years. The goal is to predict the number of shares in social networks (popularity). *…
0 runs0 likes5 downloads5 reach11 impact
39644 instances - 61 features - 0 classes - 0 missing values
The data was collected retrospectively at Wroclaw Thoracic Surgery Centre for patients who underwent major lung resections for primary lung cancer in the years 2007 - 2011. The Centre is associated…
31 runs0 likes5 downloads5 reach12 impact
470 instances - 17 features - 2 classes - 0 missing values
DEXTER is a text classification problem in a bag-of-word representation. This is a two-class classification problem with sparse continuous input variables. This dataset is one of five datasets of the…
0 runs0 likes5 downloads5 reach20 impact
600 instances - 20001 features - 2 classes - 0 missing values
Small dataset with time series of RAM prices over the years.
0 runs1 likes4 downloads5 reach11 impact
333 instances - 3 features - 0 classes - 0 missing values
0. airplane 1. automobile 2. bird 3. cat 4. deer 5. dog 6. frog 7. horse 8. ship 9. truck CIFAR-10 contains 6000 images per class. The original train-test split randomly divided these into 5000 train…
151 runs0 likes5 downloads5 reach20 impact
60000 instances - 3073 features - 10 classes - 0 missing values
No data.
51 runs1 likes4 downloads5 reach11 impact
1000000 instances - 48 features - 10 classes - 0 missing values
No data.
73 runs0 likes5 downloads5 reach11 impact
1000000 instances - 16 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
82 runs0 likes5 downloads5 reach15 impact
405 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes5 downloads5 reach15 impact
275 instances - 10936 features - 2 classes - 0 missing values
This is a family of datasets synthetically generated from a realistic simulation of the dynamics of a Unimation Puma 560 robot arm. There are eight datastets in this family . In this repository we…
2 runs0 likes5 downloads5 reach9 impact
8192 instances - 9 features - 0 classes - 0 missing values
Primary Biliary Cirrhosis This data set is a follow-up to the original PBC data set, as discussed in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis, Wiley, 1991. An…
0 runs0 likes5 downloads5 reach13 impact
1945 instances - 19 features - 0 classes - 1133 missing values
Short Summary: Lists estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men. Classroom use of this data set: This data set…
25 runs0 likes5 downloads5 reach19 impact
252 instances - 15 features - 0 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes5 downloads5 reach15 impact
250 instances - 10936 features - 2 classes - 0 missing values
No data.
90 runs2 likes3 downloads5 reach11 impact
663552 instances - 13 features - 2 classes - 0 missing values
No data.
90 runs0 likes5 downloads5 reach9 impact
137781 instances - 10 features - 7 classes - 0 missing values
No data.
87 runs0 likes5 downloads5 reach11 impact
295245 instances - 11 features - 5 classes - 0 missing values
No data.
48 runs1 likes4 downloads5 reach11 impact
1000000 instances - 77 features - 10 classes - 0 missing values
No data.
330 runs0 likes5 downloads5 reach11 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
290 runs0 likes5 downloads5 reach11 impact
1000000 instances - 77 features - 10 classes - 0 missing values
No data.
71 runs0 likes5 downloads5 reach11 impact
1000000 instances - 17 features - 2 classes - 0 missing values
No data.
73 runs0 likes5 downloads5 reach9 impact
1000000 instances - 30 features - 2 classes - 0 missing values
No data.
324 runs0 likes5 downloads5 reach11 impact
1000000 instances - 37 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2862 runs0 likes5 downloads5 reach24 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
66 runs0 likes5 downloads5 reach15 impact
259 instances - 10936 features - 2 classes - 0 missing values
No data.
65 runs0 likes5 downloads5 reach9 impact
1000000 instances - 30 features - 4 classes - 0 missing values
Donor: David W. Aha (aha@ics.uci.edu) This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one…
37 runs0 likes5 downloads5 reach9 impact
303 instances - 14 features - 0 classes - 6 missing values
No data.
163 runs0 likes5 downloads5 reach9 impact
1000000 instances - 28 features - 2 classes - 0 missing values
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics; (b) its assigned insurance risk rating,; (c) its normalized losses in use as…
11 runs1 likes4 downloads5 reach10 impact
159 instances - 16 features - 0 classes - 0 missing values
This is an artificial data set described in Breiman et al. (1984,p.238) (with variance 1 instead of 2). Generate the values of the 10 attributes independently using the following probabilities: P(X_1…
2 runs1 likes4 downloads5 reach10 impact
40768 instances - 11 features - 0 classes - 0 missing values
Automated file upload of BNG(ionosphere)
99 runs1 likes4 downloads5 reach12 impact
1000000 instances - 35 features - 2 classes - 0 missing values
This directory contains Thyroid datasets. "ann-train.data" contains 3772 learning examples and "ann-test.data" contains 3428 testing examples. I have obtained this data from…
31 runs1 likes4 downloads5 reach14 impact
3772 instances - 22 features - 3 classes - 0 missing values
No data.
311 runs0 likes5 downloads5 reach11 impact
1000000 instances - 10 features - 2 classes - 0 missing values
No data.
307 runs0 likes5 downloads5 reach11 impact
1000000 instances - 4 features - 2 classes - 0 missing values
Normalized form of codrna (351) Andrew V Uzilov, Joshua M Keegan, and David H Mathews. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC…
309 runs0 likes5 downloads5 reach9 impact
488565 instances - 9 features - 2 classes - 0 missing values
* Source: JP Marques de Sá, INEB-Instituto de Engenharia Biomédica, Porto, Portugal; e-mail: jpmdesa '@' gmail.com J Jossinet, inserm, Lyon, France * Data Set Information: Impedance measurements…
280 runs0 likes5 downloads5 reach13 impact
106 instances - 10 features - 6 classes - 0 missing values
pie chart 2
101 runs0 likes5 downloads5 reach13 impact
745 instances - 37 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2849 runs0 likes5 downloads5 reach24 impact
1545 instances - 10936 features - 2 classes - 0 missing values
Dataset title laLSVT Voice Rehabilitation Data Set Source: The dataset was created by Athanasios Tsanas (tsanasthanasis '@' gmail.com) of the University of Oxford. Abstract: 126 samples from 14…
162 runs0 likes5 downloads5 reach13 impact
126 instances - 311 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
76 runs0 likes5 downloads5 reach15 impact
187 instances - 10936 features - 2 classes - 0 missing values
No data.
27 runs1 likes4 downloads5 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
No data.
27 runs0 likes5 downloads5 reach10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
Originally from the StatLog project. The raw data is still available on [UCI](https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Splice-junction+Gene+Sequences)). The data consists of 3,186…
7055 runs0 likes5 downloads5 reach24 impact
3186 instances - 181 features - 3 classes - 0 missing values
The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch,…
6 runs0 likes5 downloads5 reach18 impact
506 instances - 14 features - 0 classes - 0 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
3036 runs1 likes4 downloads5 reach15 impact
96320 instances - 22 features - 2 classes - 0 missing values