OpenML
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
882 runs - 1 likes - 2 downloads - 3 reach - 7 impact
96320 instances - 22 features - 2 classes - 0 missing values
This data set contains 416 liver patient records and 167 non-liver patient records. The data set was collected from the north east of Andhra Pradesh, India. The class label divides the patients into 2…
154292 runs - 1 likes - 21 downloads - 22 reach - 19 impact
583 instances - 11 features - 2 classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
0 runs - 1 likes - 2 downloads - 3 reach - 3 impact
798964 instances - 12 features - 3 classes - 399482 missing values
Small dataset with time series of RAM prices over the years.
0 runs - 1 likes - 4 downloads - 5 reach - 4 impact
333 instances - 3 features - 0 classes - 0 missing values
No data.
326 runs - 1 likes - 5 downloads - 6 reach - 3 impact
1000000 instances - 23 features - 2 classes - 0 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php) Data set for KDD Cup 1999 Modified by TunedIT (converted to ARFF format)…
4 runs - 1 likes - 19 downloads - 20 reach - 7 impact
4898431 instances - 42 features - 23 classes - 0 missing values
### Attribute Information * The first column is the class label (1 for signal, 0 for background) * 21 low-level features (kinematic properties): lepton pT, lepton eta, lepton phi, missing energy…
14236 runs - 1 likes - 9 downloads - 10 reach - 18 impact
98050 instances - 29 features - 2 classes - 9 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Concrete Slump dataset (Yeh 2007) concerns the prediction of three properties of concrete…
0 runs - 1 likes - 0 downloads - 1 reach - 2 impact
103 instances - 10 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 1 likes - 1 downloads - 2 reach - 1 impact
150 instances - 5 features - classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs - 1 likes - 1 downloads - 2 reach - 2 impact
2000 instances - 140 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Concrete Slump dataset (Yeh 2007) concerns the prediction of three properties of concrete…
0 runs - 1 likes - 0 downloads - 1 reach - 2 impact
103 instances - 10 features - classes - 0 missing values
test001
0 runs - 1 likes - 0 downloads - 1 reach - 1 impact
768 instances - 9 features - classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs - 1 likes - 10 downloads - 11 reach - 4 impact
2000 instances - 140 features - 2 classes - 0 missing values
A dataset of steel plates' faults, classified into 7 different types. The goal was to train machine learning for automatic pattern recognition. The dataset consists of 27 features describing each…
277313 runs - 1 likes - 38 downloads - 39 reach - 18 impact
1941 instances - 34 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
72 runs - 1 likes - 7 downloads - 8 reach - 9 impact
1545 instances - 10937 features - 2 classes - 0 missing values
QSAR biodegradation Data Set * Abstract: Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). *…
265463 runs - 1 likes - 17 downloads - 18 reach - 19 impact
1055 instances - 42 features - 2 classes - 0 missing values
Normalized version of the Forest Covertype dataset (see version 1), so that the numerical values are between 0 and 1. Contains the forest cover type for 30 x 30 meter cells obtained from US Forest…
342 runs - 1 likes - 39 downloads - 40 reach - 4 impact
581012 instances - 55 features - 7 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
13749 runs - 1 likes - 21 downloads - 22 reach - 28 impact
48842 instances - 15 features - 2 classes - 6465 missing values
Source: Rami Mustafa A Mohammad (University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com), Lee McCluskey (University of Huddersfield, t.l.mccluskey '@' hud.ac.uk), Fadi…
50791 runs - 1 likes - 20 downloads - 21 reach - 19 impact
11055 instances - 31 features - 2 classes - 0 missing values
Compilation of promoters with known transcriptional start points for E. coli genes. The task is to recognize promoters in strings that represent nucleotides (one of A, G, T, or C). A promoter is a…
138 runs - 1 likes - 9 downloads - 10 reach - 4 impact
106 instances - 59 features - 2 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
2671 runs - 1 likes - 31 downloads - 32 reach - 4 impact
48842 instances - 15 features - 2 classes - 6465 missing values
Generator generating 3 classes of waves. Each class is generated from a combination of 2 of 3 "base" waves. For details, see Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. (1984).…
19675 runs - 1 likes - 53 downloads - 54 reach - 4 impact
5000 instances - 41 features - 3 classes - 0 missing values
No data.
2193 runs - 1 likes - 16 downloads - 17 reach - 2 impact
1484 instances - 9 features - 10 classes - 0 missing values
No data.
416 runs - 1 likes - 13 downloads - 14 reach - 53 impact
1050 instances - 3239 features - 10 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
866 runs - 1 likes - 11 downloads - 12 reach - 9 impact
7129 instances - 6 features - 2 classes - 0 missing values
This is the large soybean database from the UCI repository, with its training and test database combined into a single file. There are 19 classes, only the first 15 of which have been used in prior…
40719 runs - 1 likes - 51 downloads - 52 reach - 4 impact
683 instances - 36 features - 19 classes - 2337 missing values
1. Title: Haberman's Survival Data 2. Sources: (a) Donor: Tjen-Sien Lim (limt@stat.wisc.edu) (b) Date: March 4, 1999 3. Past Usage: 1. Haberman, S. J. (1976). Generalized Residuals for Log-Linear…
3241 runs - 1 likes - 19 downloads - 20 reach - 2 impact
306 instances - 4 features - 2 classes - 0 missing values
Primate splice-junction gene sequences (DNA) with associated imperfect domain theory. Splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein…
23161 runs - 1 likes - 15 downloads - 16 reach - 2 impact
3190 instances - 61 features - 3 classes - 0 missing values
1. Title: Hepatitis Domain 2. Sources: (a) unknown (b) Donor: G.Gong (Carnegie-Mellon University) via Bojan Cestnik Jozef Stefan Institute Jamova 39 61000 Ljubljana Yugoslavia (tel.: (38)(+61) 214-399…
2134 runs - 1 likes - 12 downloads - 13 reach - 2 impact
155 instances - 20 features - 2 classes - 167 missing values
NAME: Sonar, Mines vs. Rocks SUMMARY: This is the data set used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network [1]. The task is to train a network…
2366 runs - 1 likes - 24 downloads - 25 reach - 2 impact
208 instances - 61 features - 2 classes - 0 missing values
SPECT heart data This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Sources: --…
1296 runs - 1 likes - 12 downloads - 13 reach - 9 impact
267 instances - 23 features - 2 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
394292 runs - 1 likes - 21 downloads - 22 reach - 28 impact
601 instances - 7 features - 2 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
108666 runs - 1 likes - 14 downloads - 15 reach - 27 impact
554 instances - 7 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
808 runs - 1 likes - 9 downloads - 10 reach - 7 impact
100 instances - 26 features - 2 classes - 0 missing values
1. TITLE: Letter Image Recognition Data The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The…
67577 runs - 1 likes - 70 downloads - 71 reach - 4 impact
20000 instances - 17 features - 26 classes - 0 missing values
Citation Request: This breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia. Thanks go to M. Zwitter and M. Soklic for providing the data.…
2007 runs - 1 likes - 33 downloads - 34 reach - 2 impact
286 instances - 10 features - 2 classes - 9 missing values
Current dataset was adapted to ARFF format from the UCI version. Sample code IDs were removed. Note that there is also a related Breast Cancer Wisconsin (Diagnosis) Data Set with a different set of…
25520 runs - 1 likes - 19 downloads - 20 reach - 2 impact
699 instances - 10 features - 2 classes - 16 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
608 runs - 1 likes - 9 downloads - 10 reach - 8 impact
1000 instances - 26 features - 2 classes - 0 missing values
### Description This dataset describes mushrooms in terms of their physical characteristics. They are classified into: poisonous or edible. ### Source ``` (a) Origin: Mushroom records are drawn from…
16392 runs - 1 likes - 41 downloads - 42 reach - 5 impact
8124 instances - 23 features - 2 classes - 2480 missing values
This file concerns credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. This dataset is interesting because…
24545 runs - 1 likes - 32 downloads - 33 reach - 4 impact
690 instances - 16 features - 2 classes - 67 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
1187 runs - 1 likes - 10 downloads - 11 reach - 2 impact
412 instances - 9 features - 7 classes - 96 missing values
1. Title of Database: Wine recognition data Updated Sept 21, 1998 by C.Blake : Added attribute information 2. Sources: (a) Forina, M. et al, PARVUS - An Extendible Package for Data Exploration,…
1182 runs - 1 likes - 19 downloads - 20 reach - 4 impact
178 instances - 14 features - 3 classes - 0 missing values
Dataset from the MLRR repository: http://axon.cs.byu.edu:5000/ More info: https://archive.ics.uci.edu/ml/datasets/Musk+(Version+2)
82516 runs - 1 likes - 19 downloads - 20 reach - 25 impact
6598 instances - 170 features - 2 classes - 0 missing values
Donated by P. Savicky, Institute of Computer Science, AS of CR, Czech Republic Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope.…
64659 runs - 1 likes - 29 downloads - 30 reach - 18 impact
19020 instances - 12 features - 2 classes - 0 missing values
This database encodes the complete set of possible board configurations at the end of tic-tac-toe games, where "x" is assumed to have played first. The target concept is "win for x" (i.e., true when…
385593 runs - 1 likes - 65 downloads - 66 reach - 2 impact
958 instances - 10 features - 2 classes - 0 missing values
Primary Biliary Cirrhosis. The data set found in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis,…
18 runs - 1 likes - 3 downloads - 4 reach - 7 impact
418 instances - 20 features - 0 classes - 1033 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
736 runs - 1 likes - 5 downloads - 6 reach - 8 impact
452 instances - 280 features - 2 classes - 408 missing values
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
143 runs - 1 likes - 11 downloads - 12 reach - 8 impact
531 instances - 103 features - 2 classes - 0 missing values
This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable,…
109963 runs - 1 likes - 19 downloads - 20 reach - 20 impact
15545 instances - 6 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
144599 runs - 1 likes - 16 downloads - 17 reach - 19 impact
1563 instances - 38 features - 2 classes - 0 missing values
No data.
794 runs - 1 likes - 13 downloads - 14 reach - 7 impact
107 instances - 30 features - 2 classes - 0 missing values
1. Title: Wine Quality 2. Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009 3. Past Usage: P. Cortez, A. Cerdeira, F.…
0 runs - 1 likes - 13 downloads - 14 reach - 6 impact
6497 instances - 12 features - 0 classes - 0 missing values
One of the data sets used in the book "Analyzing Categorical Data" by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. Further details concerning the book, including information on statistical…
2 runs - 0 likes - 0 downloads - 0 reach - 5 impact
108 instances - 5 features - 0 classes - 0 missing values
Data on the homicide rate in Detroit for the years 1961-1973. This is the data set called DETROIT in the book 'Subset selection in regression' by Alan J. Miller published in the Chapman & Hall series…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
13 instances - 14 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs - 0 likes - 0 downloads - 0 reach - 5 impact
450 instances - 4 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs - 0 likes - 0 downloads - 0 reach - 5 impact
475 instances - 4 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs - 0 likes - 0 downloads - 0 reach - 5 impact
475 instances - 4 features - 0 classes - 0 missing values
A family of datasets synthetically generated from a simulation of how bank-customers choose their banks. Tasks are based on predicting the fraction of bank customers who leave the bank because of full…
0 runs - 0 likes - 1 downloads - 1 reach - 5 impact
8192 instances - 33 features - 0 classes - 0 missing values
Dataset from Smoothing Methods in Statistics (ftp stat.cmu.edu/datasets) Simonoff, J.S. (1996). Smoothing Methods in Statistics. New York: Springer-Verlag.
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
Short Summary: Lists estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men. Classroom use of this data set: This data set…
0 runs - 0 likes - 4 downloads - 4 reach - 7 impact
252 instances - 15 features - 0 classes - 0 missing values
Attributes 2 and 8 deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based…
2 runs - 0 likes - 2 downloads - 2 reach - 7 impact
209 instances - 8 features - 0 classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
12 runs - 0 likes - 4 downloads - 4 reach - 6 impact
8192 instances - 13 features - 0 classes - 0 missing values
1. Title: Faults in an urban waste water treatment plant 2. Source Information: -- Creators: Manel Poch (igte2@cc.uab.es) Unitat d'Enginyeria Quimica Universitat Autonoma de Barcelona. Bellaterra.…
0 runs - 0 likes - 1 downloads - 1 reach - 5 impact
1. Title: meta-data 2. Sources: (a) Creator: LIACC - University of Porto R.Campo Alegre 823 4150 PORTO (b) Donor: P.B.Brazdil or J.Gama Tel.: +351 600 1672 LIACC, University of Porto Fax.: +351 600…
32 runs - 0 likes - 2 downloads - 2 reach - 7 impact
528 instances - 22 features - 0 classes - 504 missing values
COIL 1999 Competition Data. Data type: multivariate. Abstract: This data set is from the 1999 Computational Intelligence and Learning (COIL)…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
316 instances - 12 features - 0 classes - 56 missing values
COIL 1999 Competition Data. Data type: multivariate. Abstract: This data set is from the 1999 Computational Intelligence and Learning (COIL)…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
316 instances - 12 features - 0 classes - 56 missing values
Attributes 2, 4, and 6 deleted. Midrange price treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M.…
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
93 instances - 23 features - 0 classes - 14 missing values
COIL 1999 Competition Data. Data type: multivariate. Abstract: This data set is from the 1999 Computational Intelligence and Learning (COIL)…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
316 instances - 12 features - 0 classes - 56 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 5 impact
1000 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
1 runs - 0 likes - 1 downloads - 1 reach - 5 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
100 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 2 downloads - 2 reach - 5 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
100 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 2 downloads - 2 reach - 5 impact
1000 instances - 101 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 5 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 5 impact
1000 instances - 51 features - 0 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
78 runs - 0 likes - 2 downloads - 2 reach - 6 impact
130 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
78 runs - 0 likes - 2 downloads - 2 reach - 7 impact
363 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 2 downloads - 2 reach - 7 impact
329 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2841 runs - 0 likes - 3 downloads - 3 reach - 16 impact
630 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
59 runs - 0 likes - 6 downloads - 6 reach - 8 impact
1545 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 3 downloads - 3 reach - 7 impact
337 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
67 runs - 0 likes - 1 downloads - 1 reach - 7 impact
458 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 3 downloads - 3 reach - 7 impact
193 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 2 downloads - 2 reach - 7 impact
347 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2855 runs - 0 likes - 4 downloads - 4 reach - 16 impact
1545 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 3 downloads - 3 reach - 7 impact
355 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2862 runs - 0 likes - 5 downloads - 5 reach - 16 impact
1545 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2849 runs - 0 likes - 5 downloads - 5 reach - 16 impact
1545 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 2 downloads - 2 reach - 7 impact
324 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 2 downloads - 2 reach - 7 impact
413 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
80 runs - 0 likes - 5 downloads - 5 reach - 7 impact
405 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 2 downloads - 2 reach - 6 impact
146 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs - 0 likes - 2 downloads - 2 reach - 7 impact
384 instances - 10937 features - 2 classes - 0 missing values
The KDD Cup 2009 offers the opportunity to work on large marketing databases from the French Telecom company Orange to predict the propensity of customers to switch provider (churn). Churn (wikipedia…
10982 runs - 0 likes - 15 downloads - 15 reach - 17 impact
50000 instances - 231 features - 2 classes - 8024152 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php) KDD Cup 2009 http://www.kddcup-orange.com Converted to ARFF format by TunedIT Customer Relationship Management (CRM) is a key element…
11301 runs - 0 likes - 12 downloads - 12 reach - 17 impact
50000 instances - 231 features - 2 classes - 8024152 missing values
File README: chscase. A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
400 instances - 8 features - 0 classes - 0 missing values
File README: chscase. A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs - 0 likes - 0 downloads - 0 reach - 5 impact
400 instances - 8 features - 0 classes - 0 missing values