Data
Filter results by:
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs0 likes0 downloads0 reach15 impact
3140 instances - 260 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs0 likes0 downloads0 reach15 impact
5832 instances - 309 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs0 likes1 downloads1 reach16 impact
4147 instances - 49 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs0 likes0 downloads0 reach14 impact
31406 instances - 23 features - 2 classes - 29756 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
1 runs0 likes1 downloads1 reach13 impact
500 instances - 26 features - 0 classes - 0 missing values
This is a test dataset
0 runs0 likes0 downloads0 reach0 impact
This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages are on four separate subjects (Bands- recording artists; Goats; Sheep; and…
0 runs0 likes1 downloads1 reach20 impact
131 instances - 3 features - 3 classes - 0 missing values
This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages are on four separate subjects (Bands- recording artists; Goats; Sheep; and…
0 runs0 likes3 downloads3 reach20 impact
65 instances - 3 features - 2 classes - 0 missing values
This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages are on four separate subjects (Bands- recording artists; Goats; Sheep; and…
0 runs0 likes0 downloads0 reach20 impact
70 instances - 3 features - 3 classes - 0 missing values
This database contains the HTML source of web pages plus the ratings of a single user on these web pages. The web pages are on four separate subjects (Bands- recording artists; Goats; Sheep; and…
0 runs0 likes0 downloads0 reach20 impact
61 instances - 3 features - 3 classes - 0 missing values
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Horsepower treated as the class attribute. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using…
0 runs0 likes2 downloads2 reach9 impact
One of two multivariate regression data sets from paper industry, from an experiment at the paper plant Norske Skog, Skogn, Norway. They have been described and analysed in: Aldrin, M. (1996),…
0 runs0 likes2 downloads2 reach13 impact
This analysis describes and summarizes the relationships between 1987 salaries of major league baseball players and the player's performance. The salary data were taken from Sports Illustrated, April…
0 runs0 likes0 downloads0 reach13 impact
This analysis describes and summarizes the relationships between 1987 salaries of major league baseball players and the player's performance. The salary data were taken from Sports Illustrated, April…
0 runs0 likes2 downloads2 reach13 impact
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs0 likes1 downloads1 reach13 impact
40 instances - 3 features - 0 classes - 0 missing values
File README ----------- chscase A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs0 likes0 downloads0 reach13 impact
468 instances - 3 features - 0 classes - 0 missing values
File README ----------- chscase A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs0 likes1 downloads1 reach13 impact
Data Used in "A BAYESIAN APPROACH TO DATA DISCLOSURE: OPTIMAL INTRUDER BEHAVIOR FOR CONTINUOUS DATA" by Stephen E. Fienberg, Udi E. Makov, and Ashish P. Sanil Background: ========== In this paper we…
0 runs0 likes0 downloads0 reach13 impact
662 instances - 4 features - 0 classes - 0 missing values
Data Used in "A BAYESIAN APPROACH TO DATA DISCLOSURE: OPTIMAL INTRUDER BEHAVIOR FOR CONTINUOUS DATA" by Stephen E. Fienberg, Udi E. Makov, and Ashish P. Sanil Background: ========== In this paper we…
0 runs0 likes1 downloads1 reach13 impact
662 instances - 4 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
250 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 6 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
250 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
100 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
250 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach13 impact
1000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
100 instances - 6 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach13 impact
1000 instances - 6 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
100 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
100 instances - 51 features - 0 classes - 0 missing values
File README ----------- chscase A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs0 likes0 downloads0 reach13 impact
Dataset from Smoothing Methods in Statistics (ftp stat.cmu.edu/datasets) Simonoff, J.S. (1996). Smoothing Methods in Statistics. New York: Springer-Verlag.
0 runs0 likes1 downloads1 reach13 impact
El Nino Data Data Type spatio-temporal Abstract The data set contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. The…
0 runs1 likes3 downloads4 reach13 impact
1. Title: Faults in a urban waste water treatment plant 2. Source Information: -- Creators: Manel Poch (igte2@cc.uab.es) Unitat d'Enginyeria Quimica Universitat Autonoma de Barcelona. Bellaterra.…
0 runs0 likes1 downloads1 reach13 impact
A family of datasets synthetically generated from a simulation of how bank-customers choose their banks. Tasks are based on predicting the fraction of bank customers who leave the bank because of full…
0 runs0 likes6 downloads6 reach13 impact
8192 instances - 9 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach13 impact
250 instances - 6 features - 0 classes - 0 missing values
The USNEWS dataset for the ASA Statistical Graphics Section's 1995 Data Analysis Exposition contains information on over 1300 American colleges and universities. The data may be obtained in either of…
0 runs0 likes0 downloads0 reach13 impact
------------------------------------------------------------------------------- TIME SERIES USED IN LONG-MEMORY PROCESSES, THE ALLAN VARIANCE AND WAVELETS BY D. B. PERCIVAL AND P. GUTTORP, A CHAPTER…
0 runs0 likes1 downloads1 reach13 impact
6875 instances - 1 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
500 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes2 downloads2 reach13 impact
1000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach13 impact
1000 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach13 impact
1000 instances - 11 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
250 instances - 6 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes1 downloads1 reach13 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting The dataset names are coded as…
0 runs0 likes0 downloads0 reach13 impact
1000 instances - 6 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: We used binary encoding for each feature (o, b, x), so the number of features is 42*3 = 126
0 runs0 likes3 downloads3 reach16 impact
67557 instances - 127 features - 0 classes - 0 missing values
#Dataset from the LIBSVM multiclass data repository.
0 runs0 likes3 downloads3 reach17 impact
108000 instances - 129 features - 0 classes - 0 missing values
Newsweeder: Learning to filter netnews. In Proceedings of the Twelfth International Conference on Machine Learning, pages 331-339, 1995. #Dataset from the LIBSVM data repository. Preprocessing: First…
0 runs0 likes4 downloads4 reach16 impact
19928 instances - 62062 features - 0 classes - 0 missing values
This is a corrected version of the previous data file in version 1, which contained a dataset (349 instances) incorrectly merged from the original training and test sets available on UCI (there are…
0 runs0 likes3 downloads3 reach12 impact
267 instances - 45 features - 2 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository.
0 runs0 likes1 downloads1 reach16 impact
49749 instances - 301 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository.
0 runs0 likes0 downloads0 reach16 impact
49749 instances - 301 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository.
0 runs0 likes0 downloads0 reach16 impact
49749 instances - 301 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository.
0 runs0 likes0 downloads0 reach16 impact
49749 instances - 301 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository.
0 runs0 likes0 downloads0 reach16 impact
49749 instances - 301 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository.
0 runs0 likes0 downloads0 reach16 impact
64700 instances - 301 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: Original data: someone from Germany working with the car industry.
0 runs0 likes1 downloads1 reach16 impact
1243 instances - 23 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository.
0 runs0 likes1 downloads1 reach16 impact
49749 instances - 301 features - 0 classes - 0 missing values
Datasets of Data And Story Library, project illustrating use of basic statistic methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs0 likes0 downloads0 reach14 impact
Datasets of Data And Story Library, project illustrating use of basic statistic methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs0 likes2 downloads2 reach13 impact
48 instances - 8 features - 0 classes - 0 missing values
Datasets of Data And Story Library, project illustrating use of basic statistic methods, converted to arff format by Hakan Kjellerstrand. Source: TunedIT: http://tunedit.org/repo/DASL DASL file…
0 runs0 likes1 downloads1 reach13 impact
150 instances - 5 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: transform to two-class
0 runs0 likes1 downloads1 reach17 impact
862 instances - 3 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: scaled to [-1,1]
0 runs0 likes6 downloads6 reach17 impact
270 instances - 14 features - 0 classes - 0 missing values
libSVM","AAD group IJCNN 2001 neural network competition. Slide presentation in IJCNN'01, Ford Research Laboratory, 2001. http://www.geocities.com/ijcnn/nnc_ijcnn01.pdf . #Dataset from the LIBSVM data…
0 runs0 likes8 downloads8 reach17 impact
191681 instances - 23 features - 0 classes - 0 missing values
No data.
0 runs0 likes3 downloads3 reach18 impact
697641 instances - 47237 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: scaled to [-1,1]
0 runs0 likes0 downloads0 reach16 impact
3175 instances - 61 features - 0 classes - 0 missing values
We consider the following problem: You are running a cloud computing service, where customers contract to run computing services (tasks). Each task has a duration, an earliest start and latest end,…
0 runs0 likes7 downloads7 reach14 impact
libSVM","AAD group A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University, 2003. #Dataset from the LIBSVM data repository…
0 runs0 likes0 downloads0 reach16 impact
7089 instances - 5 features - 0 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach16 impact
1000 instances - 25 features - 0 classes - 0 missing values
knugget chase 3
0 runs0 likes2 downloads2 reach12 impact
194 instances - 40 features - 2 classes - 0 missing values
Mean While 1
0 runs0 likes3 downloads3 reach11 impact
253 instances - 38 features - 2 classes - 0 missing values
Mind Cave 2
0 runs0 likes3 downloads3 reach11 impact
125 instances - 40 features - 2 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs0 likes1 downloads1 reach16 impact
32561 instances - 124 features - 0 classes - 0 missing values
libSVM","AAD group #Dataset from the LIBSVM data repository. Preprocessing: The original Adult data set has 14 features, among which six are continuous and eight are categorical. In this data set,…
0 runs0 likes0 downloads0 reach16 impact
32561 instances - 124 features - 0 classes - 0 missing values
Modified version of the training dataset of the Bike Sharing Demand challenge running on Kaggle (http://www.kaggle.com/c/bike-sharing-demand/) If you use the problem in publication, please cite:…
0 runs0 likes3 downloads3 reach12 impact
10886 instances - 12 features - 0 classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach9 impact
1000000 instances - 13 features - 0 classes - 0 missing values
Juan J. Rodriguez, Ludmila I. Kuncheva, Carlos J. Alonso (2006). Rotation Forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence. 28(10):1619-1630.…
0 runs0 likes0 downloads0 reach12 impact
1000000 instances - 12 features - 0 classes - 0 missing values
Abstract: CART book's waveform domains Source: Original Owners: Breiman,L., Friedman,J.H., Olshen,R.A., & Stone,C.J. (1984). Classification and Regression Trees. Wadsworth International Group:…
0 runs2 likes5 downloads7 reach11 impact
5000 instances - 22 features - classes - 0 missing values
Abstract: This data set contains a total 5820 evaluation scores provided by students from Gazi University in Ankara (Turkey). There is a total of 28 course specific questions and additional 5…
0 runs0 likes2 downloads2 reach16 impact
5820 instances - 33 features - classes - 0 missing values
Abstract: This data contains general demographic information on internet users in 1997. Source: Original Owner: Graphics, Visualization, & Usability Center College of Computing Geogia Institute of…
0 runs0 likes2 downloads2 reach11 impact
Abstract: This dataset contains timeseries of mel-frequency cepstrum coefficients (MFCCs) corresponding to spoken Arabic digits. Includes data from 44 male and 44 female native Arabic speakers.…
0 runs0 likes3 downloads3 reach11 impact
178526 instances - 13 features - classes - 57200 missing values
Dataset KDD98 challenge: https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html The goal is to estimate the return from a direct mailing in order to maximize donation profits. This dataset…
0 runs0 likes5 downloads5 reach12 impact
191260 instances - 479 features - 0 classes - 5587563 missing values
"The sulfur recovery unit (SRU) removes environmental pollutants from acid gas streams before they are released into the atmosphere. Furthermore, elemental sulfur is recovered as a valuable…
0 runs0 likes1 downloads1 reach11 impact
10081 instances - 7 features - 0 classes - 0 missing values
"The debutanizer column is part of a desulfuring and naphtha splitter plant." u1 Top temperature u2 Top pressure u3 Reflux flow u4 Flow to next process u5 6th tray temperature u6 Bottom…
0 runs0 likes1 downloads1 reach11 impact
2394 instances - 8 features - 0 classes - 0 missing values
Sampled http://www.openml.org/d/5889
0 runs0 likes1 downloads1 reach11 impact
761940 instances - 6 features - classes - 0 missing values
Another sample of COMET_MC
0 runs0 likes0 downloads0 reach12 impact
89640 instances - 6 features - 0 classes - 0 missing values
And another sample. (v. 2 without OpenML metainfo)
0 runs0 likes0 downloads0 reach11 impact
89640 instances - 6 features - classes - 0 missing values
Sample with OpenML metadata.
0 runs0 likes0 downloads0 reach12 impact
761940 instances - 6 features - 0 classes - 0 missing values
YAGO Schema.
0 runs0 likes0 downloads0 reach11 impact
181 instances - 4 features - classes - 0 missing values
This is a sesnor data for test it is not complete.
0 runs0 likes4 downloads4 reach11 impact
127591 instances - 27 features - classes - 0 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs2 likes14 downloads16 reach16 impact
101766 instances - 50 features - 3 classes - 0 missing values
Predicting the Geographical Origin of Music, ICDM, 2014 Abstract: Instances in this dataset contain audio features extracted from 1059 wave files. The task associated with the data is to predict the…
0 runs0 likes4 downloads4 reach13 impact
1059 instances - 118 features - 0 classes - 0 missing values
This dataset summarizes a heterogeneous set of features about articles published by Mashable in a period of two years. The goal is to predict the number of shares in social networks (popularity). *…
0 runs0 likes5 downloads5 reach11 impact
39644 instances - 61 features - 0 classes - 0 missing values
USDA, NRCS. 2008. The PLANTS Database ([Web Link], 31 December 2008). National Plant Data Center, Baton Rouge, LA 70874-4490 USA. Abstract: Data has been extracted from the USDA plants database. It…
0 runs0 likes4 downloads4 reach11 impact
Source: Creators : François Kawala (1,2) Ahlame Douzal (1) Eric Gaussier (1) Eustache Diemert (2) Institutions : (1) Université Joseph Fourier (Grenoble I) Laboratoire d'informatique de…
0 runs0 likes1 downloads1 reach11 impact
28179 instances - 97 features - classes - 0 missing values
Abstract: This data-set contains examples of buzz events from two different social networks: Twitter, and Tom's Hardware, a forum network focusing on new technology with more conservative dynamics.…
0 runs0 likes0 downloads0 reach12 impact
583250 instances - 78 features - 0 classes - 0 missing values