Data
wdwd cd
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
efef fdfef
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
zaxa xcdc
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
dedfef
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 1 features - classes - 0 missing values
scs
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
wdede
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
wdwd
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 1 features - classes - 0 missing values
qsqs
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 1 features - classes - 0 missing values
swdw
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
werr
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
ssf
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
swd
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
ddef
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
2 instances - 2 features - classes - 0 missing values
frf r
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
2 instances - 3 features - classes - 0 missing values
e eded
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
2 instances - 4 features - classes - 0 missing values
e3r4vr t4r
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
2 instances - 5 features - classes - 0 missing values
f fr
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
2 instances - 5 features - classes - 0 missing values
testing temperature and ph
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
26 instances - 8 features - classes - 0 missing values
This data is used to test water contamination
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
26 instances - 8 features - classes - 0 missing values
AutoML challenge 2014. Original task: regression. Test and validation sets can be obtained on the ChaLearn website: https://automl.chalearn.org/data
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
400000 instances - 101 features - 0 classes - 0 missing values
AutoML challenge 2014. Original task: regression. Test and validation sets can be obtained on the ChaLearn website: https://automl.chalearn.org/data
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
99 instances - 200001 features - 0 classes - 0 missing values
Title: Flora. Source: https://automl.chalearn.org/data. Dataset from the first ChaLearn AutoML challenge (2014). Only the training data is included, as there were no labels for validation and…
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
15000 instances - 200001 features - 0 classes - 0 missing values
A subset of the 3D dataset from Princeton's COS 429 Computer Vision course. The dataset consists of 40 models organised into 4 classes of 10 objects each.
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
16000 instances - 4 features - classes - 0 missing values
Airlines Departure Delay Prediction (Regression). Original data can be found at: http://www.transtats.bts.gov This is a processed version of the original data, designed to predict departure delay (in…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
1000000 instances - 10 features - 0 classes - 0 missing values
Version with url set as row id; creator data missing due to bad formatting. Author: Kelwin Fernandes (INESC TEC, Universidade do Porto), Pedro Vinagre (ALGORITMI Research Centre, Universidade do…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
39644 instances - 60 features - 0 classes - 0 missing values
Make target (age) numeric. Author: 1. Title of Database: Abalone data 2. Sources: (a) Original owners of database: Marine Resources Division Marine Research Laboratories - Taroona Department of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4177 instances - 9 features - 0 classes - 0 missing values
Airlines Departure Delay Prediction (Regression). Original data can be found at: http://www.transtats.bts.gov This is a processed version of the original data, designed to predict departure delay (in…
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
10000000 instances - 10 features - 0 classes - 0 missing values
String datetime information extracted to numeric columns. Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC)…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
581835 instances - 19 features - 0 classes - 0 missing values
Date converted to year/mo/day numerics. This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. It contains 19 house…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
21613 instances - 22 features - 0 classes - 0 missing values
Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018. from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2215023 instances - 9 features - 2 classes - 0 missing values
This is the same data as version 5 (OpenML ID = 1220) with '_id' features coded as nominal factor variables.
0 runs - 0 likes - 0 downloads - 0 reach - 1 impact
39948 instances - 12 features - 2 classes - 0 missing values
Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user is able to easily rent…
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
17379 instances - 13 features - 0 classes - 0 missing values
Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user is able to easily rent…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
17379 instances - 13 features - 0 classes - 0 missing values
This is a preprocessed version of the anneal dataset (version 1). All missing values are treated as a nominal value with label '?'. (Quotes for clarity). Because this is not good…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
898 instances - 39 features - 5 classes - 0 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes information on…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
163065 instances - 12 features - 0 classes - 0 missing values
sample
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
14 instances - 5 features - classes - 0 missing values
test data
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
2 instances - 5 features - classes - 0 missing values
this is test data
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
5 instances - 5 features - classes - 0 missing values
newtest3
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
2 instances - 6 features - classes - 0 missing values
test3
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
2 instances - 8 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
150 instances - 5 features - 3 classes - 0 missing values
This is the same data as version 5 (OpenML ID = 1220) with '_id' features coded as nominal factor variables.
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
39948 instances - 12 features - 2 classes - 0 missing values
No data.
50 runs - 0 likes - 3 downloads - 3 reach - 9 impact
1000000 instances - 61 features - 2 classes - 0 missing values
No data.
65 runs - 0 likes - 5 downloads - 5 reach - 9 impact
1000000 instances - 30 features - 4 classes - 0 missing values
No data.
230 runs - 0 likes - 4 downloads - 4 reach - 11 impact
1000000 instances - 35 features - 2 classes - 0 missing values
No data.
293 runs - 0 likes - 2 downloads - 2 reach - 11 impact
1000000 instances - 17 features - 10 classes - 0 missing values
No data.
328 runs - 0 likes - 3 downloads - 3 reach - 11 impact
1000000 instances - 4 features - 2 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
32 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 1 downloads - 1 reach - 13 impact
22 instances - 111 features - 0 classes - 0 missing values
No data.
68 runs - 0 likes - 11 downloads - 11 reach - 9 impact
1000000 instances - 10 features - 2 classes - 0 missing values
No data.
264 runs - 0 likes - 11 downloads - 11 reach - 47 impact
3204 instances - 13196 features - 6 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 1 likes - 1 downloads - 2 reach - 13 impact
4450 instances - 203 features - 0 classes - 0 missing values
No data.
219 runs - 0 likes - 4 downloads - 4 reach - 11 impact
1000000 instances - 58 features - 2 classes - 0 missing values
No data.
310 runs - 0 likes - 4 downloads - 4 reach - 11 impact
1000000 instances - 19 features - 4 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
19 instances - 10 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
26 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
13 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 1 likes - 0 downloads - 1 reach - 14 impact
8885 instances - 252 features - 0 classes - 0 missing values
No data.
326 runs - 0 likes - 4 downloads - 4 reach - 11 impact
1000000 instances - 14 features - 2 classes - 0 missing values
No data.
163 runs - 0 likes - 5 downloads - 5 reach - 9 impact
1000000 instances - 28 features - 2 classes - 0 missing values
Dataset created to study concept drift in stream mining. It is constructed by combining the Covertype, Poker-Hand, and Electricity datasets. More details can be found in: Albert Bifet, Geoff Holmes,…
332 runs - 0 likes - 27 downloads - 27 reach - 12 impact
1455525 instances - 73 features - 10 classes - 0 missing values
The problem is to learn a regression equation/rule/tree to predict the activity from the descriptive structural attributes. The data and methodology are described in detail in: King, Ross D., Hurst,…
5 runs - 0 likes - 1 downloads - 1 reach - 9 impact
186 instances - 61 features - 0 classes - 0 missing values
This is the pollution data so loved by writers of papers on ridge regression. Source: McDonald, G.C. and Schwing, R.C. (1973) 'Instabilities of regression estimates relating air pollution to…
0 runs - 0 likes - 1 downloads - 1 reach - 13 impact
60 instances - 16 features - 0 classes - 0 missing values
Veteran's Administration Lung Cancer Trial. Taken from Kalbfleisch and Prentice, pages 223-224. Variables: Treatment (1=standard, 2=test); Celltype (1=squamous, 2=smallcell, 3=adeno, 4=large); Survival in days…
2 runs - 0 likes - 1 downloads - 1 reach - 13 impact
137 instances - 8 features - 0 classes - 0 missing values
Dataset from Smoothing Methods in Statistics (ftp stat.cmu.edu/datasets). Simonoff, J.S. (1996). Smoothing Methods in Statistics. New York: Springer-Verlag. Gasoline consumption is being treated as…
2 runs - 0 likes - 0 downloads - 0 reach - 9 impact
27 instances - 5 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
37 instances - 1143 features - 0 classes - 0 missing values
No data.
68 runs - 0 likes - 2 downloads - 2 reach - 9 impact
1000000 instances - 19 features - 4 classes - 0 missing values
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics; (b) its assigned insurance risk rating; (c) its normalized losses in use as…
11 runs - 1 likes - 4 downloads - 5 reach - 10 impact
159 instances - 16 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
15 instances - 10 features - 0 classes - 0 missing values
Information about the dataset CLASSTYPE: numeric CLASSINDEX: last
2 runs - 0 likes - 1 downloads - 1 reach - 13 impact
559 instances - 5 features - 0 classes - 0 missing values
No data.
304 runs - 0 likes - 7 downloads - 7 reach - 11 impact
1000000 instances - 25 features - 10 classes - 0 missing values
No data.
68 runs - 0 likes - 4 downloads - 4 reach - 9 impact
1000000 instances - 23 features - 2 classes - 0 missing values
Dataset listing all-time NFL passers through 1994 by the NFL passing efficiency rating. Associated passing statistics from which this rating is computed are included. The dataset lists statistics for…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
26 instances - 6 features - 0 classes - 0 missing values
This S dump contains 22 data sets from the book Visualizing Data published by Hobart Press (books@hobart.com). The dump was created by data.dump() and can be read back into S by data.restore(). The…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
73 instances - 6 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
2 runs - 0 likes - 1 downloads - 1 reach - 13 impact
468 instances - 4 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
4052 instances - 8 features - 0 classes - 0 missing values
wind: daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations in the Republic of Ireland (Haslett and Raftery 1989). These data were analyzed in detail in the following article:…
0 runs - 0 likes - 6 downloads - 6 reach - 13 impact
6574 instances - 15 features - 0 classes - 0 missing values
chscase: A collection of the data sets used in the book "A Casebook for a First Course in Statistics and Data Analysis," by Samprit Chatterjee, Mark S. Handcock and Jeffrey S.…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
50 instances - 3 features - classes - 0 missing values
DATA-SETS FROM DIGGLE, P.J. (1990). TIME SERIES: A BIOSTATISTICAL INTRODUCTION. Oxford University Press. Table: Table A1 Luteinizing hormone. Information about the dataset CLASSTYPE: numeric…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
48 instances - 5 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
22 instances - 40 features - 0 classes - 0 missing values
This database was designed on the basis of data provided by US Census Bureau [http://www.census.gov] (under Lookup Access [http://www.census.gov/cdrom/lookup]: Summary Tape File 1). The data were…
0 runs - 1 likes - 6 downloads - 7 reach - 14 impact
22784 instances - 17 features - 0 classes - 0 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
66 instances - 6 features - 0 classes - 0 missing values
No data.
52 runs - 0 likes - 3 downloads - 3 reach - 11 impact
1000000 instances - 48 features - 10 classes - 0 missing values
17x17x2x2 tables of counts in GLIM-ready format used for the analyses in Biblarz, Timothy J., and Adrian E. Raftery. 1993. "The Effects of Family Disruption on Social Mobility." American Sociological…
3 runs - 0 likes - 1 downloads - 1 reach - 14 impact
1156 instances - 6 features - 0 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
0 runs - 0 likes - 2 downloads - 2 reach - 11 impact
100 instances - 10 features - classes - 0 missing values
This file is a text file giving details about the time series analysed in 'The Analysis of Time Series' by Chris Chatfield. The 5th edn was published in 1996 and the 6th edn in 2003. The series are…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
235 instances - 13 features - 0 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
1078 runs - 0 likes - 11 downloads - 11 reach - 14 impact
108 instances - 8 features - 2 classes - 0 missing values
This S dump contains 22 data sets from the book Visualizing Data published by Hobart Press (books@hobart.com). The dump was created by data.dump() and can be read back into S by data.restore(). The…
0 runs - 0 likes - 2 downloads - 2 reach - 13 impact
88 instances - 3 features - 0 classes - 0 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
51 instances - 7 features - 0 classes - 0 missing values
This S dump contains 22 data sets from the book Visualizing Data published by Hobart Press (books@hobart.com). The dump was created by data.dump() and can be read back into S by data.restore(). The…
2 runs - 0 likes - 1 downloads - 1 reach - 13 impact
8641 instances - 5 features - 0 classes - 0 missing values
No data.
334 runs - 0 likes - 4 downloads - 4 reach - 11 impact
1000000 instances - 33 features - 2 classes - 0 missing values
This is an artificial data set described in Breiman et al. (1984,p.238) (with variance 1 instead of 2). Generate the values of the 10 attributes independently using the following probabilities: P(X_1…
2 runs - 1 likes - 4 downloads - 5 reach - 10 impact
40768 instances - 11 features - 0 classes - 0 missing values
This file contains data from Regression Analysis By Example, 2nd Edition, by Samprit Chatterjee and Bertram Price, John Wiley, 1991. Data sets have names of the form 'rabe.xxx' where xxx is the page…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
46 instances - 4 features - 0 classes - 0 missing values
Contains 110 data sets from the book 'The Statistical Sleuth' by Fred Ramsey and Dan Schafer; Duxbury Press, 1997. (schafer@stat.orst.edu) [14/Oct/97] (172k) Note: description taken from this web…
0 runs - 0 likes - 0 downloads - 0 reach - 13 impact
34 instances - 9 features - 0 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs - 0 likes - 3 downloads - 3 reach - 14 impact
146 instances - 10936 features - 2 classes - 0 missing values
This is a 10% stratified subsample of the data from the 1999 ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php). Modified by TunedIT (converted to ARFF format)…
25 runs - 1 likes - 35 downloads - 36 reach - 15 impact
494020 instances - 42 features - 23 classes - 0 missing values
Data from StatLib (ftp stat.cmu.edu/datasets). These data are those collected in a cloud-seeding experiment in Tasmania between mid-1964 and January 1971. Their analysis, using regression techniques…
66 runs - 0 likes - 2 downloads - 2 reach - 9 impact
108 instances - 6 features - 0 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
72 runs - 1 likes - 7 downloads - 8 reach - 16 impact
1545 instances - 10936 features - 2 classes - 0 missing values
No data.
44 runs - 0 likes - 3 downloads - 3 reach - 11 impact
1000000 instances - 15 features - 2 classes - 0 missing values