OpenML
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. The maps were scanned in 8 bit grey value at density of 400dpi,…
11340 runs - 1 likes - 2 downloads - 3 reach - 21 impact
2000 instances - 241 features - 10 classes - 0 missing values
wine-quality-red-pmlb
31 runs - 1 likes - 1 downloads - 2 reach - 22 impact
1599 instances - 12 features - 6 classes - 0 missing values
Small dataset with time series of RAM prices over the years.
0 runs - 1 likes - 4 downloads - 5 reach - 11 impact
333 instances - 3 features - 0 classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Concrete Slump dataset (Yeh 2007) concerns the prediction of three properties of concrete…
0 runs - 1 likes - 0 downloads - 1 reach - 9 impact
103 instances - 10 features - classes - 0 missing values
test001
0 runs - 1 likes - 0 downloads - 1 reach - 9 impact
768 instances - 9 features - classes - 0 missing values
BitcoinHeist Ransomware Dataset Akcora, C.G., Li, Y., Gel, Y.R. and Kantarcioglu, M., 2019. BitcoinHeist. Topological Data Analysis for Ransomware Detection on the Bitcoin Blockchain. IJCAI-PRICAI…
0 runs - 1 likes - 0 downloads - 1 reach - 6 impact
2916697 instances - 10 features - 29 classes - 0 missing values
No data.
51 runs - 1 likes - 4 downloads - 5 reach - 11 impact
1000000 instances - 48 features - 10 classes - 0 missing values
The data consist of 2001 observations taken from a balloon about 30 kilometres above the surface of the earth. In the section of the flight shown here the balloon increases in height. As radiation…
0 runs - 1 likes - 2 downloads - 3 reach - 13 impact
2001 instances - 2 features - 0 classes - 0 missing values
Case number deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning…
10 runs - 1 likes - 2 downloads - 3 reach - 12 impact
195 instances - 11 features - 0 classes - 2 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 1 likes - 1 downloads - 2 reach - 8 impact
150 instances - 5 features - classes - 0 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes information on…
0 runs - 1 likes - 1 downloads - 2 reach - 8 impact
163065 instances - 12 features - 0 classes - 0 missing values
No data.
314 runs - 1 likes - 8 downloads - 9 reach - 11 impact
1000000 instances - 36 features - 19 classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
2 runs - 1 likes - 1 downloads - 2 reach - 9 impact
8192 instances - 22 features - 0 classes - 0 missing values
The Computer Activity databases are a collection of computer systems activity measures. The data was collected from a Sun Sparcstation 20/712 with 128 Mbytes of memory running in a multi-user…
5 runs - 1 likes - 2 downloads - 3 reach - 9 impact
8192 instances - 13 features - 0 classes - 0 missing values
Context "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets] Content Each row represents a…
0 runs - 1 likes - 2 downloads - 3 reach - 8 impact
7043 instances - 20 features - 2 classes - 0 missing values
Title: Communities and Crime Abstract: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and…
0 runs - 1 likes - 3 downloads - 4 reach - 13 impact
1994 instances - 128 features - 0 classes - 39202 missing values
Identification code deleted. As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based…
4 runs - 1 likes - 0 downloads - 1 reach - 12 impact
189 instances - 10 features - 0 classes - 0 missing values
Primary Biliary Cirrhosis The data set found in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis,…
18 runs - 1 likes - 3 downloads - 4 reach - 14 impact
418 instances - 20 features - 0 classes - 1033 missing values
This analysis describes and summarizes the relationships between 1987 salaries of major league baseball players and the player's performance. The salary data were taken from Sports Illustrated, April…
0 runs - 1 likes - 1 downloads - 2 reach - 13 impact
26 instances - 8 features - 0 classes - 0 missing values
This database was designed on the basis of data provided by US Census Bureau [http://www.census.gov] (under Lookup Access [http://www.census.gov/cdrom/lookup]: Summary Tape File 1). The data were…
2 runs - 1 likes - 3 downloads - 4 reach - 9 impact
22784 instances - 9 features - 0 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
406 runs - 1 likes - 11 downloads - 12 reach - 16 impact
4229 instances - 1618 features - 2 classes - 0 missing values
No data.
326 runs - 1 likes - 5 downloads - 6 reach - 11 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
65 runs - 1 likes - 2 downloads - 3 reach - 9 impact
1000000 instances - 18 features - 7 classes - 0 missing values
Internet Usage Data Data Type multivariate Abstract This data contains general demographic information on internet users in 1997. Sources Original Owner [1]Graphics, Visualization, & Usability Center…
0 runs - 1 likes - 5 downloads - 6 reach - 12 impact
10108 instances - 72 features - 46 classes - 2699 missing values
This is a dataset obtained from the StatLib repository. Here is the included description: The data provided are daily stock prices from January 1988 through October 1991, for ten aerospace companies.…
5 runs - 1 likes - 9 downloads - 10 reach - 9 impact
950 instances - 10 features - 0 classes - 0 missing values
This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle owner or…
0 runs - 1 likes - 1 downloads - 2 reach - 9 impact
70340 instances - 21 features - 3 classes - 2288 missing values
Some hand-drawn digits with labels that are 1 or 0
0 runs - 1 likes - 0 downloads - 1 reach - 8 impact
This is an artificial data set with dependencies between the attribute values. The cases are generated using the following method: X1 : uniformly distributed over [-5,5] X2 : uniformly distributed…
3 runs - 1 likes - 5 downloads - 6 reach - 13 impact
40768 instances - 11 features - 0 classes - 0 missing values
Source: Ashwin Srinivasan Department of Statistics and Data Modeling University of Strathclyde Glasgow Scotland UK ross '@' uk.ac.turing The original Landsat data for this database was generated from…
1 runs - 1 likes - 7 downloads - 8 reach - 19 impact
6435 instances - 37 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 1 likes - 1 downloads - 2 reach - 13 impact
4450 instances - 203 features - 0 classes - 0 missing values
This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics; (b) its assigned insurance risk rating; (c) its normalized losses in use as…
11 runs - 1 likes - 4 downloads - 5 reach - 10 impact
159 instances - 16 features - 0 classes - 0 missing values
This is an artificial data set described in Breiman et al. (1984,p.238) (with variance 1 instead of 2). Generate the values of the 10 attributes independently using the following probabilities: P(X_1…
2 runs - 1 likes - 4 downloads - 5 reach - 10 impact
40768 instances - 11 features - 0 classes - 0 missing values
This is a 10% stratified subsample of the data from the 1999 ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php). Modified by TunedIT (converted to ARFF format)…
25 runs - 1 likes - 35 downloads - 36 reach - 15 impact
494020 instances - 42 features - 23 classes - 0 missing values
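As a rough illustration of what a "10% stratified subsample" means, the sketch below samples each class separately so that class proportions are approximately preserved. The function name, the toy labels, and the rule of keeping at least one row per class are assumptions made for this example; the listing does not say how the actual subsample was drawn.

```python
import random
from collections import defaultdict

def stratified_subsample(rows, label_of, fraction=0.1, seed=0):
    """Return roughly `fraction` of rows, sampled separately within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sample = []
    for members in by_class.values():
        # Assumption: keep at least one row per class so rare classes survive.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Toy data with hypothetical labels: 90 "normal" rows and 10 "attack" rows.
rows = [("normal", i) for i in range(90)] + [("attack", i) for i in range(10)]
subset = stratified_subsample(rows, label_of=lambda r: r[0], fraction=0.1)
```

With this toy input the subset holds 9 "normal" rows and 1 "attack" row, mirroring the 90/10 split of the full data.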
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
72 runs - 1 likes - 7 downloads - 8 reach - 16 impact
1545 instances - 10936 features - 2 classes - 0 missing values
Automated file upload of BNG(optdigits)
100 runs - 1 likes - 1 downloads - 2 reach - 11 impact
1000000 instances - 65 features - 10 classes - 0 missing values
Automated file upload of BNG(ionosphere)
99 runs - 1 likes - 4 downloads - 5 reach - 12 impact
1000000 instances - 35 features - 2 classes - 0 missing values
This directory contains Thyroid datasets. "ann-train.data" contains 3772 learning examples and "ann-test.data" contains 3428 testing examples. I have obtained this data from…
31 runs - 1 likes - 4 downloads - 5 reach - 14 impact
3772 instances - 22 features - 3 classes - 0 missing values
General Description of Thyroid Disease Databases and Related Files This directory contains 6 databases, corresponding test set, and corresponding documentation. They were left at the University of…
31 runs - 1 likes - 9 downloads - 10 reach - 13 impact
2800 instances - 27 features - 5 classes - 0 missing values
General Description of Thyroid Disease Databases and Related Files This directory contains 6 databases, corresponding test set, and corresponding documentation. They were left at the University of…
31 runs - 1 likes - 10 downloads - 11 reach - 13 impact
2800 instances - 27 features - 5 classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs - 1 likes - 1 downloads - 2 reach - 9 impact
2000 instances - 140 features - classes - 0 missing values
Multivariate regression data set from: https://link.springer.com/article/10.1007%2Fs10994-016-5546-z : The Concrete Slump dataset (Yeh 2007) concerns the prediction of three properties of concrete…
0 runs - 1 likes - 0 downloads - 1 reach - 10 impact
103 instances - 10 features - classes - 0 missing values
No data.
27 runs - 1 likes - 3 downloads - 4 reach - 10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
0 runs - 1 likes - 2 downloads - 3 reach - 10 impact
798964 instances - 10 features - 3 classes - 399482 missing values
No data.
312 runs - 1 likes - 5 downloads - 6 reach - 12 impact
1000000 instances - 14 features - 3 classes - 0 missing values
1. Title: Lecturers Evaluation (Ordinal LEV) 2. Source Information: Donor: Arie Ben David MIS, Dept. of Technology Management Holon Academic Inst. of Technology 52 Golomb St. Holon 58102 Israel…
0 runs - 1 likes - 2 downloads - 3 reach - 13 impact
1000 instances - 5 features - 0 classes - 0 missing values
No data.
337 runs - 1 likes - 2 downloads - 3 reach - 11 impact
1000000 instances - 13 features - 3 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2865 runs - 1 likes - 18 downloads - 19 reach - 24 impact
1545 instances - 10936 features - 2 classes - 0 missing values
No data.
27 runs - 1 likes - 4 downloads - 5 reach - 10 impact
1000000 instances - 26 features - 7 classes - 0 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php) Data set for KDD Cup 1999 Modified by TunedIT (converted to ARFF format)…
4 runs - 1 likes - 21 downloads - 22 reach - 15 impact
4898431 instances - 42 features - 23 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2853 runs - 1 likes - 7 downloads - 8 reach - 24 impact
1545 instances - 10936 features - 2 classes - 0 missing values
Data from the RSCTC 2010 Discovery Challenge. Example datasets for 6 different problems of DNA microarray data analysis and classification. All datasets contain gene expression data characterized by…
9 runs - 1 likes - 1 downloads - 2 reach - 14 impact
105 instances - 22284 features - 3 classes - 0 missing values
A Vergara, S Vembu, T Ayhan, M Ryan, M Homer, R Huerta. "Chemical gas sensor drift compensation using classifier ensembles." Sensors and Actuators B: Chemical 166 (2012): 320-329. I Rodriguez-Lujan, J…
68 runs - 1 likes - 10 downloads - 11 reach - 13 impact
13910 instances - 130 features - 6 classes - 0 missing values
### Description Gas Sensor Array Drift Dataset Data Set ### Sources ``` (a) Creators: Alexander Vergara (vergara '@' ucsd.edu) BioCircuits Institute University of California San Diego San Diego,…
18354 runs - 1 likes - 20 downloads - 21 reach - 44 impact
13910 instances - 129 features - 6 classes - 0 missing values
A dataset of steel plates' faults, classified into 7 different types. The goal was to train machine learning for automatic pattern recognition. The dataset consists of 27 features describing each…
277313 runs - 1 likes - 46 downloads - 47 reach - 25 impact
1941 instances - 34 features - 2 classes - 0 missing values
Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…
0 runs - 1 likes - 11 downloads - 12 reach - 11 impact
2000 instances - 140 features - 2 classes - 0 missing values
QSAR biodegradation Data Set * Abstract: Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). *…
267507 runs - 1 likes - 22 downloads - 23 reach - 27 impact
1055 instances - 42 features - 2 classes - 0 missing values
__Changes w.r.t. version 1: included one target factor with 7 levels as target variable for the classification. Also deleted the previous 7 binary target variables.__ A dataset of steel plates'…
9007 runs - 1 likes - 3 downloads - 4 reach - 15 impact
1941 instances - 28 features - 7 classes - 0 missing values
Source: James P Bridge, Sean B Holden and Lawrence C Paulson University of Cambridge Computer Laboratory William Gates Building 15 JJ Thomson Avenue Cambridge CB3 0FD UK +44 (0)1223 763500…
26323 runs - 1 likes - 21 downloads - 22 reach - 43 impact
6118 instances - 52 features - 6 classes - 0 missing values
Creators: Renata Cristina Barros Madeo (Madeo, R. C. B.) Priscilla Koch Wagner (Wagner, P. K.) Sarajane Marques Peres (Peres, S. M.) {renata.si, priscilla.wagner, sarajane} at usp.br…
26327 runs - 1 likes - 16 downloads - 17 reach - 38 impact
9873 instances - 33 features - 5 classes - 0 missing values
Source: Rami Mustafa A Mohammad ( University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield,t.l.mccluskey '@' hud.ac.uk ) Fadi…
51512 runs - 1 likes - 25 downloads - 26 reach - 27 impact
11055 instances - 31 features - 2 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 1 likes - 0 downloads - 1 reach - 15 impact
8885 instances - 267 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 1 likes - 0 downloads - 1 reach - 15 impact
8885 instances - 252 features - 0 classes - 0 missing values
1. Title: Wine Quality 2. Sources Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009 3. Past Usage: P. Cortez, A. Cerdeira, F.…
0 runs - 1 likes - 13 downloads - 14 reach - 15 impact
6497 instances - 12 features - 0 classes - 0 missing values
Geographical Analysis Spatial Data This georeferenced data set was used in: Pace, R. Kelley, and Ronald Barry, Quick Computation of Regressions with a Spatially Autoregressive Dependent Variable,…
4 runs - 1 likes - 1 downloads - 2 reach - 15 impact
3107 instances - 7 features - 0 classes - 0 missing values
This database was designed on the basis of data provided by US Census Bureau [http://www.census.gov] (under Lookup Access [http://www.census.gov/cdrom/lookup]: Summary Tape File 1). The data were…
0 runs - 1 likes - 6 downloads - 7 reach - 15 impact
22784 instances - 17 features - 0 classes - 0 missing values
Multi-label dataset for text-classification. It consists of article titles and partial blurbs. Blurbs can be assigned to several categories (e.g. Science, News, Games) based on word predictors.
0 runs - 1 likes - 15 downloads - 16 reach - 16 impact
3782 instances - 1101 features - 2 classes - 0 missing values
This is the original version of the famous covertype dataset in ARFF format. Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a…
9 runs - 1 likes - 14 downloads - 15 reach - 23 impact
581012 instances - 55 features - 7 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
14257 runs - 1 likes - 25 downloads - 26 reach - 37 impact
48842 instances - 15 features - 2 classes - 6465 missing values
### Attribute Information * The first column is the class label (1 for signal, 0 for background) * 21 low-level features (kinematic properties): lepton pT, lepton eta, lepton phi, missing energy…
14236 runs - 1 likes - 9 downloads - 10 reach - 28 impact
98050 instances - 29 features - 2 classes - 9 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
11 runs - 1 likes - 1 downloads - 2 reach - 19 impact
20000 instances - 4297 features - 2 classes - 0 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
3036 runs - 1 likes - 4 downloads - 5 reach - 15 impact
96320 instances - 22 features - 2 classes - 0 missing values
Source: The dataset was created by Angeliki Xifara (angxifara @ gmail.com, Civil/Structural Engineer) and was processed by Athanasios Tsanas (tsanasthanasis @ gmail.com, Oxford Centre for Industrial…
103 runs - 1 likes - 5 downloads - 6 reach - 13 impact
768 instances - 10 features - 37 classes - 0 missing values
### Description MicroMass (pure spectra version) is a dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. ### Source ``` Pierre Mahé,…
39629 runs - 1 likes - 16 downloads - 17 reach - 98 impact
571 instances - 1301 features - 20 classes - 0 missing values
* Dataset Title: MicroMass - Mixed (mixed spectra version) * Abstract: A dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. * Source:…
64 runs - 1 likes - 5 downloads - 6 reach - 13 impact
360 instances - 1301 features - 10 classes - 0 missing values
Data from the RSCTC 2010 Discovery Challenge. Example datasets for 6 different problems of DNA microarray data analysis and classification. All datasets contain gene expression data characterized by…
9 runs - 1 likes - 3 downloads - 4 reach - 14 impact
95 instances - 22278 features - 5 classes - 0 missing values
* Donor: David W. Aha (aha '@' ics.uci.edu) (714) 856-8779 * Data Set Information: This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In…
159 runs - 1 likes - 5 downloads - 6 reach - 13 impact
200 instances - 14 features - 5 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Shape). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143288 runs - 1 likes - 39 downloads - 40 reach - 416 impact
1600 instances - 65 features - 100 classes - 0 missing values
### Description One-hundred plant species leaves dataset (Class = Margin). ### Sources ``` (a) Original owners of colour Leaves Samples: James Cope, Thibaut Beghin, Paolo Remagnino, Sarah Barman. The…
143050 runs - 1 likes - 17 downloads - 18 reach - 418 impact
1600 instances - 65 features - 100 classes - 0 missing values
* Dataset Title: AutoUniv Dataset data problem: autoUniv-au1-1000 * Abstract: AutoUniv is an advanced data generator for classification tasks. The aim is to reflect the nuances and heterogeneity of…
3255 runs - 1 likes - 9 downloads - 10 reach - 23 impact
1000 instances - 21 features - 2 classes - 0 missing values
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond, Knowledge and Information Systems, Vol. 14, No. 3, 2008. 1 . Abstract: Two ground ozone level data sets are included in…
187955 runs - 1 likes - 18 downloads - 19 reach - 28 impact
2534 instances - 73 features - 2 classes - 0 missing values
Human Activity Recognition (HAR) database built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors.…
24372 runs - 1 likes - 26 downloads - 27 reach - 42 impact
10299 instances - 562 features - 6 classes - 0 missing values
These weekly averages are ultimately based on measurements of 4 air samples per hour taken atop intake lines on several towers during steady periods of CO2 concentration of not less than 6 hours per…
0 runs - 1 likes - 2 downloads - 3 reach - 10 impact
2225 instances - 7 features - 0 classes - 0 missing values
Source: Original Owner: U.S. Census Bureau http://www.census.gov/ United States Department of Commerce Donor: Terran Lane and Ronny Kohavi Data Mining and Visualization Silicon Graphics. terran '@'…
0 runs - 1 likes - 8 downloads - 9 reach - 15 impact
299285 instances - 42 features - classes - 0 missing values
The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807…
355 runs - 1 likes - 56 downloads - 57 reach - 20 impact
284807 instances - 31 features - 2 classes - 0 missing values
Context It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Content The…
0 runs - 1 likes - 2 downloads - 3 reach - 8 impact
284807 instances - 31 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
808 runs - 1 likes - 9 downloads - 10 reach - 14 impact
100 instances - 26 features - 2 classes - 0 missing values
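The mean-based binarization mentioned in the description above can be sketched as follows. The description is truncated, so which side of the mean maps to which label, and the label names 'P' and 'N' themselves, are assumptions made for this example rather than details confirmed by the listing.

```python
from statistics import mean

def binarize_by_mean(targets, below_label="P", other_label="N"):
    """Map a numeric target to two nominal classes, split at the mean."""
    threshold = mean(targets)
    # Assumption: values strictly below the mean get `below_label`.
    return [below_label if t < threshold else other_label for t in targets]

# Toy numeric target; the mean is 4.0.
labels = binarize_by_mean([1.0, 2.0, 3.0, 10.0])
```

A skewed target, as in this toy input, yields unbalanced classes, since the mean rather than the median sets the split point.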
Binarized version of the original data set (see version 1). The multi-class target feature is converted to a two-class nominal target feature by re-labeling the majority class as positive ('P') and…
143 runs - 1 likes - 11 downloads - 12 reach - 15 impact
531 instances - 102 features - 2 classes - 0 missing values
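The majority-class relabeling mentioned in the description above might look like the sketch below. The 'P' label for the majority class comes from the description; the 'N' label for all other classes is an assumption, because the text is cut off before naming it.

```python
from collections import Counter

def binarize_by_majority(labels, positive="P", negative="N"):
    """Relabel the most frequent class as positive and all others as negative."""
    majority, _count = Counter(labels).most_common(1)[0]
    # Assumption: every non-majority class collapses into one `negative` label.
    return [positive if y == majority else negative for y in labels]

# Toy multi-class target with hypothetical class names.
binary = binarize_by_majority(["a", "a", "b", "c", "a"])
```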
Context It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Content The…
0 runs - 1 likes - 7 downloads - 8 reach - 8 impact
284807 instances - 31 features - 0 classes - 0 missing values
Compilation of promoters with known transcriptional start points for E. coli genes. The task is to recognize promoters in strings that represent nucleotides (one of A, G, T, or C). A promoter is a…
138 runs - 1 likes - 9 downloads - 10 reach - 11 impact
106 instances - 58 features - 2 classes - 0 missing values
No data.
416 runs - 1 likes - 13 downloads - 14 reach - 63 impact
1050 instances - 3239 features - 10 classes - 0 missing values
Donor: Will Taylor (taylor@pluto.arc.nasa.gov) In this version (version 2), some features were removed. It is unclear why or how this was done.
1883 runs - 1 likes - 10 downloads - 11 reach - 9 impact
368 instances - 23 features - 2 classes - 1927 missing values
SPECT heart data This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Sources: --…
1296 runs - 1 likes - 12 downloads - 13 reach - 16 impact
267 instances - 23 features - 2 classes - 0 missing values
Once upon a time, in July 1991, the monks of Corsendonk Priory were faced with a school held in their priory, namely the 2nd European Summer School on Machine Learning. After listening more than one…
108666 runs - 1 likes - 14 downloads - 15 reach - 34 impact
554 instances - 7 features - 2 classes - 0 missing values
Speaker independent recognition of the eleven steady state vowels of British English using a specified training set of lpc derived log area ratios. Collected by David Deterding (data and…
26450 runs - 1 likes - 18 downloads - 19 reach - 43 impact
990 instances - 13 features - 11 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
608 runs - 1 likes - 9 downloads - 10 reach - 15 impact
1000 instances - 26 features - 2 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
2671 runs - 1 likes - 32 downloads - 33 reach - 11 impact
48842 instances - 15 features - 2 classes - 6465 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
1187 runs - 1 likes - 10 downloads - 11 reach - 9 impact
412 instances - 9 features - 7 classes - 96 missing values
No data.
2198 runs - 1 likes - 17 downloads - 18 reach - 9 impact
1484 instances - 9 features - 10 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
602 runs - 1 likes - 12 downloads - 13 reach - 15 impact
13750 instances - 41 features - 2 classes - 0 missing values
Binarized version of the original data set (see version 1). It converts the numeric target feature to a two-class nominal target feature by computing the mean and classifying all instances with a…
866 runs - 1 likes - 12 downloads - 13 reach - 16 impact
7129 instances - 6 features - 2 classes - 0 missing values