OpenML
This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be…
434 runs - 0 likes - 10 downloads - 10 reach - 5 impact
7019 instances - 61 features - 8 classes - 48089 missing values
No data.
428 runs - 0 likes - 12 downloads - 12 reach - 51 impact
1003 instances - 3183 features - 10 classes - 0 missing values
No data.
426 runs - 0 likes - 15 downloads - 15 reach - 75 impact
2463 instances - 2001 features - 17 classes - 0 missing values
* Abstract: The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system. This is a…
423 runs - 0 likes - 13 downloads - 13 reach - 4 impact
120 instances - 7 features - 2 classes - 0 missing values
No data.
416 runs - 1 likes - 13 downloads - 14 reach - 51 impact
1050 instances - 3239 features - 10 classes - 0 missing values
No data.
414 runs - 0 likes - 8 downloads - 8 reach - 50 impact
690 instances - 8262 features - 10 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
406 runs - 1 likes - 11 downloads - 12 reach - 7 impact
4229 instances - 1618 features - 2 classes - 0 missing values
Vehicle classification in distributed sensor networks. Journal of Parallel and Distributed Computing, 64(7):826-838, July 2004. This is the SensIT Vehicle (combined) dataset, retrieved 2013-11-14 from…
403 runs - 0 likes - 21 downloads - 21 reach - 7 impact
98528 instances - 101 features - 2 classes - 0 missing values
No data.
400 runs - 0 likes - 6 downloads - 6 reach - 0 impact
45164 instances - 75 features - 11 classes - 0 missing values
Datasets from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch) Dataset from: http://www.agnostic.inf.ethz.ch/datasets.php Modified by TunedIT (converted to ARFF…
396 runs - 0 likes - 15 downloads - 15 reach - 6 impact
3468 instances - 785 features - 10 classes - 0 missing values
* Abstract: The data was created by a medical expert as a data set to test the expert system, which will perform the presumptive diagnosis of two diseases of the urinary system. * Source: Jacek…
391 runs - 0 likes - 11 downloads - 11 reach - 4 impact
120 instances - 7 features - 2 classes - 0 missing values
Hayes-Roth Database This is a merged version of the separate train and test set which are usually distributed. On OpenML this train-test split can be found as one of the possible tasks. Source…
380 runs - 0 likes - 3 downloads - 3 reach - 14 impact
160 instances - 5 features - 3 classes - 0 missing values
This is a 20,000 instance sample of the original CIFAR-10 dataset. Sampled randomly and stratified, with 2000 examples per class. Training and test set are merged. Find the corresponding task for the…
380 runs - 0 likes - 3 downloads - 3 reach - 9 impact
20000 instances - 3073 features - 10 classes - 0 missing values
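The stratified sampling mentioned in this entry (a fixed number of examples drawn at random from each class) can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the dataset: the function name `stratified_sample`, the seed, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(labels, per_class, seed=0):
    """Return sorted indices holding exactly `per_class` items of each label.

    Hypothetical helper for illustration; raises ValueError if a class
    has fewer than `per_class` members.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Group the row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Draw the same number of rows, without replacement, from every class.
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], per_class))
    return sorted(chosen)


# Toy labels (assumed data): 10 classes with 50 rows each.
labels = [i % 10 for i in range(500)]
picked = stratified_sample(labels, per_class=20)
```

With 10 classes and `per_class=2000`, the same call would yield the 20,000 rows described above.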
No data.
377 runs - 0 likes - 9 downloads - 9 reach - 50 impact
913 instances - 3101 features - 10 classes - 0 missing values
No data.
373 runs - 0 likes - 8 downloads - 8 reach - 50 impact
918 instances - 3013 features - 10 classes - 0 missing values
Normalized version of vehicle dataset (http://www.openml.org/d/54) NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted…
372 runs - 0 likes - 10 downloads - 10 reach - 0 impact
98528 instances - 101 features - 2 classes - 0 missing values
Source: http://www.ijcaonline.org/archives/volume47/number18/7291-0509 Data Set Information: In this paper, we seek to identify why users turn to cyberspace in Kohkiloye and Boyer…
371 runs - 0 likes - 6 downloads - 6 reach - 4 impact
100 instances - 6 features - 2 classes - 0 missing values
This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be…
366 runs - 0 likes - 10 downloads - 10 reach - 5 impact
8844 instances - 61 features - 7 classes - 51515 missing values
No data.
356 runs - 0 likes - 7 downloads - 7 reach - 0 impact
131072 instances - 17 features - 2 classes - 0 missing values
The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807…
355 runs - 0 likes - 52 downloads - 52 reach - 10 impact
284807 instances - 31 features - 2 classes - 0 missing values
This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for the years 1970, 1980, and 1990. The coding schemes have been standardized (by the IPUMS project) to be…
354 runs - 0 likes - 7 downloads - 7 reach - 5 impact
7485 instances - 61 features - 7 classes - 52048 missing values
No data.
353 runs - 0 likes - 16 downloads - 16 reach - 0 impact
120919 instances - 1002 features - 2 classes - 0 missing values
Embryonal tumours of the central nervous system Prediction of Central Nervous System Embryonal Tumour Outcome based on Gene Expression. Nature, VOL 415, pp. 436-442, 24 January 2002. Scott L. Pomeroy,…
343 runs - 0 likes - 6 downloads - 6 reach - 5 impact
60 instances - 7130 features - 2 classes - 0 missing values
No data.
337 runs - 1 likes - 2 downloads - 3 reach - 0 impact
1000000 instances - 13 features - 3 classes - 0 missing values
No data.
334 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 33 features - 2 classes - 0 missing values
Dataset created to study concept drift in stream mining. It is constructed by combining the Covertype, Poker-Hand, and Electricity datasets. More details can be found in: Albert Bifet, Geoff Holmes,…
332 runs - 0 likes - 26 downloads - 26 reach - 0 impact
1455525 instances - 73 features - 10 classes - 0 missing values
No data.
332 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 17 features - 2 classes - 0 missing values
No data.
331 runs - 0 likes - 7 downloads - 7 reach - 0 impact
1000000 instances - 20 features - 2 classes - 0 missing values
No data.
330 runs - 0 likes - 5 downloads - 5 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
328 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
326 runs - 1 likes - 4 downloads - 5 reach - 0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
No data.
326 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 14 features - 2 classes - 0 missing values
No data.
326 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 16 features - 2 classes - 0 missing values
No data.
324 runs - 0 likes - 5 downloads - 5 reach - 0 impact
1000000 instances - 37 features - 2 classes - 0 missing values
Normalized version of the Forest Covertype dataset (see version 1), so that the numerical values are between 0 and 1. Contains the forest cover type for 30 x 30 meter cells obtained from US Forest…
319 runs - 1 likes - 39 downloads - 40 reach - 0 impact
581012 instances - 55 features - 7 classes - 0 missing values
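Scaling numerical values into the range 0 to 1, as this entry describes, is commonly done with min-max normalization. The sketch below assumes that technique; the listing does not say which method was actually applied, and the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(column):
    """Rescale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Made-up values standing in for one numerical feature.
elevations = [1859.0, 2596.0, 3858.0]
scaled = min_max_normalize(elevations)
```

Each numerical feature would be rescaled independently in this way, so features measured in different units end up on a common scale.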
Synthetic dataset. Almost identical to [dataset 152](https://www.openml.org/d/153/edit)
319 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 11 features - 2 classes - 0 missing values
No data.
315 runs - 0 likes - 2 downloads - 2 reach - 0 impact
295245 instances - 11 features - 5 classes - 0 missing values
No data.
314 runs - 1 likes - 8 downloads - 9 reach - 0 impact
1000000 instances - 36 features - 19 classes - 0 missing values
Normalized version of the pokerhand data set. Automated file upload of pokerhand-normalized.arff
314 runs - 0 likes - 10 downloads - 10 reach - 0 impact
829201 instances - 11 features - 10 classes - 0 missing values
No data.
313 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 23 features - 2 classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
313 runs - 0 likes - 34 downloads - 34 reach - 3 impact
399482 instances - 12 features - 2 classes - 0 missing values
No data.
312 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 14 features - 3 classes - 0 missing values
No data.
311 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 17 features - 26 classes - 0 missing values
No data.
311 runs - 0 likes - 5 downloads - 5 reach - 0 impact
1000000 instances - 10 features - 2 classes - 0 missing values
No data.
310 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 11 features - 2 classes - 0 missing values
No data.
310 runs - 0 likes - 2 downloads - 2 reach - 0 impact
1000000 instances - 14 features - 5 classes - 0 missing values
No data.
310 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
No data.
309 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
309 runs - 0 likes - 6 downloads - 6 reach - 0 impact
1000000 instances - 35 features - 6 classes - 0 missing values
Normalized form of codrna (351) Andrew V Uzilov, Joshua M Keegan, and David H Mathews. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC…
309 runs - 0 likes - 5 downloads - 5 reach - 0 impact
488565 instances - 9 features - 2 classes - 0 missing values
No data.
308 runs - 0 likes - 2 downloads - 2 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
307 runs - 0 likes - 2 downloads - 2 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
307 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 41 features - 3 classes - 0 missing values
No data.
307 runs - 0 likes - 5 downloads - 5 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
306 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
306 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 13 features - 6 classes - 0 missing values
No data.
305 runs - 0 likes - 2 downloads - 2 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
305 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 4 features - 2 classes - 0 missing values
No data.
304 runs - 0 likes - 6 downloads - 6 reach - 0 impact
1000000 instances - 25 features - 10 classes - 0 missing values
No data.
304 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
A 4-class version of breast-tissue dataset.
299 runs - 0 likes - 3 downloads - 3 reach - 4 impact
106 instances - 10 features - 4 classes - 0 missing values
No data.
298 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 11 features - 5 classes - 0 missing values
No data.
296 runs - 0 likes - 7 downloads - 7 reach - 0 impact
1000000 instances - 61 features - 2 classes - 0 missing values
No data.
296 runs - 0 likes - 5 downloads - 5 reach - 11 impact
96 instances - 4027 features - 9 classes - 19667 missing values
No data.
293 runs - 0 likes - 2 downloads - 2 reach - 0 impact
1000000 instances - 17 features - 10 classes - 0 missing values
No data.
292 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 37 features - 6 classes - 0 missing values
No data.
291 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 18 features - 7 classes - 0 missing values
No data.
290 runs - 0 likes - 5 downloads - 5 reach - 0 impact
1000000 instances - 77 features - 10 classes - 0 missing values
No data.
288 runs - 0 likes - 2 downloads - 2 reach - 0 impact
1000000 instances - 15 features - 9 classes - 0 missing values
Airlines Dataset. Inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
287 runs - 0 likes - 24 downloads - 24 reach - 4 impact
539383 instances - 8 features - 2 classes - 0 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
284 runs - 0 likes - 1 downloads - 1 reach - 4 impact
96320 instances - 22 features - 2 classes - 0 missing values
No data.
283 runs - 0 likes - 5 downloads - 5 reach - 11 impact
96 instances - 4027 features - 11 classes - 19667 missing values
* Source: JP Marques de Sá, INEB-Instituto de Engenharia Biomédica, Porto, Portugal; e-mail: jpmdesa '@' gmail.com J Jossinet, inserm, Lyon, France * Data Set Information: Impedance measurements…
280 runs - 0 likes - 5 downloads - 5 reach - 4 impact
106 instances - 10 features - 6 classes - 0 missing values
No data.
268 runs - 0 likes - 9 downloads - 9 reach - 35 impact
3075 instances - 12433 features - 6 classes - 0 missing values
No data.
264 runs - 0 likes - 11 downloads - 11 reach - 35 impact
3204 instances - 13196 features - 6 classes - 0 missing values
* Dataset Title: Volcanoes on Venus - JARtool experiment Data Set Experiment: A1 * Source: Michael C. Burl MS 126-347, JPL 4800 Oak Grove Drive Pasadena, CA 91109 (818) 393-5345 Michael.C.Burl '@'…
262 runs - 0 likes - 4 downloads - 4 reach - 4 impact
3252 instances - 4 features - 5 classes - 0 missing values
No data.
253 runs - 0 likes - 6 downloads - 6 reach - 0 impact
1076790 instances - 30 features - 2 classes - 7275 missing values
Donor: Will Taylor (taylor@pluto.arc.nasa.gov) Database of surgeries on horses. Possible class attributes: 24 (whether lesion is surgical), others include: 23, 25, 26, and 27 Notes: * Hospital_Number…
233 runs - 0 likes - 8 downloads - 8 reach - 0 impact
368 instances - 28 features - 2 classes - 1927 missing values
No data.
230 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 35 features - 2 classes - 0 missing values
No data.
225 runs - 0 likes - 6 downloads - 6 reach - 0 impact
1000000 instances - 21 features - 2 classes - 0 missing values
No data.
222 runs - 0 likes - 10 downloads - 10 reach - 6 impact
1504 instances - 2887 features - 13 classes - 0 missing values
No data.
220 runs - 0 likes - 6 downloads - 6 reach - 9 impact
336 instances - 7903 features - 6 classes - 0 missing values
No data.
219 runs - 0 likes - 4 downloads - 4 reach - 0 impact
1000000 instances - 58 features - 2 classes - 0 missing values
No data.
219 runs - 0 likes - 5 downloads - 5 reach - 9 impact
414 instances - 6430 features - 9 classes - 0 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php) KDD Cup 2009 http://www.kddcup-orange.com Converted to ARFF format by TunedIT Customer Relationship Management (CRM) is a key element…
218 runs - 0 likes - 15 downloads - 15 reach - 7 impact
50000 instances - 231 features - 2 classes - 8024152 missing values
No data.
216 runs - 0 likes - 12 downloads - 12 reach - 51 impact
11162 instances - 11466 features - 10 classes - 0 missing values
Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service…
216 runs - 0 likes - 11 downloads - 11 reach - 0 impact
110393 instances - 55 features - 7 classes - 0 missing values
Mammography dataset Past Usage: 1. Woods, K., Doss, C., Bowyer, K., Solka, J., Priebe, C.,
215 runs - 4 likes - 45 downloads - 49 reach - 13 impact
11183 instances - 7 features - 2 classes - 0 missing values
No data.
215 runs - 0 likes - 7 downloads - 7 reach - 9 impact
204 instances - 5833 features - 6 classes - 0 missing values
No data.
211 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 20 features - 7 classes - 0 missing values
No data.
211 runs - 0 likes - 4 downloads - 4 reach - 9 impact
313 instances - 5805 features - 8 classes - 0 missing values
No data.
206 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 39 features - 6 classes - 0 missing values
No data.
203 runs - 0 likes - 5 downloads - 5 reach - 9 impact
878 instances - 7455 features - 10 classes - 0 missing values
Oil dataset Past Usage: 1. Kubat, M., Holte, R.,
200 runs - 2 likes - 16 downloads - 18 reach - 13 impact
937 instances - 50 features - 2 classes - 0 missing values
Pizza cutter
197 runs - 0 likes - 8 downloads - 8 reach - 5 impact
661 instances - 38 features - 2 classes - 0 missing values
No data.
194 runs - 0 likes - 3 downloads - 3 reach - 0 impact
1000000 instances - 65 features - 10 classes - 0 missing values
* Title: seeds Data Set * Abstract: Measurements of geometrical properties of kernels belonging to three different varieties of wheat. A soft X-ray technique and GRAINS package were used to construct…
190 runs - 0 likes - 5 downloads - 5 reach - 4 impact
210 instances - 8 features - 3 classes - 0 missing values
Pizza cutter 3
188 runs - 0 likes - 6 downloads - 6 reach - 5 impact
1043 instances - 38 features - 2 classes - 0 missing values
Mega watt
183 runs - 0 likes - 7 downloads - 7 reach - 6 impact
253 instances - 38 features - 2 classes - 0 missing values
Dataset from the MLRR repository: http://axon.cs.byu.edu:5000/
180 runs - 0 likes - 5 downloads - 5 reach - 12 impact
294 instances - 12 features - 2 classes - 0 missing values