Florian Pargent

Florian's datasets

This data represents crime reported to the Seattle Police Department (SPD). Each row contains the record of a unique event where at least one criminal offense was reported by a member of the community…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
52358 instances - 8 features - 0 classes - 650 missing values
Zurich public transport delay data from 2016-10-30 03:30:00 CET to 2016-11-27 01:20:00 CET, cleaned and prepared at Open Data Day 2017. For this version, the task was downsampled to 0.5 percent. Some…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
27327 instances - 18 features - 0 classes - 657 missing values
This dataset consists of beer reviews from Beeradvocate. The data span a period of more than 10 years, including all ~1.5 million reviews up to November 2011. Each review includes ratings in terms of…
0 runs - 0 likes - 0 downloads - 0 reach - 6 impact
Airlines dataset, inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure. For this…
0 runs - 0 likes - 1 downloads - 1 reach - 6 impact
26969 instances - 8 features - 2 classes - 0 missing values
This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle owner or…
0 runs - 1 likes - 1 downloads - 2 reach - 9 impact
70340 instances - 21 features - 3 classes - 2288 missing values
Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018 from…
0 runs - 0 likes - 0 downloads - 0 reach - 7 impact
538638 instances - 7 features - 2 classes - 0 missing values
Dataset KDD98 challenge: https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html The goal is to estimate the return from a direct mailing in order to maximize donation profits. This dataset…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
82318 instances - 478 features - 2 classes - 2399311 missing values
Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC) [http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml]. The dataset includes TLC trips of the green line in…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
581835 instances - 15 features - 0 classes - 0 missing values
Hourly particulate matter air pollution data of Great Britain for the year 2017, provided by Ricardo Energy and Environment on behalf of the UK Department for Environment, Food and Rural Affairs…
0 runs - 0 likes - 0 downloads - 0 reach - 8 impact
394299 instances - 10 features - 0 classes - 0 missing values
Training dataset of the "Porto Seguro's Safe Driver Prediction" Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
2 runs - 0 likes - 0 downloads - 0 reach - 12 impact
595212 instances - 38 features - 2 classes - 846458 missing values
Road Safety - Vehicles by Make and Model 2016. Predict the sex of drivers involved in personal injury road accidents in Great Britain in 2016, based on characteristics of their vehicles. The data was…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
Survey on whether people self-identify as Midwesterners. For this version, some features were removed and all remaining character features were recoded as nominal factor variables. The variable…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
For this version, some features were removed and all remaining character features were recoded as nominal factor variables. The variable 'Current_Annual_Salary' is used as target by…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
9228 instances - 7 features - 0 classes - 17 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes information on…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
163065 instances - 7 features - 0 classes - 0 missing values
This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle owner or…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1406824 instances - 25 features - 3 classes - 44448 missing values
Payments given by healthcare manufacturing companies to medical doctors or hospitals. For this version, all features were recoded as nominal factor variables.
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
73354 instances - 6 features - 2 classes - 82912 missing values
User profile data for San Francisco OkCupid users published in [Kim, A. Y., & Escobedo-Land, A. (2015). OKCupid data for introductory statistics and data science courses. Journal of Statistics…
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
50789 instances - 20 features - 3 classes - 154107 missing values
130k wine reviews with variety, location, winery, price, and description. Downloaded from Kaggle [https://www.kaggle.com/zynicide/wine-reviews/home] on 29.10.2018. The original data was scraped from…
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018 from…
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
Dataset KDD98 challenge: https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html The goal is to estimate the return from a direct mailing in order to maximize donation profits. This dataset…
0 runs - 0 likes - 1 downloads - 1 reach - 10 impact
191260 instances - 478 features - 2 classes - 5587563 missing values
This is the same data as version 5 (OpenML ID = 1220) with '_id' features coded as nominal factor variables.
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
39948 instances - 12 features - 2 classes - 0 missing values
this is just to test a bug
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
10000 instances - 479 features - 0 classes - 292020 missing values
this is just to test a bug
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
191260 instances - 479 features - 0 classes - 5587563 missing values
this is just to test a bug
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
191260 instances - 479 features - 0 classes - 5587563 missing values
A dataset relating characteristics of telephony account features and usage and whether or not the customer churned. Originally used in [Discovering Knowledge in Data: An Introduction to Data…
0 runs - 0 likes - 0 downloads - 0 reach - 12 impact
5000 instances - 20 features - 2 classes - 0 missing values
User profile data for San Francisco OkCupid users published in [Kim, A. Y., & Escobedo-Land, A. (2015). OKCupid data for introductory statistics and data science courses. Journal of Statistics…
0 runs - 0 likes - 1 downloads - 1 reach - 11 impact
50789 instances - 20 features - 3 classes - 154107 missing values
130k wine reviews with variety, location, winery, price, and description. Downloaded from Kaggle [https://www.kaggle.com/zynicide/wine-reviews/home] on 29.10.2018. The original data was scraped from…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
129971 instances - 13 features - 0 classes - 204752 missing values
Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018 from…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
Hourly particulate matter air pollution data of Great Britain for the year 2017, provided by Ricardo Energy and Environment on behalf of the UK Department for Environment, Food and Rural Affairs…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
394299 instances - 12 features - 0 classes - 0 missing values
Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC) [http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml]. The dataset includes TLC trips of the green line in…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
1224158 instances - 18 features - 0 classes - 0 missing values
Predicting US flight delay in December 2017 based on airline on-time performance data provided by the Bureau of Transportation Statistics (BTS) [https://www.transtats.bts.gov/Tables.asp?DB_ID=120].…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
457892 instances - 12 features - 0 classes - 0 missing values
Training dataset of the "Porto Seguro's Safe Driver Prediction" Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
0 runs - 0 likes - 0 downloads - 0 reach - 10 impact
595212 instances - 58 features - 2 classes - 846458 missing values
Kaggle dataset containing a list of video games with sales greater than 100,000 copies [https://www.kaggle.com/gregorut/videogamesales#vgsales.csv], which was generated by a scrape of vgchartz.com…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
16598 instances - 9 features - 12 classes - 329 missing values
HPC Job Scheduling Data as included in the R-package 'AppliedPredictiveModeling' [Max Kuhn and Kjell Johnson (2018). AppliedPredictiveModeling: Functions and Data Sets for 'Applied Predictive…
0 runs - 0 likes - 0 downloads - 0 reach - 11 impact
4331 instances - 8 features - 4 classes - 0 missing values
A processed version of the 'Ames Iowa Housing' dataset as provided by the make_ames() function in the R-package 'AmesHousing' [Max Kuhn (2017). AmesHousing: The Ames Iowa Housing Data. R package…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
2930 instances - 81 features - 0 classes - 0 missing values
Historical data on avocado prices and sales volume in multiple US markets. Downloaded from Kaggle [https://www.kaggle.com/neuromusic/avocado-prices/home] on 29.10.2018. The original data stems from…
0 runs - 0 likes - 0 downloads - 0 reach - 9 impact
18249 instances - 12 features - 0 classes - 0 missing values