Trip record data provided by the New York City Taxi and Limousine Commission (TLC) []. The dataset contains TLC green line trips from December 2016. The data was downloaded on 03.11.2018. For a description of all variables in the dataset, see the TLC homepage []. The variable 'tip_amount' was chosen as the target variable. The variable 'total_amount' is ignored by default, because the target could otherwise be predicted deterministically from it. The date variables 'lpep_pickup_datetime' and 'lpep_dropoff_datetime' are also ignored by default, but they could be used to compute additional time features. This version keeps only trips with 'payment_type' == 1 (credit card), because tips are not recorded for most other payment types. The variables 'trip_distance' and 'fare_amount' were also removed to increase the importance of the categorical features 'PULocationID' and 'DOLocationID'.
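The additional time features mentioned above could be derived roughly as in the following sketch. It assumes the two date variables are strings of the form YYYY-MM-DD HH:MM:SS, which should be checked against the actual data. The function name time_features and the output keys are hypothetical and not part of the dataset.

```python
from datetime import datetime

# Assumed timestamp layout; adjust if the strings in the dataset differ.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_features(pickup: str, dropoff: str) -> dict:
    """Derive simple time features from one trip's pickup and drop-off strings."""
    start = datetime.strptime(pickup, TIMESTAMP_FORMAT)
    end = datetime.strptime(dropoff, TIMESTAMP_FORMAT)
    return {
        "pickup_hour": start.hour,           # 0 to 23
        "pickup_weekday": start.weekday(),   # Monday = 0, Sunday = 6
        "is_weekend": start.weekday() >= 5,  # Saturday or Sunday
        "duration_minutes": (end - start).total_seconds() / 60.0,
    }


# Illustrative timestamps, not taken from the dataset.
example = time_features("2016-12-03 23:50:00", "2016-12-04 00:05:30")
print(example)
```

Features such as the pickup hour or trip duration may be useful predictors of the tip, but whether they help here is an open question that the description above does not answer.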

15 features

Feature                  Type      Unique values   Missing
tip_amount (target)      numeric   1811            0
VendorID                 nominal   2               0
lpep_pickup_datetime     string    505885          0
lpep_dropoff_datetime    string    505577          0
store_and_fwd_flag       nominal   2               0
RatecodeID               nominal   5               0
PULocationID             nominal   233             0
DOLocationID             nominal   259             0
passenger_count          numeric   10              0
extra                    nominal   5               0
mta_tax                  nominal   3               0
tolls_amount             numeric   105             0
improvement_surcharge    nominal   3               0
total_amount             numeric   5377            0
trip_type                nominal   2               0
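Given the feature list and the defaults described at the top, the split into target, ignored columns, and predictors could be sketched as follows. The column names come from the feature list; the constant names and the split_record helper are hypothetical, and the sample record uses made-up values.

```python
TARGET = "tip_amount"

# Columns the description says are ignored by default.
IGNORED = {"total_amount", "lpep_pickup_datetime", "lpep_dropoff_datetime"}

ALL_COLUMNS = [
    "tip_amount", "VendorID", "lpep_pickup_datetime", "lpep_dropoff_datetime",
    "store_and_fwd_flag", "RatecodeID", "PULocationID", "DOLocationID",
    "passenger_count", "extra", "mta_tax", "tolls_amount",
    "improvement_surcharge", "total_amount", "trip_type",
]

# Everything that is neither the target nor ignored is a predictor.
PREDICTORS = [c for c in ALL_COLUMNS if c != TARGET and c not in IGNORED]


def split_record(record: dict) -> tuple:
    """Return (features, target) for one trip, dropping ignored columns."""
    features = {name: record[name] for name in PREDICTORS if name in record}
    return features, record[TARGET]


# Made-up record for illustration only.
sample = {
    "tip_amount": 2.0,
    "PULocationID": "74",
    "DOLocationID": "41",
    "total_amount": 12.3,
}
features, target = split_record(sample)
print(len(PREDICTORS), features, target)
```

Excluding 'total_amount' here mirrors the default noted in the description, since keeping it would let a model recover the tip directly.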

