**Author**: [Isabelle Guyon](isabelle@clopinet.com)  
**Source**: [Agnostic Learning vs. Prior Knowledge Challenge](http://www.agnostic.inf.ethz.ch)  
**Please cite**: None

__Major changes w.r.t. version 1: changed binary features to data type factor.__

Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which consisted of 5 different datasets (SYLVA, GINA, NOVA, HIVA, ADA). The purpose of the challenge was to check whether the performance of domain-specific feature engineering (prior knowledge) can be matched by algorithms trained on data without any domain-specific knowledge (agnostic). For the latter, the data was anonymised and preprocessed in a way that makes it uninterpretable.

This dataset contains the agnostic (smashed) version of a data set from the Remote Sensing and GIS Program of Colorado State University for the time span June 2005 - September 2006. A similar raw, non-agnostic data set is termed the __Covertype Dataset__ and can be found in the [UCI Database](https://archive.ics.uci.edu/ml/datasets/covertype).

Modified by TunedIT (converted to ARFF format)

### Topic

The task of SYLVA is to classify forest cover types. The forest cover type for 30 x 30 meter cells is obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. We reduced it to a two-class classification problem (classifying Ponderosa pine vs. everything else). The “agnostic data” consists of 216 input variables. Each pattern is composed of 4 records: 2 true records matching the target and 2 records picked at random. Thus ½ of the features are distracters. The “prior knowledge data” is identical to the “agnostic data”, except that the distracters are removed and the identity of the features is revealed.
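The pattern construction described under Topic can be sketched on synthetic data. This is only one possible reading of the description, not the organizers' actual preprocessing: it assumes 216 / 4 = 54 features per record (which matches the 54 attributes of the UCI Covertype data), takes "matching the target" to mean "having the same class label", and does not shuffle or anonymise the columns. The helper `make_pattern` and the toy `records` and `labels` are hypothetical names introduced for illustration.

```python
import random

FEATURES_PER_RECORD = 54   # assumption: 216 input variables / 4 records
RECORDS_PER_PATTERN = 4    # 2 true records + 2 random distracter records


def make_pattern(records, labels, target, rng):
    """Concatenate 2 records whose label equals `target` with 2 records
    drawn at random from the whole pool (the distracters)."""
    matching = [rec for rec, y in zip(records, labels) if y == target]
    true_part = rng.sample(matching, 2)
    distracters = rng.sample(records, 2)
    pattern = []
    for rec in true_part + distracters:
        pattern.extend(rec)
    return pattern


# Synthetic stand-in data: 20 records, every fifth one labelled positive.
rng = random.Random(0)
records = [[rng.random() for _ in range(FEATURES_PER_RECORD)] for _ in range(20)]
labels = [1 if i % 5 == 0 else -1 for i in range(20)]

pattern = make_pattern(records, labels, target=1, rng=rng)
print(len(pattern))  # 216 = 4 records x 54 features
```

In this sketch the first 108 values carry information about the target and the last 108 are distracters, which is the "½ of the features" ratio stated above. In the released agnostic data the feature identities are hidden, so the informative columns cannot be located by position.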
### Description

Data type: non-sparse

Number of features: 216

Number of examples and check-sums:

| | Pos_ex | Neg_ex | Tot_ex | Check_sum |
|---|---|---|---|---|
| Train | 805 | 12281 | 13086 | 238271607.00 |
| Valid | 81 | 1228 | 1309 | 23817234.00 |

This dataset contains samples from both training and validation datasets.

### Source

Original owners:

Remote Sensing and GIS Program  
Department of Forest Sciences  
College of Natural Resources  
Colorado State University  
Fort Collins, CO 80523  
(contact Jock A. Blackard, jblackard/wo_ftcol@fs.fed.us, or Dr. Denis J. Dean, denis@cnr.colostate.edu)

Jock A. Blackard  
USDA Forest Service  
3825 E. Mulberry  
Fort Collins, CO 80524 USA  
jblackard/wo_ftcol@fs.fed.us

OpenML record: sylva_agnostic, version 2, public, active, 2017-01-05T19:52:37Z, https://www.openml.org/data/download/18154656/php1PI19i
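The example counts given under Description imply a strongly imbalanced problem, which matters when choosing an evaluation metric. The following sketch recomputes the class balance from those counts. The combined figures assume that every row of both splits is present in this dataset, which the description suggests but does not state explicitly. The names `splits` and `positive_rate` are illustrative.

```python
# Counts copied from the "Number of examples" table in the Description.
splits = {
    "train": {"pos": 805, "neg": 12281, "total": 13086},
    "valid": {"pos": 81, "neg": 1228, "total": 1309},
}


def positive_rate(split):
    """Fraction of positive (Ponderosa pine) examples in a split."""
    return split["pos"] / (split["pos"] + split["neg"])


for name, s in splits.items():
    # Sanity check: positives and negatives must add up to the stated total.
    assert s["pos"] + s["neg"] == s["total"]
    print(f"{name}: {positive_rate(s):.2%} positive")

# Assumption: the dataset is the plain union of both splits.
combined_total = sum(s["total"] for s in splits.values())
combined_pos = sum(s["pos"] for s in splits.values())

# Accuracy of a trivial classifier that always predicts the negative class.
majority_baseline = 1 - combined_pos / combined_total
print(combined_total, combined_pos, f"{majority_baseline:.2%}")
```

Both splits contain roughly 6% positives, so a classifier that always predicts the negative class already reaches high plain accuracy. A class-sensitive measure such as balanced error rate or AUC is therefore likely to be more informative than accuracy on this data.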