sylva_agnostic

active, ARFF, Publicly available, Visibility: public, Uploaded 05-12-2017 by Jann Goschenhofer
Author: [Isabelle Guyon](isabelle@clopinet.com)
Source: [Agnostic Learning vs. Prior Knowledge Challenge](http://www.agnostic.inf.ethz.ch)
Please cite: None

__Major changes w.r.t. version 1: changed binary features to data type factor.__

Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which consisted of 5 different datasets (SYLVA, GINA, NOVA, HIVA, ADA). The purpose of the challenge was to check whether the performance of domain-specific feature engineering (prior knowledge) can be matched by algorithms trained on data without any domain-specific knowledge (agnostic). For the latter, the data was anonymised and preprocessed in a way that makes it uninterpretable.

This dataset contains the agnostic (smashed) version of a data set from the Remote Sensing and GIS Program of Colorado State University for the time span June 2005 - September 2006. A similar, raw and non-agnostic data set is termed the __Covertype Dataset__ and can be found in the [UCI Database](https://archive.ics.uci.edu/ml/datasets/covertype).

Modified by TunedIT (converted to ARFF format)

### Topic

The task of SYLVA is to classify forest cover types. The forest cover type for 30 x 30 meter cells is obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. We brought it back to a two-class classification problem (classifying Ponderosa pine vs. everything else). The “agnostic data” consists of 216 input variables. Each pattern is composed of 4 records: 2 true records matching the target and 2 records picked at random. Thus ½ of the features are distracters. The “prior knowledge data” is identical to the “agnostic data”, except that the distracters are removed and the identity of the features is revealed.

### Description

Data type: non-sparse
Number of features: 216
Number of examples and check-sums:

| | Pos_ex | Neg_ex | Tot_ex | Check_sum |
|---|---|---|---|---|
| Train | 805 | 12281 | 13086 | 238271607.00 |
| Valid | 81 | 1228 | 1309 | 23817234.00 |

This dataset contains samples from both training and validation datasets.

### Source

Original owners:

Remote Sensing and GIS Program
Department of Forest Sciences
College of Natural Resources
Colorado State University
Fort Collins, CO 80523
(contact Jock A. Blackard, jblackard/wo_ftcol@fs.fed.us or Dr. Denis J. Dean, denis@cnr.colostate.edu)

Jock A. Blackard
USDA Forest Service
3825 E. Mulberry
Fort Collins, CO 80524 USA
jblackard/wo_ftcol@fs.fed.us
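
The pattern construction described under Topic (2 true records plus 2 records picked at random) can be illustrated with a small sketch. It assumes that the four records of a pattern have equal width (216 / 4 = 54 values each) and are simply concatenated. The function name `make_agnostic_pattern`, the toy record pool and the absence of any column shuffling are illustrative assumptions, not the challenge organisers' actual preprocessing.

```python
import random

RECORD_WIDTH = 54  # assumption: 216 input variables / 4 records per pattern


def make_agnostic_pattern(true_records, pool, rng):
    """Concatenate 2 true records with 2 distracter records drawn at random.

    Hypothetical helper for illustration only.
    """
    if len(true_records) != 2:
        raise ValueError("expected exactly 2 true records")
    distracters = [rng.choice(pool) for _ in range(2)]
    pattern = []
    for record in list(true_records) + distracters:
        pattern.extend(record)
    return pattern


rng = random.Random(0)
# Toy stand-in for raw records; the values are arbitrary.
pool = [[rng.randint(0, 9) for _ in range(RECORD_WIDTH)] for _ in range(10)]

pattern = make_agnostic_pattern(pool[:2], pool, rng)
n_informative = 2 * RECORD_WIDTH

print(len(pattern))                  # 216
print(n_informative / len(pattern))  # 0.5
```

In this toy layout the informative values occupy the first half of the pattern; in the published data the feature identities are hidden, so a learner cannot tell which columns are distracters.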

217 features (216 input attributes plus the label)

| Attribute | Type | Unique values | Missing |
|---|---|---|---|
| attr0 | numeric | 173 | 0 |
| attr1 | nominal | 2 | 0 |
| attr2 | nominal | 2 | 0 |
| attr3 | nominal | 2 | 0 |
| attr4 | numeric | 915 | 0 |
| attr5 | nominal | 2 | 0 |
| attr6 | nominal | 2 | 0 |
| attr7 | nominal | 2 | 0 |
| attr8 | nominal | 2 | 0 |
| attr9 | numeric | 923 | 0 |
| attr10 | nominal | 2 | 0 |
| attr11 | numeric | 354 | 0 |
| attr12 | numeric | 441 | 0 |
| attr13 | nominal | 2 | 0 |
| attr14 | nominal | 1 | 0 |
| attr15 | nominal | 2 | 0 |
| attr16 | nominal | 2 | 0 |
| attr17 | nominal | 2 | 0 |
| attr18 | nominal | 2 | 0 |
| attr19 | nominal | 2 | 0 |
| attr20 | numeric | 353 | 0 |
| attr21 | nominal | 2 | 0 |
| attr22 | numeric | 165 | 0 |
| attr23 | numeric | 167 | 0 |
| attr24 | nominal | 2 | 0 |
| attr25 | nominal | 2 | 0 |
| attr26 | nominal | 2 | 0 |
| attr27 | nominal | 2 | 0 |
| attr28 | nominal | 2 | 0 |
| attr29 | nominal | 2 | 0 |
| attr30 | nominal | 2 | 0 |
| attr31 | nominal | 2 | 0 |
| attr32 | nominal | 2 | 0 |
| attr33 | nominal | 2 | 0 |
| attr34 | nominal | 2 | 0 |
| attr35 | nominal | 2 | 0 |
| attr36 | nominal | 2 | 0 |
| attr37 | nominal | 2 | 0 |
| attr38 | nominal | 2 | 0 |
| attr39 | nominal | 2 | 0 |
| attr40 | nominal | 2 | 0 |
| attr41 | nominal | 2 | 0 |
| attr42 | nominal | 2 | 0 |
| attr43 | nominal | 2 | 0 |
| attr44 | nominal | 2 | 0 |
| attr45 | nominal | 2 | 0 |
| attr46 | nominal | 2 | 0 |
| attr47 | nominal | 2 | 0 |
| attr48 | numeric | 129 | 0 |
| attr49 | nominal | 2 | 0 |
| attr50 | nominal | 2 | 0 |
| attr51 | numeric | 361 | 0 |
| attr52 | nominal | 2 | 0 |
| attr53 | numeric | 909 | 0 |
| attr54 | numeric | 803 | 0 |
| attr55 | nominal | 2 | 0 |
| attr56 | nominal | 2 | 0 |
| attr57 | nominal | 2 | 0 |
| attr58 | nominal | 2 | 0 |
| attr59 | numeric | 361 | 0 |
| attr60 | nominal | 2 | 0 |
| attr61 | nominal | 2 | 0 |
| attr62 | numeric | 245 | 0 |
| attr63 | nominal | 2 | 0 |
| attr64 | nominal | 2 | 0 |
| attr65 | nominal | 2 | 0 |
| attr66 | nominal | 2 | 0 |
| attr67 | nominal | 2 | 0 |
| attr68 | nominal | 2 | 0 |
| attr69 | numeric | 239 | 0 |
| attr70 | nominal | 2 | 0 |
| attr71 | nominal | 2 | 0 |
| attr72 | nominal | 2 | 0 |
| attr73 | nominal | 2 | 0 |
| attr74 | numeric | 131 | 0 |
| attr75 | nominal | 2 | 0 |
| attr76 | nominal | 2 | 0 |
| attr77 | nominal | 2 | 0 |
| attr78 | nominal | 2 | 0 |
| attr79 | nominal | 2 | 0 |
| attr80 | nominal | 2 | 0 |
| attr81 | nominal | 2 | 0 |
| attr82 | nominal | 2 | 0 |
| attr83 | nominal | 2 | 0 |
| attr84 | nominal | 2 | 0 |
| attr85 | nominal | 2 | 0 |
| attr86 | nominal | 2 | 0 |
| attr87 | nominal | 2 | 0 |
| attr88 | nominal | 2 | 0 |
| attr89 | nominal | 2 | 0 |
| attr90 | nominal | 2 | 0 |
| attr91 | numeric | 915 | 0 |
| attr92 | nominal | 1 | 0 |
| attr93 | nominal | 2 | 0 |
| attr94 | nominal | 2 | 0 |
| attr95 | nominal | 2 | 0 |
| attr96 | numeric | 799 | 0 |
| attr97 | numeric | 360 | 0 |
| attr98 | nominal | 2 | 0 |
| attr99 | nominal | 2 | 0 |
| attr100 | nominal | 2 | 0 |
| attr101 | numeric | 427 | 0 |
| attr102 | nominal | 2 | 0 |
| attr103 | nominal | 2 | 0 |
| attr104 | nominal | 2 | 0 |
| attr105 | nominal | 2 | 0 |
| attr106 | numeric | 916 | 0 |
| attr107 | nominal | 2 | 0 |
| attr108 | nominal | 2 | 0 |
| attr109 | numeric | 801 | 0 |
| attr110 | nominal | 2 | 0 |
| attr111 | numeric | 353 | 0 |
| attr112 | nominal | 2 | 0 |
| attr113 | nominal | 2 | 0 |
| attr114 | nominal | 2 | 0 |
| attr115 | nominal | 1 | 0 |
| attr116 | nominal | 2 | 0 |
| attr117 | nominal | 2 | 0 |
| attr118 | nominal | 2 | 0 |
| attr119 | nominal | 2 | 0 |
| attr120 | nominal | 2 | 0 |
| attr121 | nominal | 2 | 0 |
| attr122 | nominal | 2 | 0 |
| attr123 | nominal | 2 | 0 |
| attr124 | nominal | 2 | 0 |
| attr125 | nominal | 2 | 0 |
| attr126 | nominal | 2 | 0 |
| attr127 | nominal | 2 | 0 |
| attr128 | nominal | 2 | 0 |
| attr129 | nominal | 2 | 0 |
| attr130 | nominal | 2 | 0 |
| attr131 | nominal | 2 | 0 |
| attr132 | nominal | 2 | 0 |
| attr133 | nominal | 2 | 0 |
| attr134 | nominal | 2 | 0 |
| attr135 | nominal | 2 | 0 |
| attr136 | nominal | 2 | 0 |
| attr137 | nominal | 2 | 0 |
| attr138 | nominal | 2 | 0 |
| attr139 | nominal | 2 | 0 |
| attr140 | nominal | 2 | 0 |
| attr141 | nominal | 2 | 0 |
| attr142 | numeric | 242 | 0 |
| attr143 | nominal | 2 | 0 |
| attr144 | nominal | 2 | 0 |
| attr145 | nominal | 2 | 0 |
| attr146 | numeric | 340 | 0 |
| attr147 | nominal | 2 | 0 |
| attr148 | numeric | 906 | 0 |
| attr149 | nominal | 2 | 0 |
| attr150 | nominal | 2 | 0 |
| attr151 | numeric | 48 | 0 |
| attr152 | numeric | 49 | 0 |
| attr153 | numeric | 429 | 0 |
| attr154 | nominal | 2 | 0 |
| attr155 | nominal | 2 | 0 |
| attr156 | nominal | 2 | 0 |
| attr157 | nominal | 2 | 0 |
| attr158 | nominal | 2 | 0 |
| attr159 | nominal | 2 | 0 |
| attr160 | nominal | 2 | 0 |
| attr161 | numeric | 50 | 0 |
| attr162 | nominal | 2 | 0 |
| attr163 | numeric | 128 | 0 |
| attr164 | numeric | 51 | 0 |
| attr165 | numeric | 918 | 0 |
| attr166 | nominal | 2 | 0 |
| attr167 | numeric | 170 | 0 |
| attr168 | nominal | 2 | 0 |
| attr169 | nominal | 2 | 0 |
| attr170 | nominal | 2 | 0 |
| attr171 | nominal | 2 | 0 |
| attr172 | nominal | 2 | 0 |
| attr173 | nominal | 2 | 0 |
| attr174 | nominal | 2 | 0 |
| attr175 | nominal | 2 | 0 |
| attr176 | numeric | 360 | 0 |
| attr177 | numeric | 430 | 0 |
| attr178 | nominal | 2 | 0 |
| attr179 | numeric | 919 | 0 |
| attr180 | nominal | 2 | 0 |
| attr181 | nominal | 2 | 0 |
| attr182 | nominal | 2 | 0 |
| attr183 | nominal | 2 | 0 |
| attr184 | nominal | 2 | 0 |
| attr185 | nominal | 2 | 0 |
| attr186 | nominal | 2 | 0 |
| attr187 | nominal | 2 | 0 |
| attr188 | nominal | 2 | 0 |
| attr189 | nominal | 2 | 0 |
| attr190 | nominal | 2 | 0 |
| attr191 | nominal | 2 | 0 |
| attr192 | nominal | 2 | 0 |
| attr193 | nominal | 2 | 0 |
| attr194 | numeric | 133 | 0 |
| attr195 | nominal | 2 | 0 |
| attr196 | nominal | 2 | 0 |
| attr197 | nominal | 2 | 0 |
| attr198 | nominal | 2 | 0 |
| attr199 | nominal | 2 | 0 |
| attr200 | nominal | 2 | 0 |
| attr201 | numeric | 801 | 0 |
| attr202 | nominal | 2 | 0 |
| attr203 | numeric | 244 | 0 |
| attr204 | nominal | 2 | 0 |
| attr205 | nominal | 2 | 0 |
| attr206 | nominal | 2 | 0 |
| attr207 | nominal | 2 | 0 |
| attr208 | nominal | 1 | 0 |
| attr209 | nominal | 2 | 0 |
| attr210 | nominal | 2 | 0 |
| attr211 | nominal | 2 | 0 |
| attr212 | nominal | 2 | 0 |
| attr213 | nominal | 2 | 0 |
| attr214 | nominal | 2 | 0 |
| attr215 | nominal | 2 | 0 |
| label | nominal | 2 | 0 |
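
Four of the nominal attributes listed above (attr14, attr92, attr115 and attr208) take a single unique value, so they cannot help separate the two classes. A minimal sketch of how such constant columns might be detected and dropped before modelling is shown here. The helper names `constant_columns` and `drop_columns` and the toy rows are hypothetical; real values would come from the ARFF file.

```python
def constant_columns(rows, names):
    """Return the names of columns that take exactly one distinct value."""
    distinct = {name: set() for name in names}
    for row in rows:
        for name, value in zip(names, row):
            distinct[name].add(value)
    return [name for name in names if len(distinct[name]) == 1]


def drop_columns(rows, names, to_drop):
    """Return (rows, names) with the columns in `to_drop` removed."""
    dropped = set(to_drop)
    keep = [i for i, name in enumerate(names) if name not in dropped]
    new_names = [names[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_rows, new_names


# Toy excerpt with made-up values; only the column names follow the listing.
names = ["attr13", "attr14", "attr15", "label"]
rows = [
    ["0", "0", "1", "neg"],
    ["1", "0", "0", "pos"],
    ["1", "0", "1", "neg"],
]

constant = constant_columns(rows, names)
clean_rows, clean_names = drop_columns(rows, names, constant)

print(constant)     # ['attr14']
print(clean_names)  # ['attr13', 'attr15', 'label']
```

Applied to the full listing, the same check would leave 173 binary attributes (including the label) and 40 numeric ones.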

62 properties

| Value | Property |
|---|---|
| 14395 | Number of instances (rows) of the dataset. |
| 217 | Number of attributes (columns) of the dataset. |
| | Number of distinct values of the target attribute (if it is nominal). |
| 0 | Number of missing values in the dataset. |
| 0 | Number of instances with at least one value missing. |
| 40 | Number of numeric attributes. |
| 177 | Number of nominal attributes. |
| 560.73 | Third quartile of means among attributes of the numeric type. |
| | Maximum mutual information between the nominal attributes and the target attribute. |
| 1 | The minimal number of distinct values among attributes of the nominal type. |
| 18.43 | Percentage of numeric attributes. |
| | Third quartile of mutual information between the nominal attributes and the target attribute. |
| 2 | The maximum number of distinct values among attributes of the nominal type. |
| -1.18 | Minimum skewness among attributes of the numeric type. |
| 81.57 | Percentage of nominal attributes. |
| 1.12 | Third quartile of skewness among attributes of the numeric type. |
| 2.02 | Maximum skewness among attributes of the numeric type. |
| 74.54 | Minimum standard deviation of attributes of the numeric type. |
| | First quartile of entropy among attributes. |
| 185.32 | Third quartile of standard deviation of attributes of the numeric type. |
| 312.04 | Maximum standard deviation of attributes of the numeric type. |
| | Percentage of instances belonging to the least frequent class. |
| 0.4 | First quartile of kurtosis among attributes of the numeric type. |
| 0.15 | Standard deviation of the number of distinct values among attributes of the nominal type. |
| | Average entropy of the attributes. |
| | Number of instances belonging to the least frequent class. |
| 276.39 | First quartile of means among attributes of the numeric type. |
| 1.26 | Mean kurtosis among attributes of the numeric type. |
| 173 | Number of binary attributes. |
| | First quartile of mutual information between the nominal attributes and the target attribute. |
| 454.97 | Mean of means among attributes of the numeric type. |
| -0.83 | First quartile of skewness among attributes of the numeric type. |
| | Average class difference between consecutive instances. |
| | Average mutual information between the nominal attributes and the target attribute. |
| 105.32 | First quartile of standard deviation of attributes of the numeric type. |
| | Entropy of the target attribute values. |
| | An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation. |
| | Second quartile (Median) of entropy among attributes. |
| 0.02 | Number of attributes divided by the number of instances. |
| 1.98 | Average number of distinct values among the attributes of the nominal type. |
| 1.02 | Second quartile (Median) of kurtosis among attributes of the numeric type. |
| | Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation. |
| 0.28 | Mean skewness among attributes of the numeric type. |
| 380.26 | Second quartile (Median) of means among attributes of the numeric type. |
| | Percentage of instances belonging to the most frequent class. |
| 152.73 | Mean standard deviation of attributes of the numeric type. |
| | Second quartile (Median) of mutual information between the nominal attributes and the target attribute. |
| | Number of instances belonging to the most frequent class. |
| | Minimal entropy among attributes. |
| 0.56 | Second quartile (Median) of skewness among attributes of the numeric type. |
| | Maximum entropy among attributes. |
| -1.24 | Minimum kurtosis among attributes of the numeric type. |
| 79.72 | Percentage of binary attributes. |
| 144.9 | Second quartile (Median) of standard deviation of attributes of the numeric type. |
| | Third quartile of entropy among attributes. |
| 7.24 | Maximum kurtosis among attributes of the numeric type. |
| 191.06 | Minimum of means among attributes of the numeric type. |
| 0 | Percentage of instances having missing values. |
| 1.71 | Third quartile of kurtosis among attributes of the numeric type. |
| 879.13 | Maximum of means among attributes of the numeric type. |
| | Minimal mutual information between the nominal attributes and the target attribute. |
| 0 | Percentage of missing values. |
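
Several of the properties above are simple ratios that can be re-derived from the counts on this page, and two of them are defined by explicit formulas. The sketch below recomputes the ratio-based ones and implements the two formulas as plain functions. It assumes that the train and validation counts in the description add up to the full dataset (13086 + 1309 = 14395) and that entropy is measured in bits. The class-based values it computes are derived here and are not reported on the page, and all function names are illustrative.

```python
import math

# (positives, negatives) per split, taken from the description.
TRAIN = (805, 12281)
VALID = (81, 1228)

n_pos = TRAIN[0] + VALID[0]
n_neg = TRAIN[1] + VALID[1]
n_instances = n_pos + n_neg
n_attributes = 217


def percentage(part, whole):
    """Share of `part` in `whole`, in percent, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)


def binary_entropy(p):
    """Entropy in bits of a two-class distribution with class share p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def noise_to_signal_ratio(mean_attribute_entropy, mean_mutual_information):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


def equivalent_number_of_attributes(class_entropy, mean_mutual_information):
    """ClassEntropy / MeanMutualInformation."""
    return class_entropy / mean_mutual_information


class_entropy = binary_entropy(n_pos / n_instances)

print(n_instances)                           # 14395
print(percentage(40, n_attributes))          # 18.43 (numeric attributes)
print(percentage(177, n_attributes))         # 81.57 (nominal attributes)
print(percentage(173, n_attributes))         # 79.72 (binary attributes)
print(round(n_attributes / n_instances, 2))  # 0.02 (dimensionality)
print(percentage(n_pos, n_instances))        # 6.15 (least frequent class, derived)
print(class_entropy)
```

The mean attribute entropy and mean mutual information are not shown on the page, so the two formula functions can only be evaluated once those values are computed from the data itself.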

2 tasks

0 runs - estimation_procedure: 50 times Clustering