Data

satellite_image

Status: active
Format: ARFF
Visibility: public. Uploaded 17-08-2014 by Tobias Kuehn.

1 like, downloaded by 6 people, 9 total downloads, 0 issues, 0 downvotes


Author:
Source: Unknown - 1993
Please cite:
Source:
Ashwin Srinivasan
Department of Statistics and Data Modeling
University of Strathclyde
Glasgow
Scotland
UK
ross '@' uk.ac.turing
The original Landsat data for this database was generated from data purchased from NASA by the Australian Centre for Remote Sensing, and used for research at:
The Centre for Remote Sensing
University of New South Wales
Kensington, PO Box 1
NSW 2033
Australia.
The sample database was generated by taking a small section (82 rows and 100 columns) from the original data. The binary values were converted to their present ASCII form by Ashwin Srinivasan. The classification for each pixel was performed on the basis of an actual site visit by Ms. Karen Hall, when working for Professor John A. Richards, at the Centre for Remote Sensing at the University of New South Wales, Australia. Conversion to 3x3 neighbourhoods and splitting into test and training sets was done by Alistair Sutherland.
Data Set Information:
The database consists of the multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. The aim is to predict this classification, given the multi-spectral values. In the sample database, the class of a pixel is coded as a number.
The Landsat satellite data is one of the many sources of information available for a scene. The interpretation of a scene by integrating spatial data of diverse types and resolutions, including multispectral and radar data, maps indicating topography, land use etc., is expected to assume significant importance with the onset of an era characterised by integrative approaches to remote sensing (for example, NASA's Earth Observing System commencing this decade). Existing statistical methods are ill-equipped for handling such diverse data types. Note that this is not true for Landsat MSS data considered in isolation (as in this sample database). This data satisfies the important requirements of being numerical and at a single resolution, and standard maximum-likelihood classification performs very well. Consequently, for this data, it should be interesting to compare the performance of other methods against the statistical approach.
One frame of Landsat MSS imagery consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to green and red regions of the visible spectrum) and two are in the (near) infra-red. Each pixel is an 8-bit binary word, with 0 corresponding to black and 255 to white. The spatial resolution of a pixel is about 80m x 80m. Each image contains 2340 x 3380 such pixels. The database is a (tiny) sub-area of a scene, consisting of 82 x 100 pixels. Each line of data corresponds to a 3x3 square neighbourhood of pixels completely contained within the 82x100 sub-area.
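As a quick check on the geometry described above, the following sketch counts how many 3x3 neighbourhoods fit entirely inside an 82 x 100 sub-area. The function name `count_windows` is a hypothetical helper introduced only for illustration. The result is an upper bound on the number of data lines, not the actual size of the dataset, since the description states that certain lines were removed.

```python
# Sub-area size and neighbourhood side length, as stated in the description.
ROWS, COLS = 82, 100
WINDOW = 3


def count_windows(rows: int, cols: int, window: int) -> int:
    """Number of window x window squares fully contained in a rows x cols grid."""
    if window > rows or window > cols:
        return 0
    # The top-left corner of a window can sit in (rows - window + 1) row
    # positions and (cols - window + 1) column positions.
    return (rows - window + 1) * (cols - window + 1)


n_windows = count_windows(ROWS, COLS, WINDOW)
print(n_windows)  # 80 * 98 = 7840
```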
Each line contains the pixel values in the four spectral bands (converted to ASCII) of each of the 9 pixels in the 3x3 neighbourhood and a number indicating the classification label of the central pixel. The number is a code for the following classes:
Number Class
1 red soil
2 cotton crop
3 grey soil
4 damp grey soil
5 soil with vegetation stubble
6 mixture class (all types present)
7 very damp grey soil
NB. There are no examples with class 6 in this dataset.
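The class codes listed above can be kept in a small lookup table for decoding predictions. This is only a sketch: the names `CLASS_NAMES`, `ABSENT_CODES`, and `decode` are hypothetical, and the `int()` conversion assumes the target may arrive as a float because this upload stores it as a numeric column.

```python
# Class codes and labels copied from the table in the description.
CLASS_NAMES = {
    1: "red soil",
    2: "cotton crop",
    3: "grey soil",
    4: "damp grey soil",
    5: "soil with vegetation stubble",
    6: "mixture class (all types present)",
    7: "very damp grey soil",
}

# The description notes that class 6 has no examples in this dataset.
ABSENT_CODES = {6}
present_codes = sorted(set(CLASS_NAMES) - ABSENT_CODES)


def decode(code) -> str:
    """Map a class code (int, or float such as 3.0) to its label."""
    return CLASS_NAMES[int(code)]


print(present_codes)  # [1, 2, 3, 4, 5, 7]
print(decode(3.0))    # grey soil
```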
The data is given in random order and certain lines of data have been removed so you cannot reconstruct the original image from this dataset. In each line of data the four spectral values for the top-left pixel are given first, followed by the four spectral values for the top-middle pixel and then those for the top-right pixel, and so on with the pixels read out in sequence left-to-right and top-to-bottom. Thus, the four spectral values for the central pixel are given by attributes 17, 18, 19 and 20. If you like, you can use only these four attributes and ignore the others. This avoids the problem which arises when a 3x3 neighbourhood straddles a boundary.
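The attribute ordering described above can be written as a small index calculation. The sketch below assumes 1-based attribute numbers (matching attr1 to attr36) and 0-based row, column, and band positions inside the neighbourhood. The helper names `attribute_index` and `central_pixel` are hypothetical.

```python
N_BANDS = 4  # spectral bands per pixel
GRID = 3     # neighbourhood is GRID x GRID pixels


def attribute_index(row: int, col: int, band: int) -> int:
    """1-based attribute number for a pixel at (row, col) and a given band.

    Pixels are read left-to-right, top-to-bottom, and the four band values
    of a pixel are stored consecutively.
    """
    if not (0 <= row < GRID and 0 <= col < GRID and 0 <= band < N_BANDS):
        raise ValueError("row, col, or band out of range")
    pixel = row * GRID + col
    return pixel * N_BANDS + band + 1


# Attribute numbers of the central pixel (row 1, col 1).
CENTRAL = [attribute_index(1, 1, band) for band in range(N_BANDS)]
print(CENTRAL)  # [17, 18, 19, 20]


def central_pixel(record):
    """Return the four central-pixel values from a 36-value record."""
    if len(record) != GRID * GRID * N_BANDS:
        raise ValueError("expected 36 attribute values")
    return [record[i - 1] for i in CENTRAL]
```

Keeping only these four values per record gives a 4-feature version of the problem that never mixes pixels from both sides of a class boundary.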
Attribute Information:
The attributes are numerical, in the range 0 to 255.
UCI: http://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite)

Feature | Type | Unique values | Missing values
---|---|---|---
class (target) | numeric | 6 | 0
attr1 | numeric | 51 | 0
attr2 | numeric | 84 | 0
attr3 | numeric | 76 | 0
attr4 | numeric | 102 | 0
attr5 | numeric | 51 | 0
attr6 | numeric | 82 | 0
attr7 | numeric | 76 | 0
attr8 | numeric | 103 | 0
attr9 | numeric | 50 | 0
attr10 | numeric | 81 | 0
attr11 | numeric | 78 | 0
attr12 | numeric | 104 | 0
attr13 | numeric | 51 | 0
attr14 | numeric | 83 | 0
attr15 | numeric | 78 | 0
attr16 | numeric | 101 | 0
attr17 | numeric | 50 | 0
attr18 | numeric | 80 | 0
attr19 | numeric | 77 | 0
attr20 | numeric | 104 | 0
attr21 | numeric | 50 | 0
attr22 | numeric | 80 | 0
attr23 | numeric | 78 | 0
attr24 | numeric | 104 | 0
attr25 | numeric | 51 | 0
attr26 | numeric | 82 | 0
attr27 | numeric | 75 | 0
attr28 | numeric | 102 | 0
attr29 | numeric | 50 | 0
attr30 | numeric | 81 | 0
attr31 | numeric | 77 | 0
attr32 | numeric | 103 | 0
attr33 | numeric | 50 | 0
attr34 | numeric | 80 | 0
attr35 | numeric | 77 | 0
attr36 | numeric | 104 | 0

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

First quartile of standard deviation of attributes of the numeric type: 13.6

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Second quartile (Median) of kurtosis among attributes of the numeric type: -0.67

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

Second quartile (Median) of means among attributes of the numeric type: 82.66

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

Minimal mutual information between the nominal attributes and the target attribute.

Second quartile (Median) of skewness among attributes of the numeric type: 0.02

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Maximum mutual information between the nominal attributes and the target attribute.

The minimal number of distinct values among attributes of the nominal type.

Second quartile (Median) of standard deviation of attributes of the numeric type: 16.73

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

The maximum number of distinct values among attributes of the nominal type.

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Third quartile of mutual information between the nominal attributes and the target attribute.

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

Third quartile of skewness among attributes of the numeric type: 0.62

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

First quartile of kurtosis among attributes of the numeric type: -0.88

Third quartile of standard deviation of attributes of the numeric type: 20.94

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

Average mutual information between the nominal attributes and the target attribute.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

First quartile of mutual information between the nominal attributes and the target attribute.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Standard deviation of the number of distinct values among attributes of the nominal type.

Average number of distinct values among the attributes of the nominal type.

First quartile of skewness among attributes of the numeric type: -0.39
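Two of the qualities listed above are defined by formulas: the number of attributes needed to describe the class (ClassEntropy divided by MeanMutualInformation) and the estimate of irrelevant information ((MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation). The sketch below evaluates both on a tiny made-up nominal dataset. The function names and the toy data are illustrative assumptions and are not taken from this dataset or from the OpenML implementation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mean_mutual_information(columns, target):
    return sum(mutual_information(c, target) for c in columns) / len(columns)


def equivalent_number_of_attributes(columns, target):
    """ClassEntropy / MeanMutualInformation (undefined if the mean is 0)."""
    return entropy(target) / mean_mutual_information(columns, target)


def noise_to_signal_ratio(columns, target):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    mean_mi = mean_mutual_information(columns, target)
    mean_h = sum(entropy(c) for c in columns) / len(columns)
    return (mean_h - mean_mi) / mean_mi


# Toy data (made up): one attribute determines the class, one is unrelated.
target = ["a", "a", "b", "b"]
informative = ["x", "x", "y", "y"]  # mutual information with target: 1 bit
unrelated = ["p", "q", "p", "q"]    # mutual information with target: 0 bits
columns = [informative, unrelated]

print(equivalent_number_of_attributes(columns, target))  # 2.0
print(noise_to_signal_ratio(columns, target))            # 1.0
```

Both quantities are ratios of entropies, so the logarithm base does not affect the result.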