
sonar

active
ARFF
Publicly available (visibility: public). Uploaded 06-04-2014 by Jan van Rijn.

1 like. Downloaded by 24 people, 30 total downloads. 0 issues, 0 downvotes.





Author:
Source: Unknown
Please cite:
NAME: Sonar, Mines vs. Rocks
SUMMARY: This is the data set used by Gorman and Sejnowski in their study
of the classification of sonar signals using a neural network [1]. The
task is to train a network to discriminate between sonar signals bounced
off a metal cylinder and those bounced off a roughly cylindrical rock.
SOURCE: The data set was contributed to the benchmark collection by Terry
Sejnowski, now at the Salk Institute and the University of California at
San Diego. The data set was developed in collaboration with R. Paul
Gorman of Allied-Signal Aerospace Technology Center.
MAINTAINER: Scott E. Fahlman
PROBLEM DESCRIPTION:
The file "sonar.mines" contains 111 patterns obtained by bouncing sonar
signals off a metal cylinder at various angles and under various
conditions. The file "sonar.rocks" contains 97 patterns obtained from
rocks under similar conditions. The transmitted sonar signal is a
frequency-modulated chirp, rising in frequency. The data set contains
signals obtained from a variety of different aspect angles, spanning 90
degrees for the cylinder and 180 degrees for the rock.
Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number
represents the energy within a particular frequency band, integrated over
a certain period of time. The integration aperture for higher frequencies
occurs later in time, since these frequencies are transmitted later during
the chirp.
The label associated with each record contains the letter "R" if the object
is a rock and "M" if it is a mine (metal cylinder). The numbers in the
labels are in increasing order of aspect angle, but they do not encode the
angle directly.
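To illustrate the record layout described above, the sketch below parses one record into its 60 band energies and its class, and checks the stated 0.0 to 1.0 range. It assumes each record is a single line of 60 comma-separated numbers followed by the class letter; that layout, the function name `parse_record`, and the `LABELS` mapping (which mirrors the R to Rock and M to Mine relabeling noted at the end of this description) are illustrative assumptions, not part of the original distribution.

```python
from typing import List, Tuple

N_BANDS = 60  # one energy value per frequency band
# Hypothetical mapping mirroring the relabeling of 'Class' in this ARFF version.
LABELS = {"R": "Rock", "M": "Mine"}


def parse_record(line: str) -> Tuple[List[float], str]:
    """Split one record into its 60 band energies and its class name.

    Assumes (hypothetically) 60 comma-separated numbers followed by "R" or "M".
    """
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != N_BANDS + 1:
        raise ValueError(f"expected {N_BANDS + 1} fields, got {len(fields)}")
    energies = [float(f) for f in fields[:N_BANDS]]
    # Every band energy should lie in the documented range 0.0 to 1.0.
    if not all(0.0 <= e <= 1.0 for e in energies):
        raise ValueError("band energies must lie in [0.0, 1.0]")
    letter = fields[N_BANDS]
    if letter not in LABELS:
        raise ValueError(f"unknown class letter {letter!r}")
    return energies, LABELS[letter]


line = ",".join(["0.5"] * 60) + ",M"
energies, label = parse_record(line)
print(len(energies), label)  # 60 Mine
```

Note that the class letter carries no aspect-angle information in this sketch; per the text above, the angle is not encoded directly.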
METHODOLOGY:
This data set can be used in a number of different ways to test learning
speed, quality of ultimate learning, ability to generalize, or combinations
of these factors.
In [1], Gorman and Sejnowski report two series of experiments: an
"aspect-angle independent" series, in which the whole data set is used
without controlling for aspect angle, and an "aspect-angle dependent"
series in which the training and testing sets were carefully controlled to
ensure that each set contained cases from each aspect angle in
appropriate proportions.
For the aspect-angle independent experiments the combined set of 208 cases
is divided randomly into 13 disjoint sets with 16 cases in each. For each
experiment, 12 of these sets are used as training data, while the 13th is
reserved for testing. The experiment is repeated 13 times so that every
case appears once as part of a test set. The reported performance is an
average over the entire set of 13 different test sets, each run 10 times.
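The aspect-angle independent protocol amounts to 13-fold cross-validation on the 208 cases. A minimal sketch of the splitting step follows, assuming cases are identified by indices 0 to 207; the function names and the fixed seed are hypothetical, and the 10 repeated runs per test set would wrap around whatever training routine is used.

```python
import random
from typing import Iterator, List, Tuple


def random_disjoint_folds(n_cases: int = 208, n_folds: int = 13,
                          seed: int = 0) -> List[List[int]]:
    """Shuffle case indices and cut them into equal, disjoint folds."""
    if n_cases % n_folds:
        raise ValueError("this sketch assumes n_cases is divisible by n_folds")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    size = n_cases // n_folds  # 208 // 13 == 16 cases per fold
    return [indices[k * size:(k + 1) * size] for k in range(n_folds)]


def rotate_test_fold(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) pairs so that every fold is the test set exactly once."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


folds = random_disjoint_folds()
for train, test in rotate_test_fold(folds):
    assert len(train) == 192 and len(test) == 16  # 12 folds train, 1 fold tests
```

Because the folds are drawn at random, nothing in this split controls for aspect angle, which is the weakness discussed next.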
It was observed that this random division of the sample set led to rather
uneven performance. A few of the splits gave poor results, presumably
because the test set contained some samples from aspect angles that were
under-represented in the corresponding training set. This motivated Gorman
and Sejnowski to devise a different set of experiments in which an attempt
was made to balance the training and test sets so that each would have a
representative number of samples from all aspect angles. Since detailed
aspect-angle information was not present in the database of samples, the
208 samples were first divided into clusters using a 60-dimensional
Euclidean metric; each of these clusters was then divided between the
104-member training set and the 104-member test set.
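The balancing step can be sketched as follows, assuming the clustering has already been done and each case carries a cluster id. The text does not name the clustering algorithm or say exactly how each cluster was divided, so the alternating "deal" below, and the names `split_clusters` and `cluster_of`, are hypothetical. Dealing members alternately with one toggle that carries over between clusters splits every cluster as evenly as possible and yields 104/104 for 208 cases.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_clusters(cluster_of: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Divide every cluster between a training set and a test set.

    cluster_of[i] is the precomputed cluster id of case i. Members are dealt
    alternately to train and test using a single toggle shared across clusters,
    so each cluster's two halves differ by at most one case and the two sets
    differ in total size by at most one.
    """
    members: Dict[int, List[int]] = defaultdict(list)
    for case, cluster in enumerate(cluster_of):
        members[cluster].append(case)
    train: List[int] = []
    test: List[int] = []
    to_train = True
    for cluster in sorted(members):
        for case in members[cluster]:
            (train if to_train else test).append(case)
            to_train = not to_train
    return train, test
```

A stratified split like this is what lets each cluster, and by proxy each range of aspect angles, appear on both sides of the division.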
The actual training and testing samples used for the "aspect angle
dependent" experiments are marked in the data files. The reported
performance is an average over 10 runs with this single division of the
data set.
A standard back-propagation network was used for all experiments. The
network had 60 inputs and 2 output units, one indicating a cylinder and the
other a rock. Experiments were run with no hidden units (direct
connections from each input to each output) and with a single hidden layer
with 2, 3, 6, 12, or 24 units. Each network was trained for 300 epochs over
the entire training set.
The weight-update formulas used in this study were slightly different from
the standard form. A learning rate of 2.0 and momentum of 0.0 were used.
Errors less than 0.2 were treated as zero. Initial weights were uniform
random values in the range -0.3 to +0.3.
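The training setup above can be approximated with textbook on-line back-propagation for one hidden layer. This is a sketch, not the study's code: the modified weight-update formulas mentioned above are not given here, so standard squared-error deltas are used instead, and the sigmoid units, per-pattern updates, bias weights, and the output coding in the hypothetical `TARGETS` mapping are assumptions. It does follow the stated settings: 60 inputs, 2 outputs, learning rate 2.0, no momentum term, output errors below 0.2 treated as zero, initial weights uniform in [-0.3, +0.3], and 300 epochs. The no-hidden-unit variant is not covered.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dead_zone(error: float, tolerance: float = 0.2) -> float:
    """Treat output errors smaller than `tolerance` in magnitude as zero."""
    return 0.0 if abs(error) < tolerance else error


# Hypothetical target coding: output 0 signals a cylinder (mine), output 1 a rock.
TARGETS = {"Mine": [1.0, 0.0], "Rock": [0.0, 1.0]}


class OneHiddenLayerNet:
    def __init__(self, n_in: int = 60, n_hidden: int = 12, n_out: int = 2,
                 init_range: float = 0.3, seed: int = 0) -> None:
        rng = random.Random(seed)

        def row(n: int) -> List[float]:
            # n input weights plus one bias weight, uniform in [-init_range, +init_range]
            return [rng.uniform(-init_range, init_range) for _ in range(n + 1)]

        self.w_hidden = [row(n_in) for _ in range(n_hidden)]
        self.w_out = [row(n_hidden) for _ in range(n_out)]

    @staticmethod
    def _layer(weights: List[List[float]], inputs: List[float]) -> List[float]:
        with_bias = inputs + [1.0]
        return [sigmoid(sum(w * v for w, v in zip(r, with_bias))) for r in weights]

    def forward(self, x: List[float]) -> Tuple[List[float], List[float]]:
        h = self._layer(self.w_hidden, x)
        return h, self._layer(self.w_out, h)

    def train_pattern(self, x: List[float], target: Sequence[float],
                      lr: float = 2.0, tolerance: float = 0.2) -> None:
        """One on-line update; momentum is 0.0, so no momentum term appears."""
        h, y = self.forward(x)
        d_out = [dead_zone(t - o, tolerance) * o * (1.0 - o) for t, o in zip(target, y)]
        # Hidden deltas use the output weights *before* they are updated.
        d_hid = [hj * (1.0 - hj) * sum(dk * self.w_out[k][j] for k, dk in enumerate(d_out))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, v in enumerate(h + [1.0]):
                self.w_out[k][j] += lr * dk * v
        for j, dj in enumerate(d_hid):
            for i, v in enumerate(x + [1.0]):
                self.w_hidden[j][i] += lr * dj * v


def train(net: OneHiddenLayerNet, data: Sequence[Tuple[List[float], str]],
          epochs: int = 300) -> None:
    """Run `epochs` passes over the whole training set."""
    for _ in range(epochs):
        for x, label in data:
            net.train_pattern(x, TARGETS[label])
```

The dead zone means a pattern whose outputs are already within 0.2 of their targets contributes no weight change, so training effort concentrates on the patterns that are still misfit.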
RESULTS:
For the angle-independent experiments, Gorman and Sejnowski report the
following results for networks with different numbers of hidden units:
| Hidden units | % right (training set) | Std. dev. | % right (test set) | Std. dev. |
|---|---|---|---|---|
| 0 | 89.4 | 2.1 | 77.1 | 8.3 |
| 2 | 96.5 | 0.7 | 81.9 | 6.2 |
| 3 | 98.8 | 0.4 | 82.0 | 7.3 |
| 6 | 99.7 | 0.2 | 83.5 | 5.6 |
| 12 | 99.8 | 0.1 | 84.7 | 5.7 |
| 24 | 99.8 | 0.1 | 84.5 | 5.7 |
For the angle-dependent experiments Gorman and Sejnowski report the
following results:
| Hidden units | % right (training set) | Std. dev. | % right (test set) | Std. dev. |
|---|---|---|---|---|
| 0 | 79.3 | 3.4 | 73.1 | 4.8 |
| 2 | 96.2 | 2.2 | 85.7 | 6.3 |
| 3 | 98.1 | 1.5 | 87.6 | 3.0 |
| 6 | 99.4 | 0.9 | 89.3 | 2.4 |
| 12 | 99.8 | 0.6 | 90.4 | 1.8 |
| 24 | 100.0 | 0.0 | 89.2 | 1.4 |
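Not surprisingly, the network's performance on the test set was better when
the aspect angles in the training and test sets were balanced: by roughly 4
to 6 percentage points for networks with hidden units, although the network
with no hidden units scored lower (73.1% versus 77.1%).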
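The size of the effect can be read off the test-set columns of the two results tables. The arithmetic below simply subtracts them row by row; since the two protocols use different splits and numbers of runs, the differences are indicative rather than a controlled comparison.

```python
# Test-set accuracy (%) copied from the two results tables, keyed by hidden units.
angle_independent = {0: 77.1, 2: 81.9, 3: 82.0, 6: 83.5, 12: 84.7, 24: 84.5}
angle_dependent = {0: 73.1, 2: 85.7, 3: 87.6, 6: 89.3, 12: 90.4, 24: 89.2}

# Difference in percentage points (balanced split minus random split).
gains = {units: round(angle_dependent[units] - angle_independent[units], 1)
         for units in angle_independent}

for units, gain in gains.items():
    print(f"{units:>2} hidden units: {gain:+.1f} points")
```

The gains run from -4.0 points (no hidden units) to +5.8 points (6 hidden units).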
Gorman and Sejnowski further report that a nearest neighbor classifier on
the same data gave an 82.7% probability of correct classification.
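The text does not say how the nearest neighbor classifier was configured. The sketch below assumes the simplest variant, a 1-nearest-neighbor rule with Euclidean distance over the 60 band energies; the names `nearest_neighbor_predict` and `accuracy` are hypothetical, and ties go to the first closest training pattern.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[List[float], str]  # (60 band energies, class label)


def nearest_neighbor_predict(train: Sequence[Example], x: List[float]) -> str:
    """Label of the training pattern closest to x (Euclidean distance, k = 1)."""
    if not train:
        raise ValueError("training set is empty")
    _, label = min(train, key=lambda example: math.dist(example[0], x))
    return label


def accuracy(train: Sequence[Example], test: Sequence[Example]) -> float:
    """Fraction of test patterns whose predicted label matches the true one."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)
```

A classifier like this has no training phase, which makes it a useful baseline against the trained networks in the tables above.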
Three trained human subjects were each tested on 100 signals, chosen at
random from the set of 208 returns used to create this data set. Their
responses ranged between 88% and 97% correct. However, they may have been
using information from the raw sonar signal that is not preserved in the
processed data sets presented here.
REFERENCES:
1. Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units
in a Layered Network Trained to Classify Sonar Targets." Neural Networks,
Vol. 1, pp. 75-89.
Relabeled values in attribute 'Class'
From: R To: Rock
From: M To: Mine

| Attribute | Type | Unique values | Missing values |
|---|---|---|---|
| Class (target) | nominal | 2 | 0 |
| attribute_1 | numeric | 177 | 0 |
| attribute_2 | numeric | 182 | 0 |
| attribute_3 | numeric | 190 | 0 |
| attribute_4 | numeric | 181 | 0 |
| attribute_5 | numeric | 193 | 0 |
| attribute_6 | numeric | 196 | 0 |
| attribute_7 | numeric | 195 | 0 |
| attribute_8 | numeric | 201 | 0 |
| attribute_9 | numeric | 205 | 0 |
| attribute_10 | numeric | 207 | 0 |
| attribute_11 | numeric | 203 | 0 |
| attribute_12 | numeric | 206 | 0 |
| attribute_13 | numeric | 198 | 0 |
| attribute_14 | numeric | 202 | 0 |
| attribute_15 | numeric | 203 | 0 |
| attribute_16 | numeric | 203 | 0 |
| attribute_17 | numeric | 202 | 0 |
| attribute_18 | numeric | 204 | 0 |
| attribute_19 | numeric | 206 | 0 |
| attribute_20 | numeric | 203 | 0 |
| attribute_21 | numeric | 200 | 0 |
| attribute_22 | numeric | 203 | 0 |
| attribute_23 | numeric | 199 | 0 |
| attribute_24 | numeric | 201 | 0 |
| attribute_25 | numeric | 198 | 0 |
| attribute_26 | numeric | 194 | 0 |
| attribute_27 | numeric | 190 | 0 |
| attribute_28 | numeric | 194 | 0 |
| attribute_29 | numeric | 197 | 0 |
| attribute_30 | numeric | 202 | 0 |
| attribute_31 | numeric | 207 | 0 |
| attribute_32 | numeric | 205 | 0 |
| attribute_33 | numeric | 205 | 0 |
| attribute_34 | numeric | 206 | 0 |
| attribute_35 | numeric | 205 | 0 |
| attribute_36 | numeric | 205 | 0 |
| attribute_37 | numeric | 206 | 0 |
| attribute_38 | numeric | 206 | 0 |
| attribute_39 | numeric | 204 | 0 |
| attribute_40 | numeric | 206 | 0 |
| attribute_41 | numeric | 204 | 0 |
| attribute_42 | numeric | 208 | 0 |
| attribute_43 | numeric | 205 | 0 |
| attribute_44 | numeric | 196 | 0 |
| attribute_45 | numeric | 205 | 0 |
| attribute_46 | numeric | 199 | 0 |
| attribute_47 | numeric | 202 | 0 |
| attribute_48 | numeric | 204 | 0 |
| attribute_49 | numeric | 193 | 0 |
| attribute_50 | numeric | 154 | 0 |
| attribute_51 | numeric | 160 | 0 |
| attribute_52 | numeric | 144 | 0 |
| attribute_53 | numeric | 134 | 0 |
| attribute_54 | numeric | 134 | 0 |
| attribute_55 | numeric | 129 | 0 |
| attribute_56 | numeric | 122 | 0 |
| attribute_57 | numeric | 121 | 0 |
| attribute_58 | numeric | 124 | 0 |
| attribute_59 | numeric | 119 | 0 |
| attribute_60 | numeric | 109 | 0 |

| Value | Meta-feature |
|---|---|
| -0.55 | First quartile of kurtosis among attributes of the numeric type. |
| 0.24 | Third quartile of standard deviation of attributes of the numeric type. |
| 0.65 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W |
| 0.7 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 |
| 0.71 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1 |
| 0.37 | Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W |
| 0.3 | Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 |
| 0.33 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001 |
| | Average mutual information between the nominal attributes and the target attribute. |
| | First quartile of mutual information between the nominal attributes and the target attribute. |
| 0.26 | Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W |
| 0.39 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 |
| 0.7 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001 |
| | An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation. |
| 0.45 | First quartile of skewness among attributes of the numeric type. |
| 0.41 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1 |
| 0.65 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W |
| 0 | Standard deviation of the number of distinct values among attributes of the nominal type. |
| 2 | Average number of distinct values among the attributes of the nominal type. |
| 0.04 | First quartile of standard deviation of attributes of the numeric type. |
| 0.71 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2 |
| 0.37 | Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W |
| 0.26 | Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W |
| 0.85 | Second quartile (Median) of kurtosis among attributes of the numeric type. |
| 0.41 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2 |
| 0.26 | Second quartile (Median) of means among attributes of the numeric type. |
| 0.71 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3 |
| 0.65 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump |
| | Second quartile (Median) of mutual information between the nominal attributes and the target attribute. |
| 0.34 | Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump |
| 0.94 | Second quartile (Median) of skewness among attributes of the numeric type. |
| 0.41 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3 |
| 0.31 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump |
| | Minimal mutual information between the nominal attributes and the target attribute. |
| 0.15 | Second quartile (Median) of standard deviation of attributes of the numeric type. |
| 0.7 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 |
| | Maximum mutual information between the nominal attributes and the target attribute. |
| 2 | The minimal number of distinct values among attributes of the nominal type. |
| 0.3 | Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 |
| | Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation. |
| 2 | The maximum number of distinct values among attributes of the nominal type. |
| 3.61 | Third quartile of kurtosis among attributes of the numeric type. |
| 0.39 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 |
| 0.7 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001 |
| 0.65 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W |
| 0.7 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 |
| | Third quartile of mutual information between the nominal attributes and the target attribute. |
| 0.37 | Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W |
| 0.3 | Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 |
| 0.33 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001 |
| 0.79 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes |
| 1.69 | Third quartile of skewness among attributes of the numeric type. |
| 0.26 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W |
| 0.39 | Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 |
| 0.7 | Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001 |
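Several of these meta-features are quartiles of a per-attribute statistic, and two are ratios defined in terms of mean mutual information. The sketch below shows one way to compute them, assuming a hypothetical `columns` mapping from attribute name to its 208 values. The exact estimators and quartile interpolation used to produce the values above are not stated here, so results may differ somewhat. Kurtosis is assumed to be excess kurtosis, because plain moment kurtosis is never below 1 while the reported first quartile is negative. Both ratio formulas divide by mean mutual information over nominal attributes; apart from the target this dataset has none, which is consistent with those rows having no value.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def _central_moment(values: Sequence[float], order: int) -> float:
    mean = statistics.fmean(values)
    return statistics.fmean([(v - mean) ** order for v in values])


def skewness(values: Sequence[float]) -> float:
    """Population moment skewness m3 / m2**1.5 (undefined for constant columns)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values: Sequence[float]) -> float:
    """Population moment kurtosis m4 / m2**2, minus 3 so a normal distribution gives 0."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


def quartiles_of(columns: Dict[str, List[float]],
                 stat: Callable[[Sequence[float]], float]) -> List[float]:
    """First, second (median) and third quartile of a per-attribute statistic."""
    per_attribute = [stat(col) for col in columns.values()]
    return statistics.quantiles(per_attribute, n=4)


def noise_to_signal(mean_attribute_entropy: float, mean_mutual_information: float) -> float:
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


def equivalent_attributes(class_entropy: float, mean_mutual_information: float) -> float:
    """ClassEntropy / MeanMutualInformation."""
    return class_entropy / mean_mutual_information
```

For example, `quartiles_of(columns, skewness)` would return the three values corresponding to the first, median, and third quartile of skewness rows in the table.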