Data

MercuryinBass

active
ARFF
Publicly available Visibility: public Uploaded 07-10-2014 by Joaquin Vanschoren

0 likes downloaded by 1 people , 2 total downloads 0 issues 0 downvotes

0 likes downloaded by 1 people , 2 total downloads 0 issues 0 downvotes

Issue | #Downvotes for this reason | By |
---|

Loading wiki

Help us complete this description
Edit

Author:
Source: Unknown - Date unknown
Please cite:
Datasets of Data And Story Library, project illustrating use of basic statistic methods, converted to arff format by Hakan Kjellerstrand.
Source: TunedIT: http://tunedit.org/repo/DASL
DASL file http://lib.stat.cmu.edu/DASL/Datafiles/MercuryinBass.html
Mercury Contamination in Bass
Reference: Lange, Royals, & Connor. (1993). Transactions of the American Fisheries Society .
Authorization: contact authors
Description: Largemouth bass were studied in 53 different Florida lakes to examine the factors that influence the level of mercury contamination. Water samples were collected from the surface of the middle of each lake in August 1990 and then again in March 1991. The pH level, the amount of chlorophyll, calcium, and alkalinity were measured in each sample. The average of the August and March values were used in the analysis. Next, a sample of fish was taken from each lake with sample sizes ranging from 4 to 44 fish. The age of each fish and mercury concentration in the muscle tissue was measured. (Note: Since fish absorb mercury over time, older fish will tend to have higher concentrations). Thus, to make a fair comparison of the fish in different lakes, the investigators used a regression estimate of the expected mercury concentration in a three year old fish as the standardized value for each lake. Finally, in 10 of the 53 lakes, the age of the individual fish could not be determined and the average mercury concentration ofthe sampled fish was used instead of the standardized value.
Number of cases: 53
Variable Names:
ID: ID number
Lake: Name of the lake
Alkalinity: Alkalinity (mg/L as Calcium Carbonate)
pH: pH
Calcium: Calcium (mg/l)
Chlorophyll: Chlorophyll (mg/l)
Avg_Mercury: Average mercury concentration (parts per million) in the muscle tissue of the fish sampled from that lake
No.samples: How many fish were sampled from the lake
min: Minimum mercury concentration amongst the sampled fish
max: Maximum mercury concentration amongst the sampled fish
3_yr_Standard_mercury : Regression estimate of the mercury concentration in a 3 year old fish from the lake (or = Avg Mercury when age data was not available)
age_data: Indicator of the availability of age data on fish sampled

3_yr_Standard_Mercury (target) | numeric | 38 unique values 0 missing | |

ID (ignore) | numeric | 53 unique values 0 missing | |

Lake (ignore) | nominal | 53 unique values 0 missing | |

Alkalinity | numeric | 51 unique values 0 missing | |

pH | numeric | 34 unique values 0 missing | |

Calcium | numeric | 48 unique values 0 missing | |

Chlorophyll | numeric | 43 unique values 0 missing | |

Avg_Mercury | numeric | 41 unique values 0 missing | |

No.samples | numeric | 15 unique values 0 missing | |

min | numeric | 32 unique values 0 missing | |

max | numeric | 44 unique values 0 missing | |

age_data | numeric | 2 unique values 0 missing |

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

0.71

Second quartile (Median) of kurtosis among attributes of the numeric type.

3.73

Second quartile (Median) of means among attributes of the numeric type.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

1.01

Second quartile (Median) of skewness among attributes of the numeric type.

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

Minimal mutual information between the nominal attributes and the target attribute.

0.91

Second quartile (Median) of standard deviation of attributes of the numeric type.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Maximum mutual information between the nominal attributes and the target attribute.

The minimal number of distinct values among attributes of the nominal type.

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

The maximum number of distinct values among attributes of the nominal type.

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Third quartile of mutual information between the nominal attributes and the target attribute.

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

1.68

Third quartile of skewness among attributes of the numeric type.

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

-0.47

First quartile of kurtosis among attributes of the numeric type.

26.4

Third quartile of standard deviation of attributes of the numeric type.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

Average mutual information between the nominal attributes and the target attribute.

First quartile of mutual information between the nominal attributes and the target attribute.

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Standard deviation of the number of distinct values among attributes of the nominal type.

Average number of distinct values among the attributes of the nominal type.

0.34

First quartile of standard deviation of attributes of the numeric type.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W