jura

in_preparation
ARFF
Visibility: public. Uploaded 22-11-2018 by Quay Au.

0 likes downloaded by 0 people , 0 total downloads 0 issues 0 downvotes

Multivariate regression data set from https://link.springer.com/article/10.1007%2Fs10994-016-5546-z:

The Jura (Goovaerts 1997) dataset consists of measurements of concentrations of seven heavy metals (cadmium, cobalt, chromium, copper, nickel, lead, and zinc), recorded at 359 locations in the topsoil of a region of the Swiss Jura. The type of land use (Forest, Pasture, Meadow, Tillage) and rock type (Argovian, Kimmeridgian, Sequanian, Portlandian, Quaternary) were also recorded for each location. In a typical scenario (Goovaerts 1997; Alvarez and Lawrence 2011), we are interested in the prediction of the concentration of metals that are more expensive to measure (primary variables) using measurements of metals that are cheaper to sample (secondary variables). In this study, cadmium, copper and lead are treated as target variables while the remaining metals along with land use type, rock type and the coordinates of each location are used as predictive features.
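The target/feature split described above can be sketched in a few lines of Python. This is only an illustration: the helper `split_row` and the all-zero `example_row` are hypothetical, and the column names are taken from this page's feature listing. Note that the description names cadmium, copper and lead as targets, while the feature listing flags Cd, Co and Cu, so the sketch takes the target list as a parameter rather than hard-coding either choice.

```python
from typing import Dict, Sequence, Tuple

# Column names as they appear in this page's feature listing.
COLUMNS = [
    "Cd", "Co", "Cu", "Xloc", "Yloc",
    "Landuse_1", "Landuse_2", "Landuse_3", "Landuse_4",
    "Rock_1", "Rock_2", "Rock_3", "Rock_4", "Rock_5",
    "Cr", "Ni", "Pb", "Zn",
]


def split_row(
    row: Dict[str, float], targets: Sequence[str]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split one record into (features, targets) by column name.

    Hypothetical helper, not part of OpenML or any library.
    """
    unknown = [name for name in targets if name not in row]
    if unknown:
        raise KeyError(f"unknown target columns: {unknown}")
    y = {name: row[name] for name in targets}
    x = {name: value for name, value in row.items() if name not in targets}
    return x, y


# Placeholder record (all zeros), NOT real measurements from the dataset.
example_row = {name: 0.0 for name in COLUMNS}

# Targets as stated in the description: cadmium, copper and lead.
features, targets = split_row(example_row, targets=("Cd", "Cu", "Pb"))
```

Passing `targets=("Cd", "Co", "Cu")` instead would follow the flags shown in the feature listing.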

Feature | Type | Unique values | Missing values |
---|---|---|---|
Cd (target) | numeric | 276 | 0 |
Co (target) | numeric | 219 | 0 |
Cu (target) | numeric | 302 | 0 |
Xloc | numeric | 341 | 0 |
Yloc | numeric | 347 | 0 |
Landuse_1 | numeric | 2 | 0 |
Landuse_2 | numeric | 2 | 0 |
Landuse_3 | numeric | 2 | 0 |
Landuse_4 | numeric | 2 | 0 |
Rock_1 | numeric | 2 | 0 |
Rock_2 | numeric | 2 | 0 |
Rock_3 | numeric | 2 | 0 |
Rock_4 | numeric | 2 | 0 |
Rock_5 | numeric | 2 | 0 |
Cr | numeric | 265 | 0 |
Ni | numeric | 277 | 0 |
Pb | numeric | 254 | 0 |
Zn | numeric | 242 | 0 |
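The `Landuse_1` to `Landuse_4` and `Rock_1` to `Rock_5` columns each take two unique values, which suggests (but the page does not confirm) that they are one-hot indicators for the four land-use types and five rock types. Under that assumption, a record's category index could be recovered as sketched below. The helper `decode_indicator` is hypothetical, and the page does not say which index corresponds to which named category, so the sketch returns only the index.

```python
from typing import Dict


def decode_indicator(row: Dict[str, float], prefix: str) -> int:
    """Return k such that row[f"{prefix}_{k}"] == 1.

    Assumes the columns are 0/1 one-hot indicators (an assumption,
    not something stated on the dataset page).
    """
    active = [
        int(name[len(prefix) + 1:])
        for name, value in row.items()
        if name.startswith(prefix + "_") and value == 1
    ]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active {prefix} column, got {active}")
    return active[0]


# Made-up record for illustration only.
record = {
    "Landuse_1": 0, "Landuse_2": 1, "Landuse_3": 0, "Landuse_4": 0,
    "Rock_1": 0, "Rock_2": 0, "Rock_3": 0, "Rock_4": 0, "Rock_5": 1,
}
landuse_index = decode_indicator(record, "Landuse")
rock_index = decode_indicator(record, "Rock")
```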

Meta-feature | Value |
---|---|
Maximum mutual information between the nominal attributes and the target attribute. | |
The minimal number of distinct values among attributes of the nominal type. | |
Third quartile of mutual information between the nominal attributes and the target attribute. | |
The maximum number of distinct values among attributes of the nominal type. | |
Third quartile of standard deviation of attributes of the numeric type. | 8.74 |
First quartile of kurtosis among attributes of the numeric type. | -0.65 |
Standard deviation of the number of distinct values among attributes of the nominal type. | |
First quartile of mutual information between the nominal attributes and the target attribute. | |
First quartile of skewness among attributes of the numeric type. | 0.25 |
Average mutual information between the nominal attributes and the target attribute. | |
First quartile of standard deviation of attributes of the numeric type. | 0.4 |
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation. | |
Average number of distinct values among the attributes of the nominal type. | |
Second quartile (Median) of kurtosis among attributes of the numeric type. | 0.27 |
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation. | |
Second quartile (Median) of means among attributes of the numeric type. | 0.95 |
Second quartile (Median) of mutual information between the nominal attributes and the target attribute. | |
Second quartile (Median) of skewness among attributes of the numeric type. | 1.36 |
Second quartile (Median) of standard deviation of attributes of the numeric type. | 0.67 |
Third quartile of kurtosis among attributes of the numeric type. | 6.56 |
Minimal mutual information between the nominal attributes and the target attribute. | |
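Two of the meta-features above are defined by explicit formulas, and several others are quartiles taken over one statistic per numeric attribute. The sketch below shows how such quantities could be computed. The function names are hypothetical, the input numbers are made up, and the quartile interpolation method used by OpenML is not stated on this page, so `method="inclusive"` is an assumption.

```python
import statistics
from typing import List, Sequence


def noise_to_signal_ratio(mean_attribute_entropy: float,
                          mean_mutual_information: float) -> float:
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    if mean_mutual_information == 0:
        raise ZeroDivisionError("MeanMutualInformation must be non-zero")
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


def equivalent_number_of_atts(class_entropy: float,
                              mean_mutual_information: float) -> float:
    """ClassEntropy / MeanMutualInformation."""
    if mean_mutual_information == 0:
        raise ZeroDivisionError("MeanMutualInformation must be non-zero")
    return class_entropy / mean_mutual_information


def quartiles(per_attribute_values: Sequence[float]) -> List[float]:
    """First, second (median) and third quartile of one statistic
    (e.g. standard deviation) collected over the numeric attributes.

    The interpolation method is an assumption, not OpenML's documented choice.
    """
    return statistics.quantiles(per_attribute_values, n=4, method="inclusive")


# Made-up inputs, purely illustrative.
nsr = noise_to_signal_ratio(2.0, 0.5)
ena = equivalent_number_of_atts(1.0, 0.25)
q1, q2, q3 = quartiles([1.0, 2.0, 3.0, 4.0, 5.0])
```

Both formula-based quantities depend on mutual information between nominal attributes and a class, which may explain why no value is shown for them on a dataset whose listed attributes are all numeric.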