% Datasets for `Pattern Recognition and Neural Networks' by B.D. Ripley % ===================================================================== % % Cambridge University Press (1996) ISBN 0-521-46086-7 % % The background to the datasets is described in section 1.4; this file % relates the computer-readable files to that description. % % % % Cushing's syndrome % ------------------ % % Data from Aitchison & Dunsmore (1975, Tables 11.1-3). % % Data file Cushings.dat has four columns, % % Label of the patient % Tetrhydrocortisone (mg/24hr) % Pregnanetriol (mg/24hr) % Type % % The type of the last six patients (u1 to u6) should be % regarded as unknown. (The code `o' indicates `other'). % % % % synthetic two-class problem % --------------------------- % % Data from Ripley (1994a). % % This has two real-valued co-ordinates (xs and ys) and a class (xc) % which is 0 or 1. % % Data file synth.tr has 250 rows of the training set % synth.te has 1000 rows of the test set (not used here) % % % % viruses % ------- % % This is a dataset on 61 viruses with rod-shaped particles affecting % various crops (tobacco, tomato, cucumber and others) described by % {Fauquet et al. (1988) and analysed by Eslava-G\'omez (1989). There % are 18 measurements on each virus, the number of amino acid residues % per molecule of coat protein. % % Data file viruses.dat has 61 rows of 18 counts % virus3.dat has 38 rows corresponding to the distinct % Tobamoviruses. % % The whole dataset is in order Hordeviruses (3), Tobraviruses (6), % Tobamoviruses (39) and `furoviruses' (13). % % % % Leptograpsus crabs % ------------------ % % Data from Campbell & Mahon (1974) on the morphology of rock crabs of % genus Leptograpsus. % % There are 50 specimens of each sex of each of two colour forms. % % Data file crabs.dat has rows % % sp `species', coded B (blue form) or O (orange form) % sex coded M or F % index within each group of 50 % FL frontal lip of carapace (mm) % RW rear width of carapace (mm) % CL length along the midline of carapace (mm) % CW maximum width of carapace (mm) % BD body depth (mm) % % % % Forensic glass % -------------- % % This example comes from forensic testing of glass collected by % B. German on 214 fragments of glass. It is also contained in the % UCI machine-learning database collection (Murphy & Aha, 1995). % % Data file fglass.dat has 214 rows with data for a single glass % fragment. % % RI refractive index % Na % weight of sodium oxide(s) % Mg % weight of magnesium oxide(s) % Al % weight of aluminium oxide(s) % Si % weight of silicon oxide(s) % K % weight of potassium oxide(s) % Ca % weight of calcium oxide(s) % Ba % weight of barium oxide(s) % Fe % weight of iron oxide(s) % type coded 1 to 7 % % The type codes are: % % 1 (WinF) window float glass % 2 (WinNF) window non-float glass % 3 (Veh) vehicle glass % 5 (Con) containers % 6 (Tabl) tableware % 7 (Head) vehicle headlamp glass % % The ten groups used for the cross-validation experiments (I believe) % are listed as row numbers in the file fglass.grp, % % % % Diabetes in Pima Indians % ------------------------ % % A population of women who were at least 21 years old, of Pima Indian heritage % and living near Phoenix, Arizona, was tested for diabetes % according to World Health Organization criteria. The data % were collected by the US National Institute of Diabetes and Digestive and % Kidney Diseases (Smith et al, 1988). This example is also contained in the % UCI machine-learning database collection (Murphy & Aha, 1995). % % The data files have rows containing % % npreg number of pregnancies % glu plasma glucose concentration in an oral glucose tolerance test % bp diastolic blood pressure (mm Hg) % skin triceps skin fold thickness (mm) % ins serum insulin (micro U/ml) % bmi body mass index (weight in kg/(height in m)^2) % ped diabetes pedigree function % age in years % type No / Yes % % Data file pima.tr has 200 rows of complete training data. % pima.te has 332 rows of complete test data. % pima.tr2 has the 200 rows of pima.tr plus 100 incomplete rows. % % % % Information about the dataset % CLASSTYPE: nominal % CLASSINDEX: last % @relation prnn-cushings @attribute Label {a1,a2,a3,a4,a5,a6,b1,b10,b2,b3,b4,b5,b6,b7,b8,b9,c1,c2,c3,c4,c5,u1,u2,u3,u4,u5,u6} @attribute Tetrahydrocortisone REAL @attribute Pregnanetriol REAL @attribute Type {a,b,c,o} @data a1,3.1,11.70,a a2,3.0,1.30,a a3,1.9,0.10,a a4,3.8,0.04,a a5,4.1,1.10,a a6,1.9,0.40,a b1,8.3,1.00,b b2,3.8,0.20,b b3,3.9,0.60,b b4,7.8,1.20,b b5,9.1,0.60,b b6,15.4,3.60,b b7,7.7,1.60,b b8,6.5,0.40,b b9,5.7,0.40,b b10,13.6,1.60,b c1,10.2,6.40,c c2,9.2,7.90,c c3,9.6,3.10,c c4,53.8,2.50,c c5,15.8,7.60,c u1,5.1,0.4,b u2,12.9,5.0,c u3,13.0,0.8,b u4,2.6,0.1,a u5,30.0,0.1,o u6,20.5,0.8,o