Data

lowbwt

active
ARFF
Publicly available Visibility: public Uploaded 23-04-2014 by Jan van Rijn

Author:
Source: Unknown
Please cite:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Identification code deleted.
As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction
using instance-based learning with encoding length selection. In Progress
in Connectionist-Based Information Systems. Singapore: Springer-Verlag.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NAME: LOW BIRTH WEIGHT DATA
KEYWORDS: Logistic Regression
SIZE: 189 observations, 11 variables
NOTE:
These data come from Appendix 1 of Hosmer and Lemeshow (1989).
These data are copyrighted and must be acknowledged and used accordingly.
DESCRIPTIVE ABSTRACT:
The goal of this study was to identify risk factors associated with
giving birth to a low birth weight baby (weighing less than 2500 grams).
Data were collected on 189 women, 59 of whom had low birth weight babies
and 130 of whom had normal birth weight babies. Four variables that were
thought to be of importance were age, weight of the subject at her last
menstrual period, race, and the number of physician visits during the first
trimester of pregnancy.
SOURCE:
Data were collected at Baystate Medical Center, Springfield,
Massachusetts, during 1986.
NOTE:
This data set consists of the complete data. A paired data set
created from this low birth weight data may be found in plowbwt.dat and
a 3 to 1 matched data set created from the low birth weight data may be
found in mlowbwt.dat.
Table: Code Sheet for the Variables in the Low Birth Weight Data Set.
Columns   Variable                                                Abbreviation
-----------------------------------------------------------------------------
2-4       Identification Code                                     ID
10        Low Birth Weight (0 = Birth Weight >= 2500g,            LOW
                            1 = Birth Weight < 2500g)
17-18     Age of the Mother in Years                              AGE
23-25     Weight in Pounds at the Last Menstrual Period           LWT
32        Race (1 = White, 2 = Black, 3 = Other)                  RACE
40        Smoking Status During Pregnancy (1 = Yes, 0 = No)       SMOKE
48        History of Premature Labor (0 = None, 1 = One, etc.)    PTL
55        History of Hypertension (1 = Yes, 0 = No)               HT
61        Presence of Uterine Irritability (1 = Yes, 0 = No)      UI
67        Number of Physician Visits During the First Trimester   FTV
          (0 = None, 1 = One, 2 = Two, etc.)
73-76     Birth Weight in Grams                                   BWT
-----------------------------------------------------------------------------
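The column positions in the code sheet describe a fixed-width text layout. A minimal Python sketch of how such a record could be sliced is shown here. It assumes the original data file follows the code sheet exactly (1-indexed, inclusive columns); the helper names `parse_record` and `make_record` and the sample values are hypothetical and are not taken from the data set. The identification code is included only because the code sheet lists it; it has been deleted from this version of the data.

```python
# Column ranges (1-indexed, inclusive) copied from the code sheet.
COLUMNS = {
    "ID": (2, 4),
    "LOW": (10, 10),
    "AGE": (17, 18),
    "LWT": (23, 25),
    "RACE": (32, 32),
    "SMOKE": (40, 40),
    "PTL": (48, 48),
    "HT": (55, 55),
    "UI": (61, 61),
    "FTV": (67, 67),
    "BWT": (73, 76),
}


def parse_record(line: str) -> dict:
    """Slice one fixed-width record into integer fields."""
    record = {}
    for name, (start, end) in COLUMNS.items():
        # Convert 1-indexed inclusive columns to a Python slice.
        record[name] = int(line[start - 1:end].strip())
    return record


def make_record(values: dict) -> str:
    """Build a synthetic fixed-width line (for illustration only)."""
    width = max(end for _, end in COLUMNS.values())
    chars = [" "] * width
    for name, (start, end) in COLUMNS.items():
        # Right-align each value inside its column range.
        text = str(values[name]).rjust(end - start + 1)
        chars[start - 1:end] = list(text)
    return "".join(chars)


# Made-up values, NOT a row from the actual data set.
sample = {"ID": 1, "LOW": 0, "AGE": 25, "LWT": 130, "RACE": 1, "SMOKE": 0,
          "PTL": 0, "HT": 0, "UI": 0, "FTV": 1, "BWT": 3000}
line = make_record(sample)
parsed = parse_record(line)
print(parsed)
```

Keeping the column ranges in one dictionary means the slicing logic stays a direct transcription of the code sheet, which makes it easy to check against the table.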
PEDAGOGICAL NOTES:
These data have been used as an example of fitting a multiple
logistic regression model.
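For orientation, a multiple logistic regression model for the binary outcome LOW can be written in the following general form. The symbols \pi, \beta_j, x_j and p are notation introduced here for illustration and are not taken from the source; which covariates enter the model, and how nominal variables such as RACE are coded into indicator terms, is left open.

```latex
\[
\operatorname{logit}\bigl(\pi(\mathbf{x})\bigr)
  = \ln\frac{\pi(\mathbf{x})}{1-\pi(\mathbf{x})}
  = \beta_0 + \sum_{j=1}^{p} \beta_j x_j ,
\qquad
\pi(\mathbf{x}) = \Pr(\mathrm{LOW}=1 \mid \mathbf{x})
\]
```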
STORY BEHIND THE DATA:
Low birth weight is an outcome that has been of concern to physicians
for years. This is due to the fact that infant mortality rates and birth
defect rates are very high for low birth weight babies. A woman's behavior
during pregnancy (including diet, smoking habits, and receiving prenatal care)
can greatly alter the chances of carrying the baby to term and, consequently,
of delivering a baby of normal birth weight.
The variables identified in the code sheet given in the table have been
shown to be associated with low birth weight in the obstetrical literature. The
goal of the current study was to ascertain if these variables were important
in the population being served by the medical center where the data were
collected.
References:
1. Hosmer and Lemeshow, Applied Logistic Regression, Wiley, (1989).

Feature | Type | Unique values | Missing values
---|---|---|---
class (target) | numeric | 133 | 0
LOW | nominal | 2 | 0
AGE | numeric | 24 | 0
LWT | numeric | 75 | 0
RACE | nominal | 3 | 0
SMOKE | nominal | 2 | 0
PTL | nominal | 4 | 0
HT | nominal | 2 | 0
UI | nominal | 2 | 0
FTV | nominal | 6 | 0

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
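The ratio described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code the page uses: entropies are computed in bits from discrete values, mutual information is taken as H(X) + H(Y) - H(X, Y), and the toy attributes and target are made up. The ratio is undefined when the mean mutual information is zero.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def noise_to_signal_ratio(attributes, target):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    mean_entropy = sum(entropy(a) for a in attributes) / len(attributes)
    mean_mi = sum(mutual_information(a, target) for a in attributes) / len(attributes)
    return (mean_entropy - mean_mi) / mean_mi


# Toy data (hypothetical): one attribute copies the target, one is unrelated.
target = [0, 0, 1, 1]
attributes = [
    [0, 0, 1, 1],  # entropy 1 bit, mutual information 1 bit
    [0, 1, 0, 1],  # entropy 1 bit, mutual information 0 bits
]
ratio = noise_to_signal_ratio(attributes, target)
print(ratio)
```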

First quartile of mutual information between the nominal attributes and the target attribute.

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
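The landmarker descriptions refer to a Kappa coefficient without defining it. As a rough sketch, assuming this means Cohen's kappa (agreement between predicted and actual labels, corrected for chance agreement), it could be computed as follows. The function name `cohen_kappa` and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(actual)
    # Fraction of instances where prediction and truth agree.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Agreement expected by chance from the two marginal label distributions.
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy labels (made up): three of four predictions are correct.
actual = [0, 0, 1, 1]
predicted = [0, 0, 1, 0]
kappa = cohen_kappa(actual, predicted)
print(kappa)
```

In this toy case the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5.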

Average number of distinct values among the attributes of the nominal type: 3

First quartile of skewness among attributes of the numeric type: -0.21

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Standard deviation of the number of distinct values among attributes of the nominal type: 1.53
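The reported average (3) and standard deviation (1.53) of distinct values can be checked against the unique-value counts listed for the seven nominal attributes. The sketch below assumes the sample (n - 1) standard deviation; that choice is inferred from the fact that it reproduces 1.53, and is not documented on the page.

```python
import statistics

# Unique-value counts of the nominal attributes, as listed in the feature table.
distinct_counts = {
    "LOW": 2, "RACE": 3, "SMOKE": 2, "PTL": 4, "HT": 2, "UI": 2, "FTV": 6,
}
counts = list(distinct_counts.values())

mean_count = statistics.mean(counts)   # average number of distinct values
std_count = statistics.stdev(counts)   # sample standard deviation (n - 1), an assumption
min_count = min(counts)
max_count = max(counts)

print(mean_count, round(std_count, 2), min_count, max_count)
```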

First quartile of standard deviation of attributes of the numeric type: 5.3

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Second quartile (Median) of kurtosis among attributes of the numeric type: 0.62

Second quartile (Median) of means among attributes of the numeric type: 129.81

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Minimal mutual information between the nominal attributes and the target attribute.

Second quartile (Median) of skewness among attributes of the numeric type: 0.72

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

The minimal number of distinct values among attributes of the nominal type: 2

Second quartile (Median) of standard deviation of attributes of the numeric type: 30.58

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Maximum mutual information between the nominal attributes and the target attribute.

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
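A short sketch of this quantity, ClassEntropy divided by MeanMutualInformation, is given here under the same kind of assumptions: discrete values, entropies in bits, and mutual information computed as H(X) + H(Y) - H(X, Y). The function names and the toy data are hypothetical, and the result is undefined when no attribute carries information about the class.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def equivalent_number_of_attributes(attributes, target):
    """ClassEntropy / MeanMutualInformation."""
    mean_mi = sum(mutual_information(a, target) for a in attributes) / len(attributes)
    return entropy(target) / mean_mi


# Toy data (made up): class entropy is 1 bit, mean mutual information is 0.5 bits.
toy_target = ["low", "low", "normal", "normal"]
toy_attributes = [
    ["yes", "yes", "no", "no"],  # perfectly informative: 1 bit
    ["a", "b", "a", "b"],        # uninformative: 0 bits
]
result = equivalent_number_of_attributes(toy_attributes, toy_target)
print(result)
```

With an average of 0.5 bits per attribute and 1 bit of class entropy, two such attributes would be needed under the independence assumption.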

The maximum number of distinct values among attributes of the nominal type: 6

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Third quartile of mutual information between the nominal attributes and the target attribute.

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

First quartile of kurtosis among attributes of the numeric type: -0.08

Third quartile of standard deviation of attributes of the numeric type: 729.02

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

Average mutual information between the nominal attributes and the target attribute.

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3