{ "data_id": "534", "name": "cps_85_wages", "exact_name": "cps_85_wages", "version": 1, "version_label": null, "description": "**Author**: \n**Source**: Unknown - Date unknown \n**Please cite**: \n\nDeterminants of Wages from the 1985 Current Population Survey\n\nSummary:\nThe Current Population Survey (CPS) is used to supplement census information between census years. These data consist of a random sample of 534 persons from the CPS, with information on wages and other characteristics of the workers, including sex, number of years of education, years of work experience, occupational status, region of residence and union membership. We wish to determine (i) whether wages are related to these characteristics and (ii) whether there is a gender gap in wages.\nBased on residual plots, wages were log-transformed to stabilize the variance. Age and work experience were almost perfectly correlated (r = .98). Multiple regression of log wages against sex, age, years of education, work experience, union membership, southern residence, and occupational status showed that these covariates were related to wages (pooled F test, p < .0001). The effect of age was not significant after controlling for experience. Standardized residual plots showed no patterns, except for one large outlier with lower wages than expected. This was a man with 22 years of experience and 12 years of education, in a management position, who lived in the North and was not a union member. Removing this person from the analysis did not substantially change the results, so the final model included the entire sample.\nAdjusting for all other variables in the model, females earned 81% (75%, 88%) of the wages of males (p < .0001). Wages increased 41% (28%, 56%) for every 5 additional years of education (p < .0001). They increased by 11% (7%, 14%) for every additional 10 years of experience (p < .0001). Union members were paid 23% (12%, 36%) more than non-union members (p < .0001). 
Northerners were paid 11% (2%, 20%) more than Southerners (p = .016). Management and professional positions were paid the most, and service and clerical positions the least (pooled F test, p < .0001). Overall, the model explained 35% of the variance (R² = .35).\nIn summary, many factors explain variation in wages: occupational status, years of experience, years of education, sex, union membership, and region of residence. However, even after adjusting for all available factors, a gender gap in wages remained. There is no readily available explanation for this gap.\n\nAuthorization: Public Domain\n\nReference: Berndt, ER. The Practice of Econometrics. 1991. NY: Addison-Wesley.\n\nDescription: The data file contains 534 observations on 11 variables sampled from the Current Population Survey of 1985. This data set demonstrates multiple regression, confounding, transformations, multicollinearity, categorical variables, ANOVA, pooled tests of significance, interactions, and model-building strategies.\n\nVariable names in order from left to right:\nEDUCATION: Number of years of education.\nSOUTH: Indicator variable for Southern Region (1=Person lives in South, 0=Person lives elsewhere).\nSEX: Indicator variable for sex (1=Female, 0=Male).\nEXPERIENCE: Number of years of work experience.\nUNION: Indicator variable for union membership (1=Union member, 0=Not union member).\nWAGE: Wage (dollars per hour).\nAGE: Age (years).\nRACE: Race (1=Other, 2=Hispanic, 3=White).\nOCCUPATION: Occupational category (1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other).\nSECTOR: Sector (0=Other, 1=Manufacturing, 2=Construction).\nMARR: Marital status (0=Unmarried, 1=Married).\n\n\nTherese Stukel\nDartmouth Hitchcock Medical Center\nOne Medical Center Dr.\nLebanon, NH 03756\ne-mail: stukel@dartmouth.edu\n\n\nInformation about the dataset\nCLASSTYPE: numeric\nCLASSINDEX: none specific", "format": "ARFF", "uploader": "Joaquin Vanschoren", 
"uploader_id": 2, "visibility": "public", "creator": null, "contributor": null, "date": "2014-09-29 00:08:13", "update_comment": "set target feature", "last_update": "2014-10-07 01:27:56", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/52646\/cps_85_wages.arff", "default_target_attribute": "WAGE", "row_id_attribute": null, "ignore_attribute": null, "runs": 2, "suggest": { "input": [ "cps_85_wages", "Determinants of Wages from the 1985 Current Population Survey Summary: The Current Population Survey (CPS) is used to supplement census information between census years. These data consist of a random sample of 534 persons from the CPS, with information on wages and other characteristics of the workers, including sex, number of years of education, years of work experience, occupational status, region of residence and union membership. We wish to determine (i) whether wages are related to these c " ], "weight": 5 }, "qualities": { "NumberOfInstances": 534, "NumberOfFeatures": 11, "NumberOfClasses": 0, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 4, "NumberOfSymbolicFeatures": 7, "MaxNominalAttDistinctValues": 6, "MinSkewnessOfNumericAtts": -0.20367759542716055, "PercentageOfInstancesWithMissingValues": 0, "Quartile3AttributeEntropy": null, "RandomTreeDepth1ErrRate": null, "EquivalentNumberOfAtts": null, "MaxSkewnessOfNumericAtts": 1.697285500295496, "MinStdDevOfNumericAtts": 2.6153726283543635, "PercentageOfMissingValues": 0, "Quartile3KurtosisOfNumericAtts": 3.9540197303601086, "AutoCorrelation": -3.797185741088182, "RandomTreeDepth1Kappa": null, "J48.00001.AUC": null, "MaxStdDevOfNumericAtts": 12.37971008784808, "MinorityClassPercentage": null, "PercentageOfNumericFeatures": 36.36363636363637, "Quartile3MeansOfNumericAtts": 32.080524344569284, "CfsSubsetEval_DecisionStumpAUC": null, "RandomTreeDepth2AUC": null, "J48.00001.ErrRate": null, 
"MeanAttributeEntropy": null, "MinorityClassSize": null, "PercentageOfSymbolicFeatures": 63.63636363636363, "Quartile3MutualInformation": null, "CfsSubsetEval_DecisionStumpErrRate": null, "RandomTreeDepth2ErrRate": null, "J48.00001.Kappa": null, "MeanKurtosisOfNumericAtts": 1.2177003137197717, "NaiveBayesAUC": null, "Quartile1AttributeEntropy": null, "Quartile3SkewnessOfNumericAtts": 1.4449036099774744, "CfsSubsetEval_DecisionStumpKappa": null, "RandomTreeDepth2Kappa": null, "J48.0001.AUC": null, "MeanMeansOfNumericAtts": 19.174555243445692, "NaiveBayesErrRate": null, "Quartile1KurtosisOfNumericAtts": -0.5308320595021526, "Quartile3StdDevOfNumericAtts": 12.216425746524969, "CfsSubsetEval_NaiveBayesAUC": null, "RandomTreeDepth3AUC": null, "J48.0001.ErrRate": null, "MeanMutualInformation": null, "NaiveBayesKappa": null, "Quartile1MeansOfNumericAtts": 10.022729400749064, "REPTreeDepth1AUC": null, "CfsSubsetEval_NaiveBayesErrRate": null, "RandomTreeDepth3ErrRate": null, "J48.0001.Kappa": null, "MeanNoiseToSignalRatio": null, "NumberOfBinaryFeatures": 4, "Quartile1MutualInformation": null, "REPTreeDepth1ErrRate": null, "CfsSubsetEval_NaiveBayesKappa": null, "RandomTreeDepth3Kappa": null, "J48.001.AUC": null, "MeanNominalAttDistinctValues": 2.857142857142857, "Quartile1SkewnessOfNumericAtts": -0.015683914139776517, "REPTreeDepth1Kappa": null, "CfsSubsetEval_kNN1NAUC": null, "StdvNominalAttDistinctValues": 1.4638501094227998, "J48.001.ErrRate": null, "MeanSkewnessOfNumericAtts": 0.6824157434035301, "Quartile1StdDevOfNumericAtts": 3.246303684800943, "REPTreeDepth2AUC": null, "CfsSubsetEval_kNN1NErrRate": null, "kNN1NAUC": null, "J48.001.Kappa": null, "MeanStdDevOfNumericAtts": 7.965188073224689, "Quartile2AttributeEntropy": null, "REPTreeDepth2ErrRate": null, "CfsSubsetEval_kNN1NKappa": null, "kNN1NErrRate": null, "MajorityClassPercentage": null, "MinAttributeEntropy": null, "Quartile2KurtosisOfNumericAtts": 0.22991327030135889, "REPTreeDepth2Kappa": null, "ClassEntropy": 
null, "kNN1NKappa": null, "MajorityClassSize": null, "MinKurtosisOfNumericAtts": -0.5807932623072052, "Quartile2MeansOfNumericAtts": 15.420411985018728, "REPTreeDepth3AUC": null, "DecisionStumpAUC": null, "MaxAttributeEntropy": null, "MinMeansOfNumericAtts": 9.024063670411985, "Quartile2MutualInformation": null, "REPTreeDepth3ErrRate": null, "DecisionStumpErrRate": null, "MaxKurtosisOfNumericAtts": 4.991767976583574, "MaxMeansOfNumericAtts": 36.83333333333333, "MinMutualInformation": null, "Quartile2SkewnessOfNumericAtts": 0.6180275343728923, "REPTreeDepth3Kappa": null, "DecisionStumpKappa": null, "MaxMutualInformation": null, "MinNominalAttDistinctValues": 2, "PercentageOfBinaryFeatures": 36.36363636363637, "Quartile2StdDevOfNumericAtts": 8.432834788348156, "RandomTreeDepth1AUC": null, "Dimensionality": 0.020599250936329586 }, "tags": [ { "tag": "OpenML-Reg19", "uploader": "5243" } ], "features": [ { "name": "WAGE", "index": "5", "type": "numeric", "distinct": "238", "missing": "0", "target": "1", "min": "1", "max": "45", "mean": "9", "stdev": "5" }, { "name": "EDUCATION", "index": "0", "type": "numeric", "distinct": "17", "missing": "0", "min": "2", "max": "18", "mean": "13", "stdev": "3" }, { "name": "SOUTH", "index": "1", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "SEX", "index": "2", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "EXPERIENCE", "index": "3", "type": "numeric", "distinct": "52", "missing": "0", "min": "0", "max": "55", "mean": "18", "stdev": "12" }, { "name": "UNION", "index": "4", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "AGE", "index": "6", "type": "numeric", "distinct": "47", "missing": "0", "min": "18", "max": "64", "mean": "37", "stdev": "12" }, { "name": "RACE", "index": "7", "type": "nominal", "distinct": "3", "missing": "0", "distr": [] }, { "name": "OCCUPATION", "index": "8", "type": "nominal", "distinct": "6", "missing": "0", "distr": [] 
}, { "name": "SECTOR", "index": "9", "type": "nominal", "distinct": "3", "missing": "0", "distr": [] }, { "name": "MARR", "index": "10", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 3, "total_downloads": 3, "reach": 3, "reuse": 5, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 5 }