This book provides a solution to the ecological inference problem, which has plagued users of statistical methods for over seventy-five years: How can researchers reliably infer individual-level behavior from aggregate (ecological) data? In political science, this question arises when individual-level surveys are unavailable (for instance, local or comparative electoral politics), unreliable (racial politics), insufficient (political geography), or infeasible (political history). This ecological inference problem also confronts researchers in numerous areas of major significance in public policy, and other academic disciplines, ranging from epidemiology and marketing to sociology and quantitative history. Although many have attempted to make such cross-level inferences, scholars agree that all existing methods yield very inaccurate conclusions about the world.

In this volume, Gary King lays out a unique--and reliable--solution to this venerable problem. King begins with a qualitative overview, readable even by those without a statistical background. He then unifies the apparently diverse findings in the methodological literature, so that only one aggregation problem remains to be solved. He then presents his solution, as well as empirical evaluations of the solution that include over 16,000 comparisons of his estimates from real aggregate data to the known individual-level answer. The method works in practice. King's solution to the ecological inference problem will enable empirical researchers to investigate substantive questions that have heretofore proved unanswerable, and move forward fields of inquiry in which progress has been stifled by this problem.
List of Figures
List of Tables
Preface

PART I: INTRODUCTION
1. Qualitative Overview
  1.1 The Necessity of Ecological Inferences
  1.2 The Problem
  1.3 The Solution
  1.4 The Evidence
  1.5 The Method
2. Formal Statement of the Problem

PART II: CATALOG OF PROBLEMS TO FIX
3. Aggregation Problems
  3.1 Goodman's Regression: A Definition
  3.2 The Indeterminacy Problem
  3.3 The Grouping Problem
  3.4 Equivalence of the Grouping and Indeterminacy Problems
  3.5 A Concluding Definition
4. Non-Aggregation Problems
  4.1 Goodman Regression Model Problems
  4.2 Applying Goodman's Regression in 2 x 3 Tables
  4.3 Double Regression Problems
  4.4 Concluding Remarks

PART III: THE PROPOSED SOLUTION
5. The Data: Generalizing the Method of Bounds
  5.1 Homogeneous Precincts: No Uncertainty
  5.2 Heterogeneous Precincts: Upper and Lower Bounds
    5.2.1 Precinct-Level Quantities of Interest
    5.2.2 District-Level Quantities of Interest
  5.3 An Easy Visual Method for Computing Bounds
6. The Model
  6.1 The Basic Model
  6.2 Model Interpretation
    6.2.1 Observable Implications of Model Parameters
    6.2.2 Parameterizing the Truncated Bivariate Normal
    6.2.3 Computing 2p Parameters from Only p Observations
    6.2.4 Connections to the Statistics of Medical and Seismic Imaging
    6.2.5 Would a Model of Individual-Level Choices Help?
7. Preliminary Estimation
  7.1 A Visual Introduction
  7.2 The Likelihood Function
  7.3 Parameterizations
  7.4 Optional Priors
  7.5 Summarizing Information about Estimated Parameters
8. Calculating Quantities of Interest
  8.1 Simulation Is Easier than Analytical Derivation
    8.1.1 Definitions and Examples
    8.1.2 Simulation for Ecological Inference
  8.2 Precinct-Level Quantities
  8.3 District-Level Quantities
  8.4 Quantities of Interest from Larger Tables
    8.4.1 A Multiple Imputation Approach
    8.4.2 An Approach Related to Double Regression
  8.5 Other Quantities of Interest
9. Model Extensions
  9.1 What Can Go Wrong?
    9.1.1 Aggregation Bias
    9.1.2 Incorrect Distributional Assumptions
    9.1.3 Spatial Dependence
  9.2 Avoiding Aggregation Bias
    9.2.1 Using External Information
    9.2.2 Unconditional Estimation: Xi as a Covariate
    9.2.3 Tradeoffs and Priors for the Extended Model
    9.2.4 Ex Post Diagnostics
  9.3 Avoiding Distributional Problems
    9.3.1 Parametric Approaches
    9.3.2 A Nonparametric Approach

PART IV: VERIFICATION
10. A Typical Application Described in Detail: Voter Registration by Race
  10.1 The Data
  10.2 Likelihood Estimation
  10.3 Computing Quantities of Interest
    10.3.1 Aggregate
    10.3.2 County Level
    10.3.3 Other Quantities of Interest
11. Robustness to Aggregation Bias: Poverty Status by Sex
  11.1 Data and Notation
  11.2 Verifying the Existence of Aggregation Bias
  11.3 Fitting the Data
  11.4 Empirical Results
12. Estimation without Information: Black Registration in Kentucky
  12.1 The Data
  12.2 Data Problems
  12.3 Fitting the Data
  12.4 Empirical Results
13. Classic Ecological Inferences
  13.1 Voter Transitions
    13.1.1 Data
    13.1.2 Estimates
  13.2 Black Literacy in 1910

PART V: GENERALIZATIONS AND CONCLUDING SUGGESTIONS
14. Non-Ecological Aggregation Problems
  14.1 The Geographer's Modifiable Areal Unit Problem
    14.1.1 The Problem with the Problem
    14.1.2 Ecological Inference as a Solution to the Modifiable Areal Unit Problem
  14.2 The Statistical Problem of Combining Survey and Aggregate Data
  14.3 The Econometric Problem of Aggregating Continuous Variables
  14.4 Concluding Remarks on Related Aggregation Research
15. Ecological Inference in Larger Tables
  15.1 An Intuitive Approach
  15.2 Notation for a General Approach
  15.3 Generalized Bounds
  15.4 The Statistical Model
  15.5 Distributional Implications
  15.6 Calculating the Quantities of Interest
  15.7 Concluding Suggestions
16. A Concluding Checklist

PART VI: APPENDICES
A. Proof That All Discrepancies Are Equivalent
B. Parameter Bounds
  B.1 Homogeneous Precincts
  B.2 Heterogeneous Precincts
  B.3 Heterogeneous Precincts
C. Conditional Posterior Distribution
  C.1 Using Bayes Theorem
  C.2 Using Properties of Normal Distributions
D. The Likelihood Function
E. The Details of Nonparametric Estimation
F. Computational Issues

Glossary of Symbols
References
Index