This book provides a solution to the ecological inference problem, which has plagued users of statistical methods for over seventy-five years: How can researchers reliably infer individual-level behavior from aggregate (ecological) data? In political science, this question arises when individual-level surveys are unavailable (for instance, local or comparative electoral politics), unreliable (racial politics), insufficient (political geography), or infeasible (political history). This ecological inference problem also confronts researchers in numerous areas of major significance in public policy, and other academic disciplines, ranging from epidemiology and marketing to sociology and quantitative history. Although many have attempted to make such cross-level inferences, scholars agree that all existing methods yield very inaccurate conclusions about the world. In this volume, Gary King lays out a unique--and reliable--solution to this venerable problem.
King begins with a qualitative overview, readable even by those without a statistical background. He then unifies the apparently diverse findings in the methodological literature, so that only one aggregation problem remains to be solved. He then presents his solution, as well as empirical evaluations of the solution that include over 16,000 comparisons of his estimates from real aggregate data to the known individual-level answer. The method works in practice.
King's solution to the ecological inference problem will enable empirical researchers to investigate substantive questions that have heretofore proved unanswerable, and move forward fields of inquiry in which progress has been stifled by this problem.
Gary King is Professor of Government at Harvard University. He has authored and coauthored numerous journal articles and books in the field of political methodology, including Designing Social Inquiry: Scientific Inference in Qualitative Research (Princeton).
"This is a significant contribution to political methodology, and to statistical methodology throughout the social sciences. As always with Gary King's work, it is written with great flair and sophistication. This book will generate a good deal of excitement at the methodological frontier, and will also have a bracing impact on substantive research in a variety of fields."--Larry M. Bartels, Princeton University
"In this work, Gary King presents a number of new and important contributions to the field of statistical theory, and the practice of estimating choice probabilities from data aggregated into groups. An impressive statistical contribution."--Melvin J. Hinich, University of Texas-Austin
List of Figures
List of Tables
Preface
Part I: Introduction
1 Qualitative Overview
2 Formal Statement of the Problem
Part II: Catalog of Problems to Fix
3 Aggregation Problems
4 Non-Aggregation Problems
Part III: The Proposed Solution
5 The Data: Generalizing the Method of Bounds
6 The Model
7 Preliminary Estimation
8 Calculating Quantities of Interest
9 Model Extensions
Part IV: Verification
10 A Typical Application Described in Detail: Voter Registration by Race
11 Robustness to Aggregation Bias: Poverty Status by Sex
12 Estimation without Information: Black Registration in Kentucky
13 Classic Ecological Inferences
Part V: Generalizations and Concluding Suggestions
14 Non-Ecological Aggregation Problems
15 Ecological Inference in Larger Tables
16 A Concluding Checklist
Part VI: Appendices
A Proof That All Discrepancies Are Equivalent
B Parameter Bounds
C Conditional Posterior Distribution
D The Likelihood Function
E The Details of Nonparametric Estimation
F Computational Issues
Glossary of Symbols
References
Index
Qualitative Overview
Political scientists have understood the ecological inference problem at least since William Ogburn and Inez Goltra (1919) introduced it in the very first multivariate statistical analysis of politics published in a political science journal (see Gow, 1985; Bulmer, 1984). In a study of the voting behavior of newly enfranchised women in Oregon, they wrote that "even though the method of voting makes it impossible to count women's votes, one wonders if there is not some indirect method of solving the problem. The height of a waterfall is not measured by dropping a line from the top to the bottom, nor is the distance from the earth to the sun measured by a rod and chain" (p. 414).
Ogburn and Goltra's "indirect" method of estimating women's votes was to correlate the percent of women voting in each precinct in Portland, Oregon, with the percent of people voting "no" in selected referenda in the same precincts. They reasoned that individual women were probably casting ballots against the referenda questions at a higher rate than men "if precincts with large percentages of women voting, vote in larger percentages against a measure than the precincts with small percentages of women voting." But they (correctly) worried that what has come to be known as the ecological inference problem might invalidate their analysis: "It is also theoretically possible to gerrymander the precincts in such a way that there may be a negative correlative even though men and women each distribute their votes 50 to 50 on a given measure" (p. 415). The essence of the ecological inference problem is that the true individual-level relationship could even be the reverse of the observed aggregate correlation if it were the men in the heavily female precincts who voted disproportionately against the referenda.
Ogburn and Goltra's data no longer appear to be available, but the problem they raised can be illustrated by this simple hypothetical example reconstructed in part from their verbal descriptions. Consider two equal-sized precincts voting on Proposition 22, an initiative by the radical "People's Power League" to institute proportional representation in Oregon's Legislative Assembly elections: 40% of voters in precinct 1 are women and 40% of all voters in this precinct oppose the referenda. In precinct 2, 60% of voters are women and 60% of the precinct opposes the referenda. Precinct 2 has more women and is more opposed to the referenda than precinct 1, and so it certainly seems that women are opposing the proportional representation reform. Indeed, it could be the case that all women were opposed and all men voted in favor in both precincts, as might have occurred if the reform were uniformly seen as a way of ensuring men a place in the legislature even though they formed a (slight) minority in every legislative district. But however intuitive this inference may appear, simple arithmetic indicates that it would be equally consistent with the observed aggregate data for men to have opposed proportional representation at a rate four times higher than that of women. These higher relative rates of individual male opposition would occur, given the same aggregate percentages, if a larger fraction of men in the female-dominated precinct 2 opposed the reform than men in precinct 1, as might happen if precinct 2 was a generally more radical area independent of, or even because of, its gender composition.
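The arithmetic behind the Proposition 22 example can be checked directly. The following is a minimal sketch for illustration only, not the method developed in this book. It assumes two precincts of equal size and relies only on the accounting identity that a precinct's overall opposition share equals the women's share times the women's opposition rate plus the men's share times the men's opposition rate. The function names (men_rate, women_bounds, overall_rates) are hypothetical and chosen for this example.

```python
from fractions import Fraction as F

# Observed aggregates for two equal-sized precincts (from the example above):
# (share of voters who are women, share of all voters opposed)
precincts = [(F(2, 5), F(2, 5)), (F(3, 5), F(3, 5))]


def men_rate(x, t, women_rate):
    """Men's opposition rate implied by t = x*women_rate + (1 - x)*men_rate."""
    return (t - x * women_rate) / (1 - x)


def women_bounds(x, t):
    """Range of women's opposition rates that keeps men's rate inside [0, 1]."""
    lower = max(F(0), (t - (1 - x)) / x)
    upper = min(F(1), t / x)
    return lower, upper


def overall_rates(women_rates):
    """Pooled (women, men) opposition rates across the equal-sized precincts."""
    women_opposed = sum(x * bw for (x, _), bw in zip(precincts, women_rates))
    men_opposed = sum(
        (1 - x) * men_rate(x, t, bw) for (x, t), bw in zip(precincts, women_rates)
    )
    women_total = sum(x for x, _ in precincts)
    men_total = sum(1 - x for x, _ in precincts)
    return women_opposed / women_total, men_opposed / men_total


# Scenario A: every woman opposed, every man in favor, in both precincts.
scenario_a = overall_rates([F(1), F(1)])

# Scenario B: women's opposition set to the lowest rate the aggregates allow
# in each precinct (0 in precinct 1, 1/3 in precinct 2).
lowest = [women_bounds(x, t)[0] for x, t in precincts]
scenario_b = overall_rates(lowest)

print("Scenario A (women, men):", scenario_a)
print("Scenario B (women, men):", scenario_b)
print("Men-to-women ratio in scenario B:", scenario_b[1] / scenario_b[0])
```

Both scenarios reproduce exactly the same precinct-level percentages (40% and 60% opposed). In scenario B, men in precinct 1 oppose at a rate of 2/3 and men in precinct 2 at a rate of 1, so pooled male opposition is 4/5 against 1/5 for women, the fourfold difference described in the text. The aggregate data alone cannot distinguish between the two scenarios.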
But if Ogburn and Goltra were Leif Ericson, William Robinson was Christopher Columbus: for not until Robinson's (1950) article was the problem widely recognized and the quest for a valid method of making ecological inferences begun in earnest. Robinson's article remains one of the most influential works in social science methodology. His (correct) view was that, with the methods available at the time, valid ecological inference was impossible. He warned analysts never to use aggregate data to infer individual relationships, and thus to avoid what has since come to be known as "the ecological fallacy." His work sent two shock waves through the social sciences that are still being felt, causing some scholarly pursuits to end and another to begin.
First, the use of aggregate data by political scientists, quantitative historians, sociologists, and others declined relative to use of other forms of data; scholars began to avoid using aggregate data to address whole classes of important research questions (King, 1990). In many countries and fields of study, this "collapse of aggregate data analysis ... and its replacement by individual survey analysis as the dominant method of quantitative social research" (Achen and Shively, 1995: 5) meant that numerous, often historical and geographical, issues were put aside, and many still remain unanswered. What might have become vibrant fields of scholarship withered. The scholars who continue to work in these fields (such as those in comparative politics attempting to explain who voted for the Nazi party, or political historians studying working-class support for political parties in the antebellum Southern U.S.) do so because of the lack of an alternative to ecological data, but they toil under a cloud of great suspicion. The ecological inference problem hinders substantive work in almost every empirical field of political science, as well as numerous areas of sociology, education, marketing, economics, history, geography, epidemiology, and statistics. For example, historical election statistics have fallen into disuse and studies based on them into at least some disrepute. Classic studies, such as V. O. Key's (1949) Southern Politics, have been succeeded by scholarship based mostly on survey research, often to great advantage, but necessarily ignoring much of history, focused as it is on the few recent, mostly national, elections for which surveys are available.
The literature's nearly exclusive focus on national surveys with random interviews of isolated individuals means that the geographic component to social science data is often neglected. Commercial state-level surveys are available, but their quality varies considerably and the results are widely suspect in the academic community. Even if the address of each survey respondent were available, the usual 1,000–2,000 respondents to national surveys are insufficient for learning much about spatial variation except for the grossest geographic patterns, in which a country would be divided into no more than perhaps a dozen broad regions. For example, some National Election Study polls locate respondents within congressional districts, but only about a dozen interviews are conducted in any district, and no sample is taken from most of the congressional districts for any one survey. The General Social Survey makes available no geographic information to researchers unless they sign a separate confidentiality agreement, and even then only the respondent's state of residence is released. Survey organizations in other countries are even more reticent about releasing local geographic information.
Creative combinations of quantitative and qualitative research are much more difficult when the identity and rich qualitative information about individual communities or respondents cannot be revealed to readers. Indeed, in most cases, respondents' identities are not even known to the data analyst. If "all politics is local," political science is missing much of politics. In contrast, aggregate data are saturated with precise spatial information. For example, the United States can be divided into approximately 190,000 electoral precincts, and detailed aggregate political data are available for each. Only the ecological inference problem stands between the scientific community and this rich source of information.
Whereas the first shock wave from Robinson's article stifled research in many substantive fields, the second energized the social science statistics community to try to solve the problem. One partial measure of the level of effort devoted to solving the ecological inference problem is that Robinson's article has been cited more than eight hundred times. Many other scholars have written on the topic as well, citing those who originally cited Robinson or approaching the problem from different perspectives. At one extreme, the literature includes authors such as Bogue and Bogue (1982), who try, unsuccessfully, to "refute" the ecological fallacy altogether; at the other extreme are fatalists who liken the seventy-five year search for a solution to the ecological inference problem to seeking "alchemists' gold" (Flanigan and Zingale, 1985) or to "a fruitless quest" (Achen and Shively, 1995). These scholars, and numerous others between these extreme positions, have written extensively, and often very fruitfully, on the topic. Successive generations of young scholars and methodologists in the making, having been warned off aggregate data analysis with their teachers' mantra "thou shalt not draw conclusions about individual behavior from aggregate data," come away with the conviction that the ecological inference problem presents an enormous barrier to social science research. This belief has drawn a steady stream of social science methodologists into the search for a solution over the years, myself included.
Numerous important advances have been made in the ecological inference literature, but even the best current methods give incorrect answers a large fraction of the time, and nonsensical answers very frequently (such as 115% of blacks voting for the Democrats or -4% of foreign-born Americans being illiterate). No proposed method has been scientifically validated. Any that have been tried on data sets for which the individual-level relationship of interest is known generally fail to give the right answer. It is a testimony to the difficulty of the problem that no serious attempts have even been made to address a variety of basic statistical issues related to the problem. For example, currently available measures of uncertainty, such as confidence intervals, standard errors, and others, have never been validated and appear to be hopelessly inaccurate. Indeed, for some important approaches, no uncertainty measures have even been proposed.
Unlike the rest of this book, this chapter contains no technical details and should be readable even by those with little or no statistical background. In the remainder of this chapter, I summarize some other applications of ecological inference (Section 1.1), define the problem more precisely by way of a leading example of the failures of the most popular current method (Section 1.2), summarize the nature of the solution offered (Section 1.3), provide some brief empirical evidence that the method works in practice (Section 1.4), and outline the statistical method offered (Section 1.5).
1.1 The Necessity of Ecological Inferences
Contrary to the pessimistic claims in the ecological inference literature (since Robinson, 1950), aggregate data are sometimes useful even without inferences about individuals. Studies of incumbency advantage, the political effects of redistricting plans, forecasts of macro-economic conditions, and comparisons of infant mortality rates across nations are just a few of the cases where both questions and data coincide at the aggregate level. Nevertheless, even studies such as these that ask questions about aggregates can usually be improved with valid inferences about the individuals who make up the aggregates. And more importantly, numerous other questions exist for which only valid ecological inferences will do.
Fundamental questions in most empirical subfields of political science require ecological inferences. Researchers in many other fields of academic inquiry, as well as the real world of public policy, also routinely try to make inferences about the attributes of individual behavior from aggregate data. If a valid method of making such inferences were available, scholars could provide accurate answers to these questions with ecological data, and policymakers could base their decisions on reliable scientific techniques. Many of the ecological inferences pursued in these other fields are also of interest to political scientists, which reemphasizes the close historical connection between the ecological inference problem and political science research. The following list represents a small sample of ecological inferences that have been attempted in a variety of fields.
• In American public policy, ecological inferences are required to implement key features of federal law. For example, the U.S. Voting Rights Act of 1965 (and its extensions in 1970, 1975, and 1982) prohibited voting discrimination on the basis of race, color, or language. If discrimination is found, the courts or the U.S. Justice Department can order a state or local jurisdiction to redistrict its political boundaries, or to impose or prevent various other changes in electoral laws. Under present law, legally significant discrimination only exists when plaintiffs (or the Justice Department) can first demonstrate that members of a minority group (usually African American or Hispanic) vote both cohesively and differently from other voters. Sometimes they must also prove that majority voters consistently prevent minorities from electing a candidate of their choice. Since survey data are rarely available in these cases, and because they are not often trustworthy in racially polarized contests, an application of the Voting Rights Act requires a valid ecological inference from electoral data and U.S. Census data.
Voting Rights Act assessments of minority and majority voting begin with electoral returns from precincts, the smallest geographic unit for which electoral data are available. In addition to the numbers of votes received by each candidate in a precinct, census data also give the fraction of voters in the same precinct who are African American (or other minority) or white. With these two sets of aggregate data, plaintiffs must make an ecological inference about how each racial group casts its ballots. That is, since the secret ballot prevents analysts from following voters into the voting booth and peering over their shoulders as they cast their ballots, the voting behavior of each racial group must be inferred using only aggregate electoral and census data. Because of the inadequacy of current methods, in some situations the wrong policies are being implemented: the wrong districts are being redrawn, and the wrong electoral laws are being changed. (Given the great importance and practicality of this problem, I will use it as a running example.)
• In one election to the German Reichstag in September 1930, Adolf Hitler's previously obscure and electorally insignificant National Socialist German Worker's party became the Weimar Republic's second largest political party. The National Socialists continued their stunning electoral successes in subsequent state, local, and presidential elections, and ultimately reached 37.3% of the vote in the last election prior to their taking power. As so many have asked, how could this have happened? Who voted for the Nazis (and the other extreme groups)? Was the Nazi constituency dominated by the downwardly mobile lower middle class or was support much more widespread? Which religious groups and worker categories supported the National Socialists? Which sectors of which political parties lost votes to the Nazis? The data available to answer these questions directly include aggregate data from some of the 1,200 Kreise (districts) for which both electoral data and various census data are available. Because survey data are not available, accurate answers to these critical questions will only be possible with a valid method of ecological inference (see Hamilton, 1982; Childers, 1983; and Falter, 1991).
• Epidemiologists and public policy makers need to know whether and to what extent residential levels of radioactive radon are a risk factor for lung cancer (Stidley and Samet, 1993; Greenland and Robins, 1994a). Radon leaks through basement floors and may pose a significant health risk. Legislators in many states are considering bills that would require homeowners to test for radon and, if high levels are found, to install one of several mechanical means of reducing future exposure.
Excerpted from A Solution to the Ecological Inference Problem by Gary King. Copyright © 1997 Princeton University Press. Excerpted by permission of PRINCETON UNIVERSITY PRESS.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.