Literary Digest's faulty sample

The Literary Digest ran afoul of the requirements of random selection in two important ways. First, the readers of the magazine were not a representative cross-section of the American public. The magazine's mailing list left out many kinds of voters. This deficiency was then magnified when the magazine supplemented its mailing list with names from auto registrations and telephone directories. It was 1936, the middle of the Depression. People out of work, worrying about how they would feed their families, might well consider a telephone an unaffordable luxury. Households with an automobile would be still less likely to be suffering the worst the bad economic times had to offer. The Digest sample was heavily biased toward the wealthy.

It was bad enough that the Digest used so unrepresentative a collection of Americans, but it further violated the requirements of random selection by allowing the respondents to be self-selected. Roughly three-quarters of the people receiving the Digest ballots did not bother to return them. How those people compared with those who took the time to mark and mail the ballot can only be guessed. Later studies have indicated that individuals who respond to such requests for their opinion are more highly motivated and interested than those who do not. In other words, the people who returned the ballots to the Literary Digest were anything but representative of the nation as a whole. They were an unusual subset of an atypical sample of the American public.

Gallup was right: the size of the Digest sample would not be enough to compensate for the mathematical laws it violated. A statistical Titanic, the Digest poll crashed headlong into the iceberg of probability theory and quickly sank.
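
The point that sheer size cannot rescue a skewed sample can be illustrated with a small simulation. What follows is only a sketch under invented assumptions: the electorate, the share of well-off households, and the support levels for a hypothetical "candidate A" are made-up numbers, not figures from 1936, and the function name simulate_polls is introduced here purely for illustration. It models only the skewed mailing list, not the self-selection of respondents.

```python
import random


def simulate_polls(seed=0):
    """Compare a huge sample from a skewed list with a small random sample."""
    rng = random.Random(seed)

    # Hypothetical electorate; every number here is an illustrative assumption.
    population_size = 200_000
    share_well_off = 0.30      # households with a phone or a car
    support_well_off = 0.60    # support for candidate A among the well-off
    support_others = 0.35      # support for candidate A among everyone else

    population = []
    for _ in range(population_size):
        well_off = rng.random() < share_well_off
        p_support = support_well_off if well_off else support_others
        population.append((well_off, rng.random() < p_support))

    true_share = sum(votes_a for _, votes_a in population) / population_size

    # Big but biased: drawn only from the well-off part of the electorate.
    frame = [votes_a for well_off, votes_a in population if well_off]
    biased = rng.sample(frame, min(50_000, len(frame)))
    biased_share = sum(biased) / len(biased)

    # Small but random: every voter has the same chance of being selected.
    fair = rng.sample(population, 1_500)
    fair_share = sum(votes_a for _, votes_a in fair) / len(fair)

    return {"truth": true_share, "biased": biased_share, "random": fair_share}


if __name__ == "__main__":
    for label, share in simulate_polls().items():
        print(f"{label:>7}: {share:.1%}")
```

Under these assumptions the 50,000-person sample misses candidate A's true support by a wide margin, while the 1,500-person random sample lands within a few points of it. Making the biased sample larger only makes it more precisely wrong.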

1948 - Truman's predicted loss

It was not all smooth sailing after that, however. In 1948, Gallup and others predicted that President Harry Truman would lose the presidential election to Thomas Dewey. In fact, Gallup got it almost exactly backward. His American Institute of Public Opinion had Dewey winning, 49.5% to 44.5%. The actual popular vote went to Truman, 49.6% to 45.1%.

What had happened? Why had the random samples failed? The fault, argued Gallup, was not in the underlying sampling theory, but rather in the decisions made by the poll directors. In the case of the Gallup Poll, the interviewing stopped 10 days before the election. Also, in an election that seemed to excite little voter interest, Gallup assumed that respondents who said they were undecided would not vote.

Those two decisions proved to be the undoing of the Gallup prediction. Subsequent data showed that there was a late swing of support for Truman, and that nearly everyone who was undecided when Gallup stopped polling eventually voted for Truman.

One problem could be solved easily: do not stop polling until the last possible moment. The other difficulty continues to vex pollsters. What should be done with the undecideds? Assuming that none of them will vote can be risky, as Gallup found in 1948. Dividing them in the same proportion as the rest of the sample would have made the Gallup percentages further off in 1948. Splitting the undecideds evenly between the candidates reduces the chance of compounding a mistake, but may well understate support for one of the candidates.
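
The arithmetic behind these choices can be sketched in a few lines. The poll below is hypothetical (48% for candidate A, 44% for candidate B, 8% undecided); these are not Gallup's 1948 figures, and the function names are invented for this illustration. The third function is not an allocation rule a pollster could apply in advance. It simply shows what happens if most of the undecideds break toward one candidate on election day.

```python
def allocate_proportional(decided, undecided):
    """Give each candidate a slice of the undecideds matching their decided share."""
    total = sum(decided.values())
    return {name: share + undecided * share / total
            for name, share in decided.items()}


def allocate_evenly(decided, undecided):
    """Split the undecideds equally among the candidates."""
    bonus = undecided / len(decided)
    return {name: share + bonus for name, share in decided.items()}


def late_break(decided, undecided, toward, fraction):
    """Send `fraction` of the undecideds to one candidate, the rest evenly to the others."""
    others = len(decided) - 1
    result = {}
    for name, share in decided.items():
        if name == toward:
            result[name] = share + undecided * fraction
        else:
            result[name] = share + undecided * (1 - fraction) / others
    return result


if __name__ == "__main__":
    # Hypothetical poll, in percentage points of all respondents.
    decided = {"A": 48.0, "B": 44.0}
    undecided = 8.0

    scenarios = {
        "proportional": allocate_proportional(decided, undecided),
        "even split": allocate_evenly(decided, undecided),
        "90% break to B": late_break(decided, undecided, "B", 0.9),
    }
    for label, shares in scenarios.items():
        line = ", ".join(f"{name} {share:.1f}%" for name, share in shares.items())
        print(f"{label:>15}: {line}")
```

With these made-up numbers, both allocation rules leave candidate A ahead, and the proportional rule widens the lead slightly. If nine in ten undecideds break toward B, however, B finishes in front, which is the kind of reversal described above.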

In any case, the 1948 presidential contest demonstrated quite clearly that drawing a sample improperly is not the only way to generate inaccurate polling results.

