In an earlier analysis, I examined how likely it was that Patrick Buchanan had received enough 'erroneous' votes in Palm Beach County to affect the outcome of the Presidential election. I reached the conclusion that the number of erroneous ballots was almost certainly greater than the margin by which Bush was leading Gore in the official vote count. In other words, if all of those erroneous ballots had been cast for Gore instead of Buchanan, Gore would have been ahead in the popular vote count.2
I have received a variety of useful feedback on that note and have obtained new data on Florida county voting patterns in the 1996 general election and the Republican presidential primary (from Greg Adams's website; see footnote 1). This note responds to some of that feedback and incorporates the new data into the analysis.3
Several Republican officials have argued that the high Buchanan vote in Palm Beach County might have been legitimate because Palm Beach County was a 'Buchanan stronghold.' However, in the 1996 Republican primary, in which Buchanan was a candidate, he did very badly in Palm Beach County, receiving only 15.3 percent of the vote compared to his statewide average of 25.3 percent;4 Palm Beach County was his sixth-worst showing in that election. Furthermore, Buchanan himself has now disputed the claim that Palm Beach County is a Buchanan stronghold.5
Another possibility is that the Reform party, whose candidate Buchanan was this year, is particularly popular in Palm Beach County. However, Ross Perot, the 1996 Reform Party presidential nominee, gained only 7.8 percent of the vote in the 1996 Presidential election (his seventh-worst showing), compared to a statewide average of 11.8 percent.
Finally, as a measure of whether Palm Beach county is generally more conservative than the rest of Florida, we can examine the vote split between Bill Clinton and Bob Dole in 1996. In the typical county in Florida, Dole garnered 44.7 percent of the popular vote in the 1996 election. In Palm Beach County, he received only 33.9 percent of the vote. President Clinton received 54.8 percent of the Palm Beach vote, compared to a statewide average of only 43.4 percent. And the 2000 Republican Senatorial candidate, Bill McCollum, received only 36 percent of the vote in Palm Beach county, compared with a statewide average of 48.7 percent.
In sum, there is no evidence that Palm Beach county is either a Buchanan stronghold, a Reform Party stronghold, or a conservative stronghold. Indeed, in every case the opposite seems to be true.
The new data from the 1996 primary and general election yield a more precise forecast of how Palm Beach County should have been expected to vote in 2000 than could be obtained from the 2000 data alone. Combining information from all three sources (1996 primary, 1996 general election, 2000 general election) leads to a prediction that Buchanan should have received about 613 votes in the 2000 general election, compared to the 3407 votes he actually received, so the best guess is that about 2800 Buchanan ballots (3407 - 613 = 2794) were errors.6 A statistical test from the model indicates that there is less than one chance in 600,000 that the Buchanan votes in Palm Beach County were all valid.7
Perhaps more convincing than the statistical models are simple graphs of the numbers of predicted and actual votes for Buchanan in Florida counties. The first figure shows the predicted number of Buchanan votes on the horizontal axis versus the actual number on the vertical axis, with the solid lines indicating the confidence region for the statistical model;8 the data point for each county is represented by the first two letters of that county's name. Palm Beach County is the single point very far above the predicted line.
This figure is similar to figures constructed by several other people examining the Florida data, starting with Greg Adams of Carnegie Mellon University (whose initial figure is what inspired me to analyze the data myself). However, statistical analysis of the data indicates that this figure may give a misleading impression that the Buchanan vote was even more overwhelmingly unlikely than it was. The problem is reflected in the fact that the error bands widen as the predicted Buchanan vote increases.
The statistical model that fits the data best is estimated in the form of the logarithm of the number of votes, and it is therefore more appropriate to plot the relationship between the predicted log number of votes and the actual log votes. The second figure presents the data in this logarithmic form. Palm Beach County is again the single point that lies very far from the predicted line, though the gap is not nearly so large as when the predicted level of votes is plotted against the actual level of votes.
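The widening of the error bands in levels can be illustrated with a minimal sketch. It assumes only that the standard error is constant in logs; the function name `level_band` and the value `sigma = 0.3` are hypothetical, chosen for illustration, and are not taken from the estimated model.

```python
import math

def level_band(predicted_votes, sigma, k=2.0):
    """Return (lower, upper) in vote levels for a k-standard-error band in logs."""
    log_pred = math.log(predicted_votes)
    return math.exp(log_pred - k * sigma), math.exp(log_pred + k * sigma)

# Hypothetical log-scale standard error; NOT the value from the fitted model.
sigma = 0.3

for votes in (100, 600, 3000):
    lower, upper = level_band(votes, sigma)
    # The band has the same width in logs for every county,
    # but its width in levels grows in proportion to the prediction.
    print(f"predicted {votes:>5}: band width in levels = {upper - lower:8.1f}")
```

A band that is equally wide for every county in logs is therefore proportionally wider in levels for counties with large predicted votes, which is why the levels plot can exaggerate how far a large county sits from the prediction.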
The results in the previous section were based on statistical models. Even if those models predicted a low Buchanan vote in Palm Beach in 2000, we might not have much confidence in that prediction if the statistical models did not fit the data very well. For example, if the predicted Buchanan vote were 613 but the model implied that there was a 20 percent chance that he had received at least 3407 legitimate votes, we would not be able to say with much confidence that some of the Buchanan ballots were errors.
In fact, however, the statistical model fits the data with a very high degree of precision. We can therefore use information from the model to assess how likely it is that a given number of Buchanan ballots were erroneous. The critical question is whether there are more than enough erroneous Buchanan ballots so that if those had been counted as Gore votes, Gore would be leading the Florida popular vote count.9
Since the totals of the popular vote count seem to change every few hours, the best way to examine this is to construct a table indicating the probability that at least a given number of Buchanan votes are errors.
| The chances are less than | That there are fewer than this many erroneous Buchanan ballots |
|---|---|
| 1 in 100 | 1966 |
| 1 in 1000 | 1543 |
| 1 in 10,000 | 1065 |
| 1 in 100,000 | 512 |
| 1 in 200,000 | 328 |
The first row of the table reflects the usual statistical test applied to questions of this kind: It indicates that we can say with 99 percent confidence that at least 1966 Buchanan ballots were erroneous.
The table can also be used to gauge how confident we can be that at least any given number of ballots were erroneous. Thus, if we assume that the erroneous Buchanan ballots were all intended for Gore, we can examine the amount by which Bush is ahead in the vote count and see how likely it is that the vote count would still show Bush ahead if the erroneous Buchanan ballots had been validly cast for Gore. For example, at the moment (1 PM on Monday, November 13) www.nytimes.com is reporting Bush ahead by 388. Using this table, we can see that the statistical model says that the chances are less than one in one hundred thousand that there were fewer than 512 erroneous Buchanan votes. Since 388 < 512, the analysis indicates that we can say with greater than 99.999 percent confidence (because 1-1/100,000=0.99999) that if the erroneous Buchanan votes had been votes for Gore, Gore would be leading in the vote count at the moment.
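Because the reported lead keeps changing, the lookup just described can be sketched in a few lines. The thresholds below are copied from the table; the function name `confidence_gore_ahead` is hypothetical, and the sketch keeps the text's assumption that every erroneous Buchanan ballot was intended for Gore.

```python
# Rows of the table: (odds denominator, threshold number of erroneous ballots).
TABLE = [
    (100, 1966),
    (1_000, 1543),
    (10_000, 1065),
    (100_000, 512),
    (200_000, 328),
]

def confidence_gore_ahead(bush_lead):
    """Return the strongest tabulated confidence that erroneous ballots exceed bush_lead.

    Returns None when the lead is at least as large as every threshold in the table.
    """
    best = None
    for odds, threshold in TABLE:
        if bush_lead < threshold:
            best = 1 - 1 / odds  # later rows are more stringent, so keep the last match
    return best

# Lead reported in the text at 1 PM on Monday, November 13.
print(confidence_gore_ahead(388))
```

A lead larger than 1966 returns `None`, meaning the table supports no statement at even the 99 percent level.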
A final note is in order. An analysis by Rob Shimer of Princeton University shows that when county level data for the entire United States are examined, Palm Beach County does not look like an exceptionally large outlier. (Shimer controls for general left/right preference in the counties by examining the Buchanan share of the vote versus the Gore share.) But all this means is that there were other counties in the United States where Buchanan received at least as much support, relative to Gore, as in Palm Beach. This does not in any way explain why in my state-level dataset, controlling for Palm Beach's usual voting patterns, Palm Beach voted overwhelmingly more for Buchanan than would be expected.
My initial analysis of this data attempted to explain Buchanan's vote share out of total votes cast. However, after completing that analysis and sending it out, I discovered that the residuals from that equation were not normally distributed, and thus statistical inference of the kind I had done (how likely is it that more than a certain number of ballots are bad) was problematic. After some exploration, I have found that when the equation is estimated in terms of the logarithm of the level of votes, normality of the errors cannot be rejected. Thus, the dependent variable in the regression analysis presented here will be $b_i^{2000} = \log B_i^{2000}$, where $B_i^{2000}$ is the total number of votes for Buchanan cast in county $i$ in the 2000 Presidential race.10
Suppose there is a linear relationship between the log Buchanan vote in 2000 and a vector of variables $X_i$, and define a dummy variable $PALM_i$ which takes the value 1 for Palm Beach County and zero otherwise. Thus the regression equation can be written:
$$b_i^{2000} = X_i \beta + \gamma \, PALM_i + \epsilon_i \qquad (1)$$
The prediction of the model for the log Buchanan vote in Palm Beach in 2000, based on the evidence in the $X_i$ variables from the other Florida counties, is
$$\hat{b}_{50} = X_{50} \hat{\beta}$$
We can first ask the question "How likely would it be for Palm Beach County to have received 3407 or more legitimate Buchanan votes?" Under the assumption that equation (1) is a correct specification, the question becomes what the probability is that
$$\hat{b}_{50} + \epsilon_{50} \geq \log 3407$$
In order to make this formula operational, we need to decide upon a set of explanatory variables X, run the regression, and check that the residuals look approximately normal.
I now have county-level data on the 2000 Presidential election, the 2000 Senate election, the 1996 Presidential election, and the 1996 Republican primary. The explanatory variables in the last draft of this note were the log of the number of votes Buchanan received in the 1996 Republican presidential primary, $b_i^{96P}$; the logs of the numbers of votes received by Bob Dole and Ross Perot in the 1996 general election ($d^{96} = \log DOLE^{96}$ and $p^{96} = \log PEROT^{96}$), to control for general tendencies to support Republican candidates (Dole) or Reform Party candidates (Perot); and the log of the number of votes received by George W. Bush in the 2000 general election ($h_i = \log BUSH^{2000}$).12 Rob Shimer of Princeton University criticised that specification because it did not separately include information on the total number of votes cast in each county, to account for scale effects. Thus I have modified the baseline specification to include the logs of the total number of votes in the 1996 Republican primary, the 1996 general election, and the 2000 general election.13
Regression results are14
This equation's mean squared error is 0.2977, and it has an adjusted $R^2$ of 0.938. The coefficient on the dummy for Palm Beach County has a t-statistic of 5.348. Since there are 67 counties and 9 parameters have been estimated, if the errors associated with this equation are normally distributed,15 the p-value associated with a test that the coefficient on PALM is zero should be given by $t(59, 5.348) \approx 1/636000$, implying that there is less than one chance in 600,000 that there were no erroneous Buchanan votes.16
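For readers who want to check the order of magnitude of this p-value without a statistics package, the following is a minimal sketch. It is not the program used for the note: it assumes a two-sided test, takes the degrees of freedom as 67 - 9 = 58, and integrates the Student t density numerically with Simpson's rule. The function names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_stat, df, steps=20_000):
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    odd = math.fsum(t_pdf((2 * k - 1) * h, df) for k in range(1, steps // 2 + 1))
    even = math.fsum(t_pdf(2 * k * h, df) for k in range(1, steps // 2))
    central = h / 3 * (t_pdf(0.0, df) + t_pdf(b, df) + 4 * odd + 2 * even)
    return 1.0 - 2.0 * central

# ASSUMPTION: 67 counties minus 9 estimated parameters. The text writes
# t(59, ...), so df is left as an adjustable variable.
df = 67 - 9
p = two_sided_p(5.348, df)
print(f"p-value: {p:.3g}, i.e. roughly 1 in {1 / p:,.0f}")
```

The printed odds can be compared with the 1/636000 figure quoted above; changing `df` to 59 shows how sensitive the result is to the degrees of freedom.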
What we are most interested in, however, is whether the number of erroneous Buchanan ballots is at least as large as the amount by which Bush is leading in the vote count. This question can be phrased as "For a given Bush lead, what is the probability that there were at least that many erroneous Buchanan ballots?"
Under the null hypothesis that there were no problems with the Palm Beach vote, the probability that the number of legitimate Buchanan votes is less than any given value $\tilde{B}$ is $t(df, (\log \tilde{B} - \hat{b}_{50})/\hat{\sigma}_e)$. To figure out the maximum number of legitimate Buchanan votes associated with a given probability $p$ we have
$$\tilde{B} = \exp\left(\hat{b}_{50} + \hat{\sigma}_e \, t^{-1}(df, p)\right)$$
where $t^{-1}(df, p)$ denotes the value $x$ at which $t(df, x) = p$.
For example, suppose we are interested in the number $n$ such that we can say "With 99.99 percent confidence, there are at least $n$ erroneous Buchanan ballots." We would calculate the $\tilde{B}$ associated with $p = 0.9999$. The formula yields $\tilde{B} = 2342$. Since the actual number of Buchanan votes was 3407, we can say with 99.99 percent confidence that at least 3407-2342=1065 Buchanan ballots were erroneous.
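The calculation described above (exponentiate the predicted log vote plus a critical value times the standard error, then subtract the result from 3407) can be sketched as follows. The function name `min_erroneous_ballots` is hypothetical, and the values of `sigma_hat` and `t_crit` are illustrative assumptions only; neither is reported in this note, so the printed number should not be read as a result.

```python
import math

def min_erroneous_ballots(actual_votes, log_prediction, sigma_hat, t_crit):
    """Lower bound on erroneous ballots for a chosen confidence level.

    t_crit is the t critical value for that confidence level, supplied by the caller.
    """
    max_legitimate = math.exp(log_prediction + sigma_hat * t_crit)
    return actual_votes - max_legitimate

actual = 3407             # Buchanan's recorded Palm Beach votes (from the text)
log_pred = math.log(613)  # log of the point prediction quoted in the text
sigma_hat = 0.32          # ASSUMED prediction standard error, illustration only
t_crit = 2.66             # ASSUMED critical value, illustration only

print(round(min_erroneous_ballots(actual, log_pred, sigma_hat, t_crit)))
```

With `t_crit = 0` the function returns the best-guess figure of 3407 - 613 = 2794; raising the critical value (demanding more confidence) lowers the guaranteed number of erroneous ballots, which is the pattern seen down the rows of the table.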
Statistics like this should be taken with a grain of salt, since they rely heavily on the assumption that the residuals of the equation are normally distributed. While there is no evidence against normality, it remains possible that the tails of the distribution are nonnormal. Still, it seems virtually certain that plausible alternative statistical assumptions about the distribution of the residuals would yield qualitatively similar results.17
1 This note was written by Christopher D. Carroll, Associate Professor of Economics, Johns Hopkins University, firstname.lastname@example.org; the data and statistical programs used to perform the analysis, along with an html version of this document are available at my web page http://www.econ.jhu.edu/People/CCarroll/carroll.html. Thanks to Joe Harrington for bringing this data to my attention, and to Greg Adams of Carnegie Mellon University (http://sds.hss.cmu.edu) for posting the initial figure that inspired me to do this analysis and the data on the 1996 Presidential election and Republican primaries. See his website for another analysis of the data that reaches the same conclusion I do.
2 This note does not consider evidence for other possible reasons that Gore's votes might have been undercounted, such as the possibility that many of the 19,000 invalid ballots that were discarded in Palm Beach county were ballots on which voters may have voted first for Buchanan and then, realizing their mistake, punched Gore's hole. See http://elections.fas.harvard.edu/ for a broader statistical analysis of this and other issues.
3 I would like to thank Geert Ridder, Bruce Hansen, Jonathan Parker, Robert Jackson, Derrick Hatcher, Mark Braswell, Bobby Bodenheimer, Nik Buescher, Daniel Wang, Ranil Salgado, Michael Hoke, Jason Seligman, Rob Thau, and particularly Rob Shimer for valuable comments and help.
4 The term 'statewide average' in this note refers to the average of the percentage for each county, which may differ slightly from the percentages for the state as a whole because counties differ in population.
5 One specific claim has been that Buchanan received 'even more votes' in the 1996 primary than he received in the 2000 general election. This is true: Buchanan received 8788 votes in the 1996 primary. But this does not mean it is plausible that he received 3407 votes in Palm Beach County in 2000. For example, in the average Florida county other than Palm Beach, Buchanan received 2329 votes in the 1996 Republican primary, but only 210 votes, less than a tenth as many, in the 2000 race. There is no reason that Palm Beach's expected votes in 2000 should not be scaled down by a similar magnitude as in other counties relative to the 1996 primary vote.
6 For details of the statistical model, see the Appendix.
7 Earlier versions of this note found an even smaller likelihood that all the Buchanan votes were valid. See the Appendix for a discussion of why the results have changed.
8 These are two-standard-error bands on either side of the predicted value.
9 Bush campaign officials have argued that perhaps not all of the erroneous Buchanan votes were intended for Gore. But it was easy to vote for Bush: his name was first on the ballot and the hole the voter was supposed to punch to vote for Bush was the first hole on the ballot. Gore's name was second on the ballot, but the voter was supposed to punch the third hole, though the second hole also overlapped with Gore's name. From the layout it is hard to imagine how a person intending to vote for Bush could have accidentally voted for Buchanan.
10 One of the right hand side variables will be the log of the total number of people who voted, which controls for scale effects. I thank Rob Shimer for pointing out the importance of including such a term.
11 I am grateful to my colleague Geert Ridder for verifying formally that this is the right standard error to use.
12 Information from the 2000 Senate race does not help to predict Buchanan votes, and so will not be included in the analysis.
13 Performing the regression in logs, as I do, or in log shares, as Shimer prefers, generates precisely the same answer to the questions posed here, so long as both the log levels and log shares equations also allow the log vote totals as explanatory variables on the right hand side. I accept Shimer's argument that including log vote totals is appropriate and do so for all results reported here.
14 Coefficients on the voting populations, as described above, are not reported for lack of space. The programs and data that generated these results are available on my website, url above.
15 A Shapiro-Wilk normality test does not come close to rejecting normality at conventional significance levels.
16 While the outcome is still colossally unlikely, the probability is not as small as in earlier drafts of this note. Two factors account for the change. First, the initial draft of the note estimated the equation in terms of vote shares, but did not check for the normality of the errors before performing the confidence interval calculations. Upon examining the residuals, I found that normality could be strongly rejected at conventional significance levels. After some experimentation, I discovered that a log-log specification generated normal errors, and then I applied the same confidence tests as before, and obtained similar results. Unfortunately, it turns out that the normality test command in the statistical package I am using, STATA, scrambles the order of the data, and my confidence interval tests relied upon an assumption that Palm Beach County was the fiftieth observation in the dataset. Instead, after the sorting, Suwannee, a small county, was in the 50th spot, so the data for Suwannee were used instead of those for Palm Beach in the calculation of error bands. I have now corrected this error, which accounts for most of the reduction in statistical significance between this draft of the note and the previous draft.
17 Bootstrap methods would yield even more extreme conclusions, because the largest error in the regression equation for any county other than Palm Beach is 0.512. We can conclude that a bootstrap procedure would imply a probability of zero that the number of Buchanan votes in Palm Beach County was greater than $\exp(\hat{b}_{50}+0.512) = 1024$, and therefore a probability of zero that there were fewer than 3407-1024=2383 erroneous Buchanan ballots.