Shared Names and Voter Fraud

Posted: April 6, 2014 in analysis

RedState.com contributor streiff wrote an analysis of a Fox News story about database matching of North Carolina’s voter rolls. The original story, while somewhat alarmist, at least acknowledges that the analysis is preliminary at best. North Carolina compared a complete list of everyone who voted in NC in 2012 against the voter lists of 28 other states. 35,570 voters matched on first name, last name, and birthdate, and 765 of those also matched on the last 4 digits of their Social Security number. The Fox News article points out that these aren’t guaranteed cases of voter fraud, but streiff ignores this, with his headline “Rampant Vote Fraud Uncovered In North Carolina” and comments like this:

This is not a minor problem, this is an industry. Under a most favorable scenario one has to expect the overwhelming majority of the voters matching name and date of birth are the same person.

I don’t think we can accept that claim without some analysis. The math on matching in large groups is funny, and most people don’t judge it very well.

Birthdays In A Room

The classic example of this is to ask, “How many people need to be in a room before there is a 50% chance that two of them share a birthday?” Most people think that, with 365 days to choose from, the answer must be in the hundreds. The actual answer is 23. The first person to enter the room can have any birthday. The second person must avoid one day to avoid a match, the third must avoid two, and so on. While each person’s chance of colliding with someone already in the room is low, the cumulative chance that every person dodges every other person shrinks faster than most people expect.
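The room-by-room reasoning above can be checked with a few lines of Python. This is a minimal sketch assuming 365 equally likely birthdays and no leap years; the function names are illustrative, not from any library.

```python
def p_shared_birthday(n: int, days: int = 365) -> float:
    """Chance that at least two of n people share a birthday,
    assuming `days` equally likely birthdays."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person to enter must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_room(target: float = 0.5, days: int = 365) -> int:
    """Smallest group size whose shared-birthday chance reaches `target`."""
    n = 1
    while p_shared_birthday(n, days) < target:
        n += 1
    return n


print(smallest_room())                  # 23
print(round(p_shared_birthday(23), 3))  # 0.507
```

Each multiplication is one person "dodging" everyone already in the room; the product of many slightly-less-than-one factors is what drops below 50% so quickly.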

Names And Birthdays Of Voters

The math for voters isn’t the same, but it’s not too hard to figure out. There were 6.5 million NC voters in the data, and 101 million non-NC voters. Voter birthdays are not evenly distributed, which means they will tend to match more often than pure chance would predict, but to keep the math simple, I assumed all voters’ ages were spread evenly between 18 and 62. Cutting off the older voters narrows the range, which raises the chance of a match, but that is more than offset by flattening out the real clustering of birthdates, so if anything this undercounts matches. That range covers about 44 × 365 ≈ 16,000 days, so any two voters who share a name, one in NC and one in another state, have roughly a 1 in 16,000 chance of also sharing a birthdate.
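The 1-in-16,000 figure is just the number of possible birthdates in the assumed age range. A minimal sketch of that arithmetic, assuming (as the post does) uniform ages from 18 to 62 and counting 365.25 days per year:

```python
# Assumption from the post: voter ages spread evenly from 18 to 62.
AGE_SPAN_YEARS = 62 - 18
days_in_range = AGE_SPAN_YEARS * 365.25   # number of possible birthdates

# Two same-name voters pick birthdates independently and uniformly,
# so the chance they coincide is 1 / (number of possible birthdates).
p_pair = 1 / days_in_range

print(round(days_in_range))        # 16071
print(f"1 in {1 / p_pair:,.0f}")   # 1 in 16,071
```

The post rounds this to 1 in 16,000, which changes nothing material in the estimates that follow.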

Now, consider John Smith. Howmanyofme.com says that there are 45,963 people named John Smith in the United States. Assuming they are spread evenly across the population, 939 would be among the 6.5 million NC voters, and 14,670 would have voted in another state. Each NC John Smith has about a 60% chance of sharing a birthdate with at least one of those out-of-state John Smiths (1 - (1 - 1/16,000)^14,670 ≈ 0.60), which means that about 564 of them would turn up on this list.
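The John Smith arithmetic can be written out as a short Python sketch. It treats each out-of-state John Smith as an independent 1-in-16,000 chance, the same simplification the post makes; the constants are the post’s own figures and the function name is illustrative.

```python
# Figures taken from the post: 939 NC John Smiths and 14,670 out-of-state
# John Smith voters, derived from howmanyofme.com's count of 45,963.
NC_JOHN_SMITHS = 939
OUT_OF_STATE_JOHN_SMITHS = 14_670
DAYS = 16_000  # possible birthdates in the assumed 18-62 age range


def p_any_match(m: float, days: float = DAYS) -> float:
    """Chance that one voter shares a birthdate with at least one of m
    same-name voters, treating each pair as an independent 1-in-`days` event."""
    return 1.0 - (1.0 - 1.0 / days) ** m


p = p_any_match(OUT_OF_STATE_JOHN_SMITHS)
expected = NC_JOHN_SMITHS * p

print(f"{p:.2f}")       # 0.60
print(round(expected))  # 564
```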

And that’s just one name. There’s also James Smith, Michael Smith, Robert Smith… and John Jones, James Jones… and so on and so on and so on. In fact, just the top 10 first names paired with the top 10 last names on howmanyofme.com predict nearly 25,000 matches. So the reported 35,570 is easily reachable with no foul play whatsoever. The key to the math is that a common name is not only more likely to match, because there are more people with it in other states, but also accounts for more matches, because there are more people with it in NC.
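A small sketch makes that last point concrete by comparing John Smith with a hypothetical name exactly one-tenth as common, both in NC and elsewhere. The one-tenth name is purely illustrative, not real howmanyofme.com data, and pairs are again treated as independent.

```python
DAYS = 16_000  # possible birthdates in the assumed 18-62 age range


def expected_matches(nc_count: float, out_count: float, days: float = DAYS) -> float:
    """Expected NC voters with this name who share a birthdate with at least
    one same-name voter in another state (pairs treated as independent)."""
    p_any = 1.0 - (1.0 - 1.0 / days) ** out_count
    return nc_count * p_any


john_smith = expected_matches(939, 14_670)
# Hypothetical name: one-tenth as many holders in NC and in other states.
rarer_name = expected_matches(939 / 10, 14_670 / 10)

print(f"John Smith: {john_smith:.0f}")      # 564
print(f"1/10 as common: {rarer_name:.1f}")  # 8.2
print(f"ratio: {john_smith / rarer_name:.0f}x")
```

A name ten times rarer produces far fewer than one-tenth as many matches, because it loses on both counts: fewer NC holders, and a lower chance that each of them matches someone elsewhere.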

An alternative analysis is to take a sample of real names and average their chances of matching. Here, each sampled name in NC is counted only once, but its national frequency is still used to predict the chance of a match. Finding a good sample is hard; I used the officers of several NC clubs with webpages and the rosters of several NC high school sports teams.
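The sample-averaging method can be sketched the same way: compute each sampled person’s chance of a match from how many same-name voters live elsewhere, average those chances, and scale by the 6.5 million NC voters. The out-of-state counts in the usage example are hypothetical placeholders, not the post’s actual sample.

```python
DAYS = 16_000          # possible birthdates in the assumed 18-62 age range
NC_VOTERS = 6_500_000


def p_any_match(out_count: float, days: float = DAYS) -> float:
    """Chance one person matches at least one of out_count same-name voters elsewhere."""
    return 1.0 - (1.0 - 1.0 / days) ** out_count


def estimate_from_sample(out_counts: list[float]) -> float:
    """Average the per-person match chance over the sample, scaled to all NC voters."""
    mean_p = sum(p_any_match(m) for m in out_counts) / len(out_counts)
    return mean_p * NC_VOTERS


# Hypothetical sample: two rare names and one very common name.
sample = [1, 2, 14_670]
print(f"{estimate_from_sample(sample):,.0f}")      # dominated by the common name
print(f"{estimate_from_sample(sample[:2]):,.0f}")  # the same sample without it
```

With a sample this small, a single very common name swamps everything else and moves the estimate by orders of magnitude.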

The results were interesting. Many names are unique or nearly unique in the country, so they have very little chance of matching. But a few spike very high and account for the vast majority of the matches. My results were trending considerably lower than the reported figure until I came across a Michael Smith, who pulled the average up to nearly double the reported figure.

This method seems very sensitive to the sample, so I consider the first method more reliable. If the top 100 names can account for a significant fraction of the reported value, we can assume that the rest of the names will account for the rest.

Names, Birthdate, and Last 4 SS#

But what about the matches that also included Social Security numbers? The math is the same as above, except that two voters with the same name now have only a 1 in 159,984,000 (16,000 × 9,999) chance of also matching on both birthdate and the last four SSN digits. Now the odds do grow long. There are only about 0.08 expected John Smith matches, and the top 100 names account for only about 6 matches in total. It is still possible, and indeed likely, that some of the 765 voters who matched this way are pure coincidence. But not all of them. Voter fraud isn’t the only explanation for this. But it is one worth looking into.
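Adding the last four SSN digits only changes the per-pair chance. A minimal sketch reusing the post’s John Smith figures and its 9,999 possible last-four values; `math.log1p` and `math.expm1` are used because the per-pair probability is tiny and `1 - p` would otherwise lose precision.

```python
import math

NC_JOHN_SMITHS = 939
OUT_OF_STATE_JOHN_SMITHS = 14_670
DAYS = 16_000
SSN_LAST4_VALUES = 9_999

# Chance two same-name voters match on birthdate AND last four SSN digits.
combinations = DAYS * SSN_LAST4_VALUES
p_pair = 1.0 / combinations

# 1 - (1 - p_pair) ** m, computed in a numerically stable way.
p_any = -math.expm1(OUT_OF_STATE_JOHN_SMITHS * math.log1p(-p_pair))
expected = NC_JOHN_SMITHS * p_any

print(f"1 in {combinations:,}")  # 1 in 159,984,000
print(f"{expected:.3f}")         # 0.086
```

That is well under one expected coincidental match for John Smith, compared with about 564 when only name and birthdate are matched.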

Conclusion

Streiff’s claim that this report supports 1 million cases of voter fraud is preposterous. Matching names and birthdates is a completely insufficient tool for deciding whether two records belong to the same person. Including Social Security numbers goes a long way toward resolving this concern, and those results warrant further investigation. However, they are well within the potential margin of human error.

Comments
  1. Jesse Morris says:

    I don’t understand. Isn’t it plainly the case that two “people” with the same name & DOB, but different SSNs are in fact separate people?

    Or are the ones with no SSN match often that way because no SSN is available for them?

    (Incidentally, there are more than 10K people born in the US every day, which means that we should expect a DOB/last-4-SSN collision every day)

    • overanalytic says:

      The wording in the article is “However, in those cases middle names and Social Security numbers were not matched.” I read that as “the process of matching was not done,” but it could also be read as “the process was done and failed.” The second explanation seems less likely, since I can’t imagine why the state would say that 35,000 voters were provably not fraudulent.

      If the second explanation is the case, then my math is largely irrelevant and the original article is even more far-fetched than I made it out to be.
