Archive for the ‘analysis’ Category

Shared Names and Voter Fraud

Posted: April 6, 2014 in analysis

Contributor streiff wrote an analysis of a Fox News story about database matching of North Carolina’s voter rolls. The original story, while somewhat alarmist, at least acknowledges that the analysis is preliminary at best. What North Carolina did was compare a complete list of everyone who voted in NC in 2012 to a list of voters in 28 other states. In all, 35,750 voters matched on first name, last name, and birthdate, and 765 of those also matched on the last 4 digits of their Social Security number. The Fox News article points out that these aren’t guaranteed cases of voter fraud, but streiff ignores this, with his headline “Rampant Vote Fraud Uncovered In North Carolina” and comments like this:

This is not a minor problem, this is an industry. Under a most favorable scenario one has to expect the overwhelming majority of the voters matching name and date of birth are the same person.

I don’t think we can accept that claim without some analysis. The math on matching in large groups is funny, and most people don’t judge it very well.

Birthdays In A Room

The classic example of this is to ask, “How many people need to be in a room before there is a 50% chance that two of them share a birthday?” Most people think that, with 365 days to choose from, the answer would be in the hundreds. The actual answer is 23. The first person to enter the room can have any birthday. The second person has one day they cannot have. The third has two, and so on. While the chance of any one person colliding with someone already in the room is low, the cumulative chance that every person dodges every other person shrinks faster than most people expect.
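The room-filling argument above can be sketched in a few lines of Python. This is a minimal illustration, assuming every day of a 365-day year is equally likely and birthdays are independent; the function names are hypothetical, not from any library. It multiplies the chance that each newcomer avoids every birthday already taken, and also counts pairs, since every pair of people is a separate chance for a match.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Chance that at least two of n people share a birthday, assuming every
    day is equally likely and birthdays are independent (no leap days)."""
    p_all_distinct = 1.0
    for i in range(n):
        # Person i + 1 must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def smallest_room(target: float = 0.5, days: int = 365) -> int:
    """Smallest group size whose chance of a shared birthday reaches target."""
    n = 1
    while prob_shared_birthday(n, days) < target:
        n += 1
    return n


def expected_matching_pairs(n: int, days: int = 365) -> float:
    """Expected number of matching pairs: n(n-1)/2 pairs, each of which
    matches with probability 1/days."""
    return n * (n - 1) / 2 / days


print(smallest_room())                        # 23
print(round(prob_shared_birthday(23), 3))     # 0.507
print(round(expected_matching_pairs(23), 2))  # 0.69
```

The pair count is what grows quickly: 23 people already form 253 pairs. The same pairwise arithmetic applies loosely to matching voter lists, with `days` replaced by the number of possible name-and-birthdate combinations, though real names and birthdates are far from uniformly distributed, so this is only a rough intuition pump.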


The Null Hypothesis and Bias

Posted: May 24, 2013 in analysis

It’s been a long time, and this is actually a boring subject, but it’s something I wanted to talk about.

Last week Nate Silver of FiveThirtyEight wrote a piece about flawed statistical thinking in an op-ed by Peggy Noonan. He used some simple calculations to make his point.

Dan McLaughlin of RedState had an issue with Mr. Silver’s piece.

Silver concedes of his statistical analysis that “this calculation assumes that individuals’ risk of being audited is independent of their political views,” which of course is the very thing in dispute; it’s like the old joke about an economist stranded on a desert island with a stack of canned goods whose solution begins, “assume a can opener.” All things being equal, all things are equal.

Mr. McLaughlin fundamentally misidentified what Silver was doing when he made that assumption. The assumption doesn’t weaken Silver’s argument; statistically, it is necessary in order to make it.

The Null Hypothesis

What Mr. Silver was doing in his piece was using an informal version of the null hypothesis, which is the foundation of much of modern statistics. The fundamental mathematical principle behind statistical significance relies not on proving a hypothesis but on disproving the null hypothesis.

Thus, if a statistician wants to show that smoking causes cancer, he does the math assuming that smoking has no effect on cancer. If the observed data would be very unlikely under that assumption, he has disproven the null hypothesis. If they wouldn’t, then he has failed to disprove the null hypothesis. A statistician never proves a positive hypothesis; they simply disprove null hypotheses.

This is a bit tough to grasp, so I’ll try to explain it with a very simple example. If I have a coin that I think might be weighted, the way I test that is to flip it a bunch of times and write down the results. Then I assume the coin is fair (50-50 heads/tails) and ask, “If the coin weren’t biased, how unlikely would it be that I got the results I just did?” In 100 flips, if my sample came out 47-53, sure, the most likely answer is that it is slightly weighted. But a 47-53 split is entirely ordinary for a fair coin, so I would not reject the null. If my sample were 82-18, however, that would be a staggeringly unlikely event with a fair coin, so it is probably safe to reject the null. If I only flipped the coin twice, I couldn’t disprove the null even if both flips were heads, since that has a good chance of happening regardless. Mathematically, this is what the p-value represents: the probability of a result at least as extreme as the one observed, assuming the null hypothesis is true.
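The coin example maps onto an exact two-sided binomial test, which is one standard way to compute the p-value described above. The sketch below assumes the null hypothesis is a perfectly fair coin; the function name is hypothetical. It adds up the probability of every head count at least as far from an even split as the one observed.

```python
from math import comb


def fair_coin_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value under the null hypothesis that the coin is fair.

    Sums the probability of every possible head count that lands at least
    as far from flips / 2 as the observed count did."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    # Under a fair coin, each of the 2**flips sequences is equally likely.
    return extreme / 2 ** flips


for heads, flips in [(47, 100), (82, 100), (2, 2)]:
    p = fair_coin_p_value(heads, flips)
    print(f"{heads} heads in {flips} flips: p = {p:.3g}")
```

For the three cases in the text, 47-53 gives a p-value of roughly 0.6 (no reason to reject the null), 82-18 gives a vanishingly small one, and two heads out of two flips gives exactly 0.5.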

That an individual’s risk of audit is independent of their political views is a null hypothesis. Mr. Silver proposes it, then shows that Peggy Noonan’s evidence does not disprove it. He does this by showing that, if the null is true, it would not be unusual to find four or five (indeed, four or five thousand) Republican donors who were audited. So the fact that Peggy Noonan did find four or five audited Republican donors is not statistical evidence that the null is false. Statistically, then, her evidence is very weak.
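Under the null hypothesis, every donor is audited independently at the same rate, so the number of audited Republican donors follows a binomial distribution. The sketch below shows that reasoning. The donor count and audit rate are placeholders chosen only for illustration, not the figures Silver used, and the function names are hypothetical. The probabilities are computed in log space so that a large donor pool doesn’t overflow.

```python
from math import exp, lgamma, log, log1p


def binomial_pmf(k: int, n: int, rate: float) -> float:
    """P(exactly k of n donors audited) if each is audited independently
    with probability `rate` (the null hypothesis). Requires 0 < rate < 1."""
    log_p = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
             + k * log(rate) + (n - k) * log1p(-rate))
    return exp(log_p)


def prob_at_least(k: int, n: int, rate: float) -> float:
    """P(at least k of n donors audited) under the null."""
    return 1.0 - sum(binomial_pmf(j, n, rate) for j in range(k))


# Hypothetical placeholders, NOT the numbers from Silver's piece.
DONORS = 100_000
AUDIT_RATE = 0.01

expected = DONORS * AUDIT_RATE
print(f"expected audited donors under the null: {expected:.0f}")
print(f"P(at least 5 audited): {prob_at_least(5, DONORS, AUDIT_RATE):.6f}")
```

With placeholder numbers like these, finding at least five audited donors is essentially certain even if audits ignore politics, which is why a handful of examples cannot distinguish the null from deliberate targeting.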

The null hypothesis itself, however, is not a political statement.  It is simply the way one has to formulate the problem in order to use the mathematical tools available.

As a side note, I was banned from RedState some time ago for formulating a statistical query in this way, because the null hypothesis looked like a political position, so the fact that it has come up again is of some interest to me.

Unionization and Economic Growth

Posted: December 16, 2012 in analysis

Today we look at the following article on RedState, which purports to show that

…Without cherry-picking data as union bosses must in order to defend forced unionism, total seasonally adjusted non-farm employment growth shows a huge advantage for residents of right to work states.

The actual data presented, however, show employment growth over 20 years for all right-to-work states, but for only a few union-friendly states.

This, particularly considering that the introduction explicitly calls out cherry-picking, triggered my sensors.  So, let’s see what happens if we look at data for all states.

First, I had to find the data used to create this chart. It took some rummaging on the BLS website, but I finally found numbers that almost, but not quite, recreate the numbers on the chart in the original article. My version of the chart is below.


As you can see, it looks essentially like the version in the article, though because I am using slightly different data, the percentages vary by a point or two.

Now, I’d like to present a different chart, this time comparing Ohio to other union-friendly states.


Here is where the cherry-picking comes in. Ohio’s job growth isn’t terrible because Ohio is union-friendly; it’s terrible because it’s terrible. Nearly every state does better than Ohio, regardless of its labor policy.

Now, there may be something to the claim that right-to-work states have gained more over the last 20 years than union-friendly states have. But that wasn’t the argument; Jason Hart asserted a “huge advantage” for right-to-work states and presented evidence built around a comparison with the second-worst-performing state of any kind.

Presenting the data like that, particularly in the same sentence as calling out others for cherry-picking data, is disingenuous at best.
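The difference between comparing whole groups and comparing against one chosen laggard can be sketched in Python. The employment figures below are made-up placeholders, not the BLS numbers behind the charts, and the state names and function names are hypothetical. Growth is computed as the percentage change in employment over the window, and each policy group is averaged over all of its states.

```python
from statistics import mean

# Hypothetical placeholder data, NOT the BLS numbers behind the charts:
# state -> (jobs at start of window, jobs at end of window, labor policy).
EMPLOYMENT = {
    "State A": (1000, 1300, "right-to-work"),
    "State B": (2000, 2400, "right-to-work"),
    "State C": (1500, 1950, "union-friendly"),
    "State D": (3000, 3600, "union-friendly"),
    "State E": (4000, 4200, "union-friendly"),  # the lone laggard
}


def growth_pct(start: float, end: float) -> float:
    """Percentage change in employment over the window."""
    return (end - start) * 100 / start


def mean_growth_by_policy(data):
    """Average growth for each policy group, using every state in the group."""
    groups = {}
    for start, end, policy in data.values():
        groups.setdefault(policy, []).append(growth_pct(start, end))
    return {policy: mean(rates) for policy, rates in groups.items()}


by_policy = mean_growth_by_policy(EMPLOYMENT)
laggard = growth_pct(*EMPLOYMENT["State E"][:2])

print(f"right-to-work average:  {by_policy['right-to-work']:.1f}%")
print(f"union-friendly average: {by_policy['union-friendly']:.1f}%")
print(f"cherry-picked laggard:  {laggard:.1f}%")
```

With these invented numbers, the right-to-work group leads the full union-friendly group by about 7 points but leads the single cherry-picked state by 20, which is the kind of gap inflation the post objects to.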

Speaking on CNN (and as reported at talkingpointsmemo), Governor Bob McDonnell, in an effort to link the economic recovery to Republican governors and not President Obama, said:

 “There’s something going on with Republican-governed states. Seven out of the 10 states nationwide, Candy, that have the lowest unemployment rates: Republican governor states.”

By now you know the drill: does this show real evidence that Republican-governed states have lower unemployment than Democratic-governed ones?

The short answer is no. The long answer is nnnnnnooooooooooooooooooooooooooooooooooooooooooo. (Sorry, bad joke.) Simply put, there are more Republican governors than Democratic ones, so more Republican-governed states appear in every part of the unemployment list. Only 3 of the 10 lowest-unemployment states are governed by Democrats, but so are only 3 of the 10 highest-unemployment states (the one independent-governed state is also in that group, which is why only 6 of those are Republican).
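If a governor’s party had nothing to do with unemployment, the 10 lowest-unemployment states would behave like 10 states drawn at random, so the number of Republican-governed states among them would follow a hypergeometric distribution. The sketch below assumes 29 Republican governors out of 50 at the time. That count isn’t given in the post, so treat it as an assumption to check. The function name is hypothetical.

```python
from math import comb


def hypergeom_at_least(k: int, population: int, successes: int, draws: int) -> float:
    """P(at least k 'successes' among `draws` items drawn without replacement
    from `population` items, `successes` of which count as successes)."""
    favorable = sum(
        comb(successes, j) * comb(population - successes, draws - j)
        for j in range(k, min(successes, draws) + 1)
    )
    return favorable / comb(population, draws)


# Assumption (not stated in the post): 29 of the 50 governors were Republicans.
STATES, REPUBLICAN_GOVERNORS, TOP_N = 50, 29, 10

expected = TOP_N * REPUBLICAN_GOVERNORS / STATES
p = hypergeom_at_least(7, STATES, REPUBLICAN_GOVERNORS, TOP_N)
print(f"expected Republican states in any 10: {expected:.1f}")  # 5.8
print(f"P(7 or more of 10 are Republican):    {p:.2f}")         # 0.31
```

Under that assumption, a top 10 with 7 Republican-governed states is what random assignment would produce roughly a third of the time.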

Of course, we can do better than just counting from the top 10 and bottom 10. A simple statistical model can give a much better sense of whether governor party affiliation affects unemployment. The answer is no. While Republican-governed states have slightly lower average unemployment, the difference is tiny (0.3 percentage points) and is very likely caused by random chance (p = 0.54). Trying to refine the model by adding length of incumbency or length of party incumbency does not produce any results other than noise. Sometimes one party is a little ahead, sometimes the other, but the results are never significant.
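The post doesn’t say which model produced p = 0.54, so the sketch below swaps in a permutation test, one simple way to ask the same question without distributional assumptions. It is not necessarily the model used here, no state data are included, and the function name is hypothetical. Under the null, party labels are unrelated to unemployment, so shuffling the labels should often produce a gap in averages at least as large as the observed one.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: the labels (e.g. governor's party) are unrelated to the
    values (e.g. state unemployment rates). Both groups must be non-empty.
    Returns the fraction of label shuffles whose mean gap is at least as
    large as the observed gap."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_resamples
```

In use, one would pass the unemployment rates of Republican-governed states as `group_a` and Democratic-governed states as `group_b`; a large result, like the 0.54 reported above, means the observed gap is ordinary under random labeling.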

The conclusion is pretty clear, then.  Mr. McDonnell’s statement is factually true only in the most technical sense, and any implication he tries to draw from it is faulty.

Pay Equality at the White House

Posted: April 20, 2012 in analysis

An article on The Free Beacon here makes the simple claim that the White House pays women less than men, according to public records.  They then go on to imply (as others who link to them do more explicitly, like here) that this is demonstrative of an anti-woman attitude in the administration.

So, let’s take a look at the numbers, shall we?

A few months ago, while in Egypt, Supreme Court Justice Ruth Bader Ginsburg was discussing the drafting of a new Egyptian constitution, and she said that she didn’t believe the US Constitution was the best model. This terribly offended many commentators, as can be seen here, here, and here.

But could there be a good reason why Justice Ginsburg doesn’t think the US Constitution is a good model? I think there is; Presidential systems do not lend themselves well to long-term stable democracies, which is the goal of a well-written Constitution. Of course, the United States is the exception, but how well do other countries with a Presidential system fare?

Political Sex Scandals Revisited

Posted: November 8, 2011 in analysis

Some of you may recall that last summer, I tried to build a model to predict the results of political sex scandals, and documented my efforts here and here.  The model was unusual, and it turned out to predict the then-current sex scandal (David Wu) very poorly.

Well, another sex scandal has made the news, so it’s time to put my model to the test again. Herman Cain’s scandal isn’t very interesting, quite frankly. The variables that matter to the model are pretty straightforward; Mr. Cain’s scandal is nothing special.

  • Intensity: 5 – multiple instances of sexual advances, but no actual sex.
  • Unfaithfulness: 7 – Cain has been married for 40+ years, but hasn’t quite been accused of actually cheating on his wife.
  • Kinkiness: 3 – Nothing more than a little dirty talk.
  • Hypocrisy: 4 – Courting the religious right but having adulterous intentions.
  • Coercion: 6 – The actions were non-consensual.

The other ratings, such as Contrition, which is 1 (Cain denies the events), and Plausibility, which is 6 (there isn’t very strong evidence that they happened), aren’t part of the model.

So, for a Republican with a low-intensity, coercive, but not kinky scandal, the model does not predict a happy outcome for Mr. Cain. Specifically, the result is a value of 0.16, which means he will most likely drop out of the race or lose the nomination. But this is the same model that predicted that David Wu wasn’t going anywhere on the precise day he announced his resignation, so take that with a grain of salt.

It should also be noted that only one of my model data cases (Jack Ryan) was a non-incumbent candidate for election, so the dynamics may be very different.  But I have the model, so it’s worth testing it again.  And the best way to test is to make the prediction in advance of the event, so there you are.