In part I, I discussed how to create a data set to describe the nature of political sex scandals among national politicians in the US. Now, I will try to build a model from that data set that describes the outcome of the scandal by its basic properties.

Describing the model building process to a non-technical audience is difficult. Basically, it involves looking at the variables in the data, proposing a hypothesis about how they may relate, and testing it. Depending on the results of the test, you either reject the hypothesis, try to refine it, or include it in the model.

When complete, the model is simply a mathematical formula that takes all of the inputs (those variables included in it) and manipulates them to produce an output. In this case, I am using logistic regression, so the output is a value between 0 and 1, where 0 represents resigned, and 1 represents won re-election.

Because values lower than 0 and higher than 1 are nonsensical, the formula doesn’t operate on the probability directly, but rather on another value called a logit, which ranges from negative infinity to positive infinity. Thus, unlike many linear models, this one doesn’t have easily interpretable coefficients; that is to say, you can’t say “For every year of seniority, a Senator is 10% less likely to resign,” since that could lead to Senators having less than a 0% chance of resignation.
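For the curious, the logit and the probability are linked by the logistic (sigmoid) function, which squashes any real number into the 0-to-1 range. A minimal sketch in Python:

```python
import math

def logit_to_probability(logit):
    """Convert a logit (any real number) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

# A logit of 0 maps to an even chance; large positive or negative
# logits approach, but never reach, 1 and 0.
print(logit_to_probability(0))    # 0.5
print(logit_to_probability(4))    # ~0.982
print(logit_to_probability(-4))   # ~0.018
```

This is why the coefficients aren’t directly interpretable as percentages: a one-unit change in a variable shifts the logit by a fixed amount, but the resulting change in probability depends on where you start.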

##### The Model

Now, let’s look at what goes into the model itself. Seniority, Contrition, Hypocrisy, Plausibility, and Misappropriations seem to have no influence on the outcome of a scandal. This leaves Intensity, Coercion, Kinkiness, Unfaithfulness, and Party. The way these affect scandals, however, is unusual.

The model starts by assigning a flat penalty to Republicans. This is simple and easy to understand.

The first strange result is that a higher Intensity positively correlates with staying in office. This is backwards from what you (and I) might guess, but it is among the more statistically solid components of this model. Consider Eric Massa, Anthony Weiner, Jack Ryan, and Larry Craig. These four men make up the bottom four on the Intensity list; none of their actions involved actual sexual activity. Yet all four resigned, dropped out of elections, or did not run. At the other end of the spectrum, Mark Sanford has the most intense experience of the entire data set (having met his soul mate), and he served the remainder of his term as governor.

The next term aligns so nicely with dogmatic political rhetoric that I’m almost embarrassed to include it, but it fits the model well. Unfaithfulness hurts your chances of remaining in office, but only for Democrats; Republicans are unaffected by it. Per point on the scale, the magnitude of this effect is larger than that of Intensity.

The last term in the model is the hardest to interpret. Alone, kinkiness and coercion are each associated with remaining in office, coercion much more so than kinkiness. But together, they are associated with leaving office. However, the combined magnitudes work out such that the net effect is negative in only one case in the data set, that of Donald “Buz” Lukens. And in his case, I am not confident that I effectively assigned those ratings.

So, the model, in non-quantitative terms, is as follows:

There is a good chance the subject will stay in office, but…

- If they are a Republican, the chance is less.
- The chances improve the more intense the activity of the scandal.
- The chances decrease if the subject was unfaithful and is a Democrat.
- The chances increase a little the kinkier the activity.
- The chances increase more the more coercive the activity.
- The chances decrease if the activity was both kinky and coercive.
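The bullets above can be sketched as a single logit formula. The coefficient values in this Python sketch are illustrative placeholders I made up, not the fitted values from the actual model, but their signs match the list:

```python
import math

def predict_stay_probability(republican, intensity, unfaithful,
                             democrat, kinkiness, coercion, coef=None):
    """Sketch of the model's structure. The default coefficients are
    invented for illustration; only their signs reflect the post."""
    if coef is None:
        coef = {"intercept": 1.0, "republican": -1.0, "intensity": 0.5,
                "unfaithful_dem": -0.8, "kinkiness": 0.3,
                "coercion": 0.6, "kink_x_coercion": -0.2}
    logit = (coef["intercept"]
             + coef["republican"] * republican                # flat Republican penalty
             + coef["intensity"] * intensity                  # more intense -> more likely to stay
             + coef["unfaithful_dem"] * unfaithful * democrat # unfaithfulness hurts Democrats only
             + coef["kinkiness"] * kinkiness                  # small positive effect
             + coef["coercion"] * coercion                    # larger positive effect
             + coef["kink_x_coercion"] * kinkiness * coercion)  # joint penalty (interaction term)
    return 1.0 / (1.0 + math.exp(-logit))
```

The interaction term is what makes the last three bullets hang together: kinkiness and coercion each help on their own, but their product term pulls in the opposite direction.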

##### Testing The Model

But how good of a model is it? Remember, the output of this model is a number between 0 and 1, representing a probability of sorts. Since there are more than two outcomes, however, I assigned them values between 0 and 1 in order of candidate strength. The model then predicts whichever outcome lies closest to the number it produces.

For the 36 members of the data set, we can compare the model’s predicted value to the observed value. The error is within 0.1 in 17 of those cases, and greater than 0.5 in 4. The mean absolute likelihood error is 0.18.
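That error figure is just the average absolute gap between the predicted and observed values. A quick sketch, using invented numbers rather than the real 36 cases:

```python
def mean_absolute_error(predicted, observed):
    """Average absolute gap between model predictions and observed
    outcome values, both on the 0-to-1 scale."""
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    return sum(errors) / len(errors)

# Toy illustration with made-up values, not the actual data set:
preds = [0.9, 0.1, 0.6, 0.3]
obs   = [1.0, 0.0, 1.0, 0.5]
print(mean_absolute_error(preds, obs))  # ~0.2
```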

The “good cases” of the model are Brock Adams, Larry Craig, Vito Fossella, Barney Frank, Gary Hart, Wayne Hays, Jack Ryan, Chris Lee, Donald Lukens, Eric Massa, Bob Packwood, Paul Patton, Mark Sanford, Gus Savage, Gerry Studds, Anthony Weiner, and Bob Wise. The “bad cases,” on the other hand, are Nikki Haley, Arnold Schwarzenegger, Mark Souder, and David Vitter. Note that this refers to Arnold Schwarzenegger’s early scandals regarding sexual harassment, not his later affair with a housekeeper (which broke when he was no longer in office).

But now, we come to the moment of truth. What does this model predict the outcome of the David Wu scandal will be? Describing David Wu in the terms we used for the other candidates and running the numbers, we get 0.9999997, a prediction that David Wu will quite definitely remain in office. And, at the very moment I write this, there is breaking news that he is resigning.

##### Conclusions

So, what does the model mean, and why didn’t it work? There are several possible answers. First, the data sample I used is not random: it consists of all the sex scandals I could find out about and find enough information on. This introduces a dramatic bias towards more memorable scandals, particularly older ones, which may explain why Intensity, against all common sense, correlates with remaining in office. The only low-intensity scandals we still remember are those with some other significant factor, so those are more likely to have ended negatively.

A similar point may apply to Plausibility, which could have been kept out of the model when it should have been included. Certainly, it is notable that the least plausible event in the data set (Nikki Haley’s affair) was very poorly predicted. In short, most implausible accusations, without evidence or a full or partial confession, are forgotten within a few years. There may be many more Nikki Haley-like mini-scandals from the 80s and 90s that would have helped shape the model, but they are buried in contemporary news reports.

Another possible explanation of the model is that it doesn’t represent any inherent facts about the world, but is a quirk of mathematics. Particularly when testing a lot of models in one set of data, it is possible that a model fits well by pure coincidence, but doesn’t help on cases outside the sample.

Lastly, the model might actually be fine, and David Wu’s case is just unusual in some way not captured in the variables. This is in fact somewhat true, since he has had a previous sex scandal (though one I did not discover in my research, so it is not in the sample data set) and a recent general mental health scandal, which are not represented in the characterization of this scandal.

So, overall, I don’t think this project offers any great insight into sex scandals. The outcome of a scandal is determined by many factors, and a simple model based on subjective ratings of a few of them isn’t going to cut it. But I think it is possible to learn as much from the analyses that don’t bear out as from those that do.

You didn’t mention the method by which you came up with your models: whether it was by hand or by computer, or the name of the particular method you used.

The particular data set you have looks to be more or less made for one of the standard supervised machine learning algorithms. If you use Python, http://pybrain.org/docs/tutorial/datasets.html describes a supervised training setup that may provide an alternate approach that fits your data better.

Also, I wanted to point out that with any kind of analysis, data wins. That is, getting more data will tend to improve results more than tweaking your algorithm, though generally, a fully-automatic algorithm will tend to do better than a by-hand one (it can test many more possibilities than a human). In the world of machine learning and data analysis, 36 samples is much closer to zero than the typical thousands, millions, or billions of samples desired. At this point, almost every prediction will be wide of the mark because the model is still largely untrained.

Regardless; great articles. I’ve been having a good time reading them. 🙂

Thanks for your compliments; I’m having a good time writing these posts.

As to your specific questions: I was avoiding too much technical discussion of the model building, but hey, if you’re interested, there are two steps to building this model. The first is choosing the terms. This I did by hand, exploring different hypotheses and rejecting those that showed no signs of statistical significance. The second step (done for each iteration of the first step) is, given those terms, assigning the model coefficients so as to maximize the likelihood of the data given the model, and calculating a confidence level that the model is a true representation of the data rather than simply random noise. I relied on a generalized linear model to do this, specifically a quasibinomial logistic regression, using the statistical software package R for the calculations.
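For readers without R, that coefficient-fitting step can be illustrated in plain Python. This is a toy stand-in (simple gradient ascent on the log-likelihood of a one-predictor model), not the iteratively reweighted least squares that R’s glm() actually uses:

```python
import math

def fit_logistic(xs, ys, steps=5000, lr=0.1):
    """Fit a one-predictor logistic regression by maximizing the
    log-likelihood with plain gradient ascent. A toy illustration of
    maximum-likelihood fitting, not what glm() does internally."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p        # gradient of log-likelihood w.r.t. intercept
            g1 += (y - p) * x  # gradient w.r.t. the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Toy data: the outcome tends toward 1 as x grows, so the fitted
# slope should come out positive.
b0, b1 = fit_logistic([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
print(b1 > 0)  # True
```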

The danger of using an automated system to search through the model space is that, given enough models, it will eventually find one that appears significant. However, it will be very hard to tell whether it is significant because it represents a fundamental truth about the data, or because, given enough tries, everything eventually lines up by pure chance. I’ve done a little Python programming, but not nearly enough to tell if that is what you are talking about with the supervised training algorithm.

Lastly, 35 cases is very small for some kinds of data sets (particularly where the model fits are expected to be polynomial), but from a statistical perspective, it’s actually more than enough samples for a model like the one I was using. The general rule is that each term of the model uses up one degree of freedom in the residual, and once the residual degrees of freedom drop below 15 or so, the error distribution becomes noticeably non-normal. This is the difference between the t-distribution and the normal distribution. The residual of my model still had 28 degrees of freedom. I actually expect that as I found more cases and entered them into the data set, the model would get muddier and muddier, and it would be harder to form significant terms. But that’s just a guess.
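That degrees-of-freedom bookkeeping is simple arithmetic. A sketch in Python, where the 7-term count is an illustrative guess chosen only so the numbers echo the ones above, not an exact tally of the real model:

```python
def residual_df(n_cases, n_model_terms):
    """Residual degrees of freedom: each model term (including the
    intercept) spends one degree of freedom from the sample size."""
    return n_cases - n_model_terms

# With 35 cases, a hypothetical 7-term model would leave 28 residual
# degrees of freedom -- comfortably above the ~15 threshold where
# the t-distribution's heavier tails start to matter.
print(residual_df(35, 7))  # 28
```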