Explanation and the quest for ‘significant’ relationships. Part II

In Part I, I argued that the search for and discovery of statistically significant relationships does not amount to explanation and is often misplaced in the social sciences because the variables which are purported to have effects on the outcome cannot be manipulated.

Just to make sure that my message is not misinterpreted – I am not arguing for a fixation on maximizing R-squared and other measures of model fit in statistical work, instead of the current focus on the size and significance of individual coefficients. R-squared has been rightly criticized as a standard of how good a model is** (see for example here). But I am not aware of any other measure or standard that can convincingly compare the explanatory potential of different models in different contexts. Predictive success might be one way to go, but prediction is altogether something else than explanation.

I don’t expect much to change in the future with regard to the problem I outlined. In practice, all one could hope for is some clarity on the part of the researchers whether their objective is to explain (account for) or find significant effects. The standards for evaluating progress towards the former objective (model fit, predictive success, ‘coverage’ in the QCA sense) should be different than the standards for the latter (statistical & practical significance and the practical possibility to manipulate the exogenous variables).

Take the so-called garbage-can regressions, for example. These are models with tens of variables, all of which are interpreted causally if they reach the magic 5% significance level. The futility of this approach is matched only by its popularity in political science and public administration research. If the research objective is to explore a causal relationship, one had better focus on that variable and include covariates only if it is suspected that they are correlated with both the outcome and the main independent variable of interest. Including everything else that happens to be within easy reach leads to inefficiency in the estimation, and one should refrain from interpreting the significance of these covariates causally altogether. On the other hand, if the objective is to comprehensively explain (account for) a certain phenomenon, then including as many variables as possible might be warranted, but then the significance of individual variables is of little interest.

The goal of research is important when choosing the research design and the analytic approach. Different standards apply to explanation, the discovery of causal effects, and prediction. 

**Just one small example from my current work – a model with one dependent and one exogenous time-series variables in levels with a lagged dependent variable included on the right-hand side of the equation produces an R-squared of 0.93. The same model in first differences has an R-squared of 0.03 while the regression coefficient of the exogenous variable remains significant in both models. So we can ‘explain’ 90% of the variation in the first case by reference to the past values of the outcome. Does this amount to an explanation in any meaningful sense? I guess that depends on the context. Does it provide any leverage to the researcher to manipulate the outcome? Not at all.
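The pattern in this footnote is easy to reproduce with simulated data. The following is only a sketch under assumed values: the persistence of 0.9, the effect size of 0.2, the sample length and the seed are all made up for illustration and have nothing to do with the actual series in my work. It fits the same model in levels (with a lagged dependent variable) and in first differences, and reports the R-squared and the t-statistic of the exogenous variable in each.

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept; returns coefficients, standard errors, R-squared."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, r2

# Simulated data (all parameter values are illustrative assumptions).
rng = np.random.default_rng(1)
n = 5000
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + rng.normal()               # persistent exogenous series
    y[t] = 0.9 * y[t - 1] + 0.2 * x[t] + rng.normal()  # outcome depends on its own past and on x

# Levels: y_t on y_{t-1} and x_t.
b_lev, se_lev, r2_lev = ols(y[1:], np.column_stack([y[:-1], x[1:]]))

# First differences: dy_t on dy_{t-1} and dx_t.
dy, dx = np.diff(y), np.diff(x)
b_dif, se_dif, r2_dif = ols(dy[1:], np.column_stack([dy[:-1], dx[1:]]))

# Index 2 is the exogenous variable (0 = intercept, 1 = lagged outcome).
print("levels:      R2 = %.3f, t(x) = %.1f" % (r2_lev, b_lev[2] / se_lev[2]))
print("differences: R2 = %.3f, t(x) = %.1f" % (r2_dif, b_dif[2] / se_dif[2]))
```

With a setup like this, the levels model should show a very high R-squared and the differenced model a very low one, while the coefficient of the exogenous variable stays clearly significant in both. The exact numbers depend on the assumed parameters and the seed.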

Explanation and the quest for ‘significant’ relationships. Part I

The ultimate goal of social science is causal explanation*. The actual goal of most academic research is to discover significant relationships between variables. The two goals are supposed to be strongly related – by discovering (the) significant effects of exogenous (independent) variables, one accounts for the outcome of interest. In fact, the working assumption of the empiricist paradigm of social science research is that the two goals are essentially the same – explanation is the sum of the significant effects that we have discovered. Just look at what all the academic articles with ‘explanation’, ‘determinants’, and ‘causes’ in their titles do – they report significant effects, or associations, between variables.

The problem is that explanation and collecting significant associations are not the same. Of course they are not. The point is obvious to all uninitiated into the quantitative empiricist tradition of doing research, but seems to be lost on many of its practitioners. We could have discovered a significant determinant of X, and still be miles (or even light-years) away from a convincing explanation of why and when X occurs. This is not because of the difficulties of causal identification – we could have satisfied all conditions for causal inference from observational data, but the problem still stays. And it would not go away after we pay attention (as we should) to the fact that statistical significance is not the same as practical significance. Even the discovery of convincingly-identified causal effects, large enough to be of practical rather than only statistical significance, does not amount to explanation. A successful explanation needs to account for the variation in X, and causal associations need not – they might be significant but not even make a visible dent in the unexplained variation in X. The difference I am talking about is partly akin to the difference between looking at the significance of individual regression coefficients and looking at the model fit as a whole (more on that will follow in Part II). The current standards of social science research tend to emphasize the former rather than the latter, which allows for significant relationships to be sold as explanations.
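To see how a relationship can be highly significant without making a visible dent in the variation of the outcome, here is a minimal sketch on simulated data. The effect size of 0.05, the sample size and the seed are assumptions chosen only for illustration.

```python
import numpy as np

# Simulated data: a small true effect and a large sample (illustrative values).
rng = np.random.default_rng(7)
n = 40_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)

# Simple regression of y on x, computed by hand on centred variables.
xc, yc = x - x.mean(), y - y.mean()
slope = (xc @ yc) / (xc @ xc)
resid = yc - slope * xc
se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
t_stat = slope / se
r2 = 1.0 - (resid @ resid) / (yc @ yc)

print("slope = %.3f, t = %.1f, R2 = %.4f" % (slope, t_stat, r2))
```

In a setup like this the t-statistic lands far beyond conventional significance thresholds, while the share of variation accounted for stays well below one percent.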

The objection can be made that the discovery of causal effects is all we should aim for, and all we could hope for. Even if a causal relationship doesn’t account for large amounts of variation in the outcome of interest, it still makes a difference. After all, this is the approach taken in epidemiology, agricultural sciences and other fields (like beer production) where the statistical research paradigm has its origins. A pill might not treat all headaches but if it has a positive and statistically-significant effect, it will still help millions. But here is the trick – the quest for statistically significant relationships in epidemiology, agriculture, etc. is valuable because all these effects can be considered as interventions – the researchers have control over the formula of the pill, or the amount of pesticide, or the type of hops. In contrast, social science researchers too often seek and discover significant relationships between an outcome and variables that couldn’t even remotely be considered as interventions. So we end up with a pile of significant relationships which do not account for enough variation to count as a proper explanation and which have no value as interventions because their manipulation is beyond our reach. To sum up, observational social science has borrowed an approach to causality which makes sense for experimental research, and applied its standards (namely, statistical significance) to a context where the discovery of significant relationships is less valuable because the ‘treatments’ cannot be manipulated. Meanwhile, what should really count – explaining when, how and why a phenomenon happens – is relegated to the background in the false belief that somehow the quest for significant relationships is a substitute. It is like trying to discover the fundamental function of the lungs with epidemiological methods, and claiming success when you prove that cold air significantly reduces lung capacity. While the inference might still be valuable, it is no substitute for the original goal.

In Part II, I will discuss what needs to be changed, and what can be changed in the current practice of empirical social science research to address the problem outlined above.

*In my understanding, all explanation is causal. Hence, ‘causal explanation’ is a tautology. Hence, I am gonna drop the ‘causal’ part for the rest of the text.

Google tries to find the funniest videos

Following my recent post on the project which tries to explain why some video clips go viral, here is a report on Google’s efforts to find the funniest videos:

You’d think the reasons for something being funny were beyond the reach of science – but Google’s brain-box researchers have managed to come up with a formula for working out which YouTube video clips are the funniest.

The Google researcher behind the project is quoted saying:

‘If a user uses an “loooooool” vs an “loool”, does it mean they were more amused? We designed features to quantify the degree of emphasis on words associated with amusement in viewer comments.’

Other factors taken into account are tags, descriptions, and ‘whether audible laughter can be heard in the background’. Ultimately, the algorithm gives a ranking of the funniest videos (with No No No No Cat on top, since you asked).

Now I usually have high respect for all things Google, but this ‘research’ at first appeared to be a total piece of junk. Of course, it turned out that it is just the way it is reported by the Daily Mail (cited above), New Scientist and countless other more or less reputable outlets.

Google’s new algorithm does not provide a normative ranking of the funniest videos ever based on some objective criteria; it is a predictive score about the video’s comedic potential. Google trained the algorithm on a bunch of videos (it’s unclear from the original source what the external ‘fun’ measure used for the training part was) in order to inductively extract features  associated with the video being funny. Based on these features, the program can then score any possible video. But these scores are not normative measures, they are predictions. So No No No No Cat is not the funniest video ever [well, it might be, it's pretty hilarious actually], it is Google’s safest bet that the video would be considered funny.

The story is worth mentioning not only because it exposes yet another case of gross misinterpretation of a scientific project in the news, but because it nicely illustrates the differences between measurement, prediction, and explanation. The newspapers have taken Google’s project to be an exercise in measurement. As explained above, the goal is actually predictive in nature. But even if the algorithm had a 100% success rate in identifying potentially funny videos, that would still not count as an explanation of what makes a video funny. Just think about it – would a boring video become funny if we just put funny tags, background laughter, and plenty of loools in the comments? Not really. In that respect Brent Coker’s approach, which I mentioned in a previous post, has real explanatory potential (although I doubt whether it has any explanatory power).

So, no need to panic, the formula for something being funny is as distant as ever.

P.S. In an ironic turn of events, now that No No No No Cat has gone viral, Google will never know whether the algorithm was very good, or just everyone wanted to see the video Google declared the funniest ever. Ah, the joys of social science research!

Weighted variance and weighted coefficient of variation

Often we want to compare the variability of a variable in different contexts – say, the variability of unemployment in different countries over time, or the variability of height in two populations, etc. The most often used measures of variability are the variance and the standard deviation (which is just the square root of the variance). However, for some types of data, these measures are not entirely appropriate. For example, when data is generated by a Poisson process (e.g. when you have counts of rare events) the mean equals the variance by definition. Clearly, comparing the variability of two Poisson distributions using the variance or the standard deviation would not work if the means of these populations differ. A common and easy fix is to use the coefficient of variation instead, which is simply the standard deviation divided by the mean. So far, so good.

Things get tricky, however, when we want to calculate the weighted coefficient of variation. The weighted mean is just the mean, but some data points contribute more than others. For example, the mean of 0.4 and 0.8 is 0.6. If we assign the weights 0.9 to the first observation [0.4] and 0.1 to the second [0.8], the weighted mean is (0.9*0.4+0.1*0.8)/1, which equals 0.44. You would guess that we can compute the weighted variance by analogy, and you would be wrong.

For example, the sample variance of {0.4,0.8} is given by [Wikipedia]:

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

or in our example ((0.4-0.6)^2+(0.8-0.6)^2) / (2-1), which equals 0.08. But the weighted sample variance cannot be computed by simply adding the weights to the above formula, as in (0.9*(0.4-0.6)^2+0.1*(0.8-0.6)^2) / (2-1). The formula for the weighted variance is different [Wikipedia]:

$$ s_w^2 = \frac{V_1}{V_1^2 - V_2} \sum_{i=1}^{n} w_i (x_i - \mu^*)^2 $$

where \mu^* is the weighted mean, V1 is the sum of the weights and V2 is the sum of squared weights:

$$ V_1 = \sum_{i=1}^{n} w_i, \qquad V_2 = \sum_{i=1}^{n} w_i^2 $$

The next steps are straightforward: the weighted standard deviation is the square root of the above, and the weighted coefficient of variation is the weighted standard deviation divided by the weighted mean.
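The whole chain (weighted mean, weighted variance, weighted standard deviation, weighted coefficient of variation) can be sketched in a few lines of Python. The function names are my own, and the sketch assumes the weights are non-negative importance (reliability) weights that are not all concentrated on a single observation.

```python
import math

def weighted_mean(x, w):
    """Weighted mean: sum(w_i * x_i) / V1."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

def weighted_variance(x, w):
    """Unbiased weighted sample variance: V1 / (V1^2 - V2) * sum(w_i * (x_i - mu)^2)."""
    v1 = sum(w)                      # sum of the weights
    v2 = sum(wi ** 2 for wi in w)    # sum of the squared weights
    mu = weighted_mean(x, w)
    ss = sum(wi * (xi - mu) ** 2 for xi, wi in zip(x, w))
    return v1 / (v1 ** 2 - v2) * ss

def weighted_cv(x, w):
    """Weighted coefficient of variation: weighted sd / weighted mean."""
    return math.sqrt(weighted_variance(x, w)) / weighted_mean(x, w)

x = [0.4, 0.8]
w = [0.9, 0.1]
print(weighted_mean(x, w), weighted_variance(x, w), weighted_cv(x, w))

# With equal weights the formula reduces to the ordinary sample variance.
print(weighted_variance(x, [1, 1]))
```

A useful sanity check is the last line: with equal weights the weighted variance should coincide with the usual sample variance of the same data.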

Although there is nothing new here, I thought it’s a good idea to put it together because it appears to be causing some confusion.  For example, in the latest issue of European Union Politics you can find the article ‘Measuring common standards  and equal responsibility-sharing in EU asylum outcome data’  by a team of scientists from LSE. On page 74, you can read that:

The weighted variance [of the set p={0.38, 0.42} with weights W={0.50,0.50}] equals 0.5(0.38-0.40)^2+0.5(0.42-0.40)^2 =0.0004.

As explained above, this is not generally correct unless the biased (population) rather than the unbiased (sample)  weighted variance is meant. When calculated properly, the weighted variance turns out to be 0.0008. Here you can find the function Gavin Simpson has provided  for calculating the weighted variance in R and try for yourself.
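For those who prefer to check the two numbers directly, here is a small Python sketch that computes both versions for the quoted example. The variable names are mine; only the data and weights come from the quoted passage.

```python
p = [0.38, 0.42]
w = [0.50, 0.50]

v1 = sum(w)                    # sum of the weights
v2 = sum(wi ** 2 for wi in w)  # sum of the squared weights
mu = sum(wi * pi for pi, wi in zip(p, w)) / v1
ss = sum(wi * (pi - mu) ** 2 for pi, wi in zip(p, w))

biased = ss / v1                      # population (biased) weighted variance
unbiased = ss * v1 / (v1 ** 2 - v2)   # sample (unbiased) weighted variance

print(round(biased, 6), round(unbiased, 6))  # 0.0004 0.0008
```

With two equally weighted observations the correction factor V1/(V1^2-V2) equals 2, which is exactly the n/(n-1) factor of the unweighted case.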

P.S. To be clear, the weighted variance issue is not central to the argument of the article cited above but is significant as the authors discuss at length the methodology for estimating variability in data and introduce the so-called Coffey-Feingold-Broomberg measure of variability which the authors  deem more appropriate for proportions.

P.P.S On the internet, there is yet more confusion: for example, this document (which pops high in the Google results) has yet a different formula, shown in a slightly different form here  as  well.

Disclaimer. I have a forthcoming paper on the same topic (asylum policy) as the EUP article mentioned above.

The Good, the Bad, and the Stranger

Once upon a time, in a land far away, there lived two brothers. The first brother was like an ox: strong, dutiful and hard-working. The second brother was like a rotten apple – useless, menacing, and foul. The first brother set up a small enterprise which quickly took root and sprawled. Soon, he needed to hire a helping hand. He could either employ his brother, who was wicked and lazy, but still a relation, or a Stranger who was diligent and qualified, but came from some distant God-forsaken place. At this point the story forks and you, the reader, have to choose which path to take:

 - You hire the stranger. The enterprise grows and prospers. Your brother vanishes in misery. Every Christmas you send him a present to an address he has long abandoned. This is the way of the capitalist.

- You hire the brother. He might be trouble, but he is your own blood. And, on his advice you close your community to strangers. Soon, your brother stops showing for work, and more often than not shows up drunk. You quarrel, and curse but you stay loyal, and the enterprise rapidly goes into wreck. But you go down together. This is the way of the nationalist.

- You hire the stranger. Every month you take a generous slice from your profit, and a big cut from the stranger’s salary and you give them to your brother. Your brother acquires a big TV, junk food addiction and a feeling of entitlement which leads him to wrangle every time your contributions are late. But the enterprise survives and your conscience is clear. This is the way of the socialist.

But whatever you chose, the good times come to an end, the fat years are over, and a long and painful crisis settles in the land. In the capitalist path of the story, the stranger, who has been saving during all the good years, buys the enterprise from you. You ask him to employ you, but he hires his brother instead and kicks you out of the door.

In the nationalist path of the story the good times were over a long time ago anyway. You had long since reached the bottom and the only thing that keeps you alive is the deep hatred of your neighbors, which is the one remaining thing that you share with your brother.

In the socialist path of the story, your brother suddenly feels the pain when your monthly contributions dry up. He accuses the stranger of stealing his job and having no right to be here and causing too much trouble altogether. He starts to pester you, to beg and to threaten. Finally, you succumb, kick out the stranger and hire your brother instead. But he has never done a day of work, so he quickly develops back problems and sues you for damage, which drives the enterprise to its end.

So what is the moral of this story? I don’t know, you tell me, I’m just the stranger.

No use for big data in electioneering, according to Hollywood

Over the last year two major Hollywood movies that touch upon the use of big data and sophisticated data analysis hit the big screen. Which, of course, is two more than the mean (or was that the median?). Moneyball shows how crunching numbers helps win baseball games and Margin Call shows how crunching numbers helps ruin financial firms. It’s kind of fun to see Brad Pitt and Kevin Spacey stare at spreadsheets and nod approvingly while some statistical subtleties are being explained to them. But watching someone stare at somebody else’s spreadsheets quickly becomes tiresome … which probably explains why Regressing with the Stars, Dotchart Master, and America’s Next Multilevel Model haven’t yet taken over reality TV.

So I was really disappointed to see that a third 2011 movie – The Ides of March – misses a golden opportunity to show the use of big data and sophisticated analysis for winning elections. The movie revolves around the presidential primary campaign of George Clooney (pardon, Governor Mike Morris) and the dirty politics behind the scenes. But for Hollywood in 2011, electioneering is still a game of horse-trading, media spinning and good-ol’ stabs in the back. All these things about election campaigns are probably true, but I was disappointed that there were no fancy graphs plotting approval ratings and prediction market quotes, no real-time election forecasts (or nowcasts) for George Clooney to stare at and nod approvingly, no GIS-supported campaign targeting, not even focus groups, tweets, Facebook pages, not to speak of Google circles. Now, I have never been involved in an election campaign but I would have guessed that some of what political scientists are doing to analyze election outcomes and the effects of various elements of election campaigns has filtered through to campaign managers. But according to The Ides of March, electioneering is still stuck in the 1990s. Someone get Hollywood a subscription to Political Analysis.

In fact, the only difference between The Ides of March and The War Room – the 1993 documentary about Bill Clinton’s 1992 presidential campaign – is that the actors in The Ides of March wear less hideous suits. And the intern is blond (just joking). Now when I think about it, the documentary The War Room actually packs more drama and suspense than the scripted The Ides of March. Which in fact is true about the documentary Inside Job vis-a-vis Margin Call as well.

P.S. My recent movie ratings can be found here.