Interest groups and the making of legislation

How are the activities of interest groups related to the making of legislation? Does mobilization of interest groups lead to more legislation in the future? Or does the adoption of new policies motivate interest groups to get active? Together with Dave Lowery, Brendan Carroll and Joost Berkhout, we tackle these questions in the case of the European Union. What we find is that there is no discernible signal in the data indicating that the mobilization of interest groups and the volume of legislative production over time are significantly related. Of course, absence of evidence is not the same as evidence of absence, so a link might still exist, as suggested by theory, common wisdom and existing studies of the US (e.g. here). But using quite a comprehensive set of model specifications, we cannot find any link in our time-series sample. The abstract of the paper is below, and, as always, you can find the data, the analysis scripts, and the pre-print full text at my website.

On a side note, I am very pleased that we managed to publish what is essentially a negative finding. As everyone seems to agree, discovering which phenomena are not related might be as important as discovering which phenomena are. Still, there are few journals that would apply this principle in their editorial policy. So kudos to the journal Interest Groups and Advocacy.

Abstract
Different perspectives on the role of organized interests in democratic politics imply different temporal sequences in the relationship between legislative activity and the influence activities of organized interests. Unfortunately, lack of data has greatly limited any kind of detailed examination of this temporal relationship. We address this problem by taking advantage of the chronologically very precise data on lobbying activity provided by the door pass system of the European Parliament and data on EU legislative activity collected from EURLEX. After reviewing the several different theoretical perspectives on the timing of lobbying and legislative activity, we present a time-series analysis of the co-evolution of legislative output and interest groups for the period 2005-2011. Our findings show that, contrary to what pluralist and neo-corporatist theories propose, interest groups neither lead nor lag bursts in legislative activity in the EU.

Timing is Everything: Organized Interests and the Timing of Legislative Activity
Dimiter Toshkov, Dave Lowery, Brendan Carroll and Joost Berkhout
Interest Groups and Advocacy (2013), vol.2, issue 1, pp.48-70
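
For readers who want to probe such lead-lag questions in their own data, a standard first-pass tool is the Granger-causality test. Below is a minimal sketch with statsmodels; the file and column names are hypothetical placeholders, not the paper's actual data or specifications (those are in the analysis scripts on my website).

```python
# Sketch: do changes in lobbying activity help predict changes in
# legislative output, or vice versa? (Hypothetical file and columns.)
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

df = pd.read_csv("eu_lobbying_legislation.csv")  # hypothetical file
# First differences reduce the risk of spurious correlation between
# trending series (see the next post on exactly this danger).
d = df[["legislation", "lobbying"]].diff().dropna()

# Each call tests whether the second column Granger-causes the first.
grangercausalitytests(d[["legislation", "lobbying"]].values, maxlag=4)
grangercausalitytests(d[["lobbying", "legislation"]].values, maxlag=4)
```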

Correlation does not imply causation. Then what does it imply?

‘Correlation does not imply causation’ is an adage students in all the social sciences are made to recite from a very early age. What is less often systematically discussed is what could actually be going on when two phenomena are correlated but not causally related. Let’s try to make a list:

1) The correlation might be due to chance. T-tests and p-values are generally used to guard against this possibility.

1a) The correlation might be due to coincidence. This is essentially a variant of the previous point, but with a focus on time series. It is especially easy to mistake pure noise (randomness) for patterns (relationships) when one looks at two variables over time (see the first simulation sketch after this list). If you look at the numerous ‘correlation is not causation’ jokes and cartoons on the internet, you will note that most concern spurious correlations between two variables over time (e.g. the number of pirates and global warming): it is just easier to find such examples in time series than in cross-sectional data.

1b) Another reason to distrust correlations is the so-called ‘ecological inference’ problem. The problem arises when data is available at several levels of observation (e.g. people nested in municipalities nested in states). Correlation of two variables aggregated at a higher level (e.g. states) cannot be used to imply correlation of these variables at the lower level (e.g. people). In such cases the higher-level correlation is a statistical artifact, although not necessarily one due to mistaking ‘noise’ for ‘signal’.

2) The correlation might be due to a third variable being causally related to the two correlated variables we observe. This is the well-known omitted variable problem (see the second simulation sketch after this list). Note that statistical significance tests have nothing to contribute to the solution of this potential problem: statistical significance of the correlation (or of the regression coefficient, etc.) is not sufficient to guarantee causality. Another point that gets overlooked is that it is actually pretty uncommon for a ‘third’ (omitted) variable to be so highly correlated with both variables of interest as to induce a high correlation between them that would disappear entirely once we account for the omitted variable. Are there any prominent examples from the history of social science where a purported causal relationship was later discovered to be completely spurious due to an omitted variable (not counting time-series studies)?

3) Even if a correlation is statistically significant and not spurious in the sense of 2), there is still nothing in the correlation that establishes the direction of causality. Additional information is needed to ascertain in which way the causal relationship flows. Lagging variables and process-tracing case studies can be helpful.
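
To see how easily point 1a can bite, here is a minimal simulation sketch (my own toy illustration with generated data, not taken from any study): two completely independent random walks routinely produce sizable correlations.

```python
# Point 1a: two independent random walks (no causal link whatsoever)
# will often show a large spurious correlation.
import numpy as np

rng = np.random.default_rng(42)
n_sims, T = 1000, 100
high_corr = 0
for _ in range(n_sims):
    x = np.cumsum(rng.normal(size=T))   # independent random walk
    y = np.cumsum(rng.normal(size=T))   # another, entirely unrelated one
    if abs(np.corrcoef(x, y)[0, 1]) > 0.5:
        high_corr += 1

# A large share of the pairs shows |r| > 0.5 despite zero true relation.
print(f"share of unrelated pairs with |r| > 0.5: {high_corr / n_sims:.2f}")
```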
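
And a companion sketch for point 2 (again a toy illustration with simulated data): a common cause z induces a correlation between x and y that disappears once z is partialled out.

```python
# Point 2: x and y are correlated only because both depend on z.
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = rng.normal(size=n)                  # the omitted common cause
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

print(f"raw correlation of x and y: {np.corrcoef(x, y)[0, 1]:.2f}")  # ~0.5
# Partial out z from both variables; the residuals are uncorrelated.
rx = x - np.polyfit(z, x, 1)[0] * z
ry = y - np.polyfit(z, y, 1)[0] * z
print(f"after accounting for z: {np.corrcoef(rx, ry)[0, 1]:.2f}")    # ~0.0
```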

All in all, that’s it: a correlation does not imply causation, but unless the correlation is due to noise, a statistical artifact, or a confounder (an omitted variable), correlation is pretty suggestive of causation. Of course, causation here means that a variable is a contributing factor to variation in the outcome, rather than that the variable can account for all the changes in the outcome. See my posts on the difference here and here.

Am I missing something?

Facebook does randomized experiments to study social interactions

Facebook has a Data Science Team. And here is what they do:

Eytan Bakshy […] wanted to learn whether our actions on Facebook are mainly influenced by those of our close friends, who are likely to have similar tastes. […] So he messed with how Facebook operated for a quarter of a billion users. Over a seven-week period, the 76 million links that those users shared with each other were logged. Then, on 219 million randomly chosen occasions, Facebook prevented someone from seeing a link shared by a friend. Hiding links this way created a control group so that Bakshy could assess how often people end up promoting the same links because they have similar information sources and interests  [link to source at Technology Review].
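
To make the logic of the design concrete, here is a toy simulation (my own illustration with invented numbers, not Facebook's data or code): randomly withholding the link from some users creates a control group, and the difference in sharing rates identifies the causal effect of exposure.

```python
# Randomized link-hiding as an experiment (all numbers invented).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
baseline = 0.005          # chance of finding/sharing the link anyway
exposure_effect = 0.02    # extra sharing caused by seeing a friend share it

exposed = rng.random(n) < 0.9   # the link is randomly hidden from 10%
shared = rng.random(n) < baseline + exposure_effect * exposed

# The difference in means between exposed and control recovers the effect.
effect = shared[exposed].mean() - shared[~exposed].mean()
print(f"estimated exposure effect: {effect:.4f}")  # ~0.02
```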

It must be great (and a great challenge) to have access to all the data Facebook has and to use it to answer questions that are relevant not only for the immediate business objectives of the company. In the words of the Data Science Team leader:

“The biggest challenges Facebook has to solve are the same challenges that social science has.” Those challenges include understanding why some ideas or fashions spread from a few individuals to become universal and others don’t, or to what extent a person’s future actions are a product of past communication with friends.

Cool! These statements might make for a good discussion about the ethics of doing social science research inside and outside academia as well.

Protestants, Missionaries and the Diffusion of Liberal Democracy

A new APSR article [ungated] argues for the crucial role of Protestant missionaries in the global spread of liberal democracy. The statistical analyses tease out the effect of missionaries from the influence of the characteristics of the colonizers (Britain, the Netherlands, France, etc.) and the pre-existing geographic, economic and cultural characteristics of the states. Interestingly, Protestant missionary influence not only remains a significant predictor of democracy outside the Western world once these factors are controlled for, but it renders them insignificant (which is a big deal, because the same institutional, geographic, economic and cultural characteristics have been the usual explanations of democracy diffusion). At the same time, the patterns in the data are consistent with the plausible mechanisms through which the effect of Protestant missionaries is exercised: the spread of newspapers, education, and civil society.

I am sure this article is not going to be the last word on democracy diffusion, but it certainly puts the influence of Protestantism center stage. The major issue, I suspect, is not going to be methodological (since the article already considers a plethora of potential methodological complications in the appendix) but conceptual: to what extent can the effect of Protestant missionaries be conceptually separated from the improvements in education and the growth of the public sphere? In other words, do (did) you need the religious component at all, or would education, newspapers and civil society have worked on their own to make liberal democracy more likely (even if fostered by channels other than Protestant missionaries)?

In terms of methodology, it might be interesting to analyze the same data in terms of necessary and sufficient conditions: I would find it even more informative to see whether the presence of Protestant missionaries is necessary and/or sufficient for the emergence of stable liberal democracy, in addition to the evidence for a robust (linear?) association between the two, as reported in the current article.
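
To illustrate what such a set-theoretic reading could look like, here is a minimal sketch computing simple consistency scores for necessity and sufficiency, in the spirit of QCA. The tiny dataset and variable names are invented for illustration, not taken from the article.

```python
# Set-theoretic consistency scores in the spirit of QCA (toy data).
import pandas as pd

df = pd.DataFrame({
    "missionaries": [1, 1, 1, 0, 0, 1, 0, 1],  # Protestant missionary presence
    "democracy":    [1, 1, 0, 0, 0, 1, 0, 1],  # stable liberal democracy
})

# Necessity: among democracies, how often were missionaries present?
necessity = df.loc[df["democracy"] == 1, "missionaries"].mean()
# Sufficiency: where missionaries were present, how often did democracy follow?
sufficiency = df.loc[df["missionaries"] == 1, "democracy"].mean()
print(f"consistency as necessary: {necessity:.2f}")
print(f"consistency as sufficient: {sufficiency:.2f}")
```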

Here is the abstract:

This article demonstrates historically and statistically that conversionary Protestants (CPs) heavily influenced the rise and spread of stable democracy around the world. It argues that CPs were a crucial catalyst initiating the development and spread of religious liberty, mass education, mass printing, newspapers, voluntary organizations, and colonial reforms, thereby creating the conditions that made stable democracy more likely. Statistically, the historic prevalence of Protestant missionaries explains about half the variation in democracy in Africa, Asia, Latin America and Oceania and removes the impact of most variables that dominate current statistical research about democracy. The association between Protestant missions and democracy is consistent in different continents and subsamples, and it is robust to more than 50 controls and to instrumental variable analyses.

Models in Political Science

Inside Higher Ed has a good interview with David Primo and Kevin Clarke on their new book A Model Discipline: Political Science and the Logic of Representations. The book and the interview criticize the hypothetico-deductive tradition in social science:

The actual research was prompted by a student who asked, “Why test deductive models?” The essence of a deductive model is that if the assumptions of the model are true, then the conclusions must be true. If the assumptions are false, then the conclusions may be true or false, and the logical connection to the model is broken. The point is that social scientists work with assumptions that are known to be false. Thus, whether a model’s conclusions are true or not has nothing to do with the model itself, and “testing” cannot tell us anything that we did not already know.

My thoughts exactly. Unfortunately, I don’t see the new book changing the practice of political science research (Primo and Clarke are also pessimistic about the short-term impact of the book).

Explanation and the quest for ‘significant’ relationships. Part II

In Part I, I argued that the search for and discovery of statistically significant relationships does not amount to explanation, and is often misplaced in the social sciences because the variables which are purported to have effects on the outcome cannot be manipulated.

Just to make sure that my message is not misinterpreted: I am not arguing for a fixation on maximizing R-squared and other measures of model fit in statistical work instead of the current focus on the size and significance of individual coefficients. R-squared has been rightly criticized as a standard of how good a model is** (see for example here). But I am not aware of any other measure or standard that can convincingly compare the explanatory potential of different models in different contexts. Predictive success might be one way to go, but prediction is altogether different from explanation.

I don’t expect much to change in the future with regard to the problem I outlined. In practice, all one could hope for is some clarity on the part of researchers about whether their objective is to explain (account for) an outcome or to find significant effects. The standards for evaluating progress towards the former objective (model fit, predictive success, ‘coverage’ in the QCA sense) should be different from the standards for the latter (statistical and practical significance, and the practical possibility to manipulate the exogenous variables).

Take the so-called garbage-can regressions, for example. These are models with tens of variables, all of which are interpreted causally if they reach the magic 5% significance level. The futility of this approach is matched only by its popularity in political science and public administration research. If the research objective is to explore a causal relationship, one had better focus on that variable and include covariates only if they are suspected to be correlated with both the outcome and the main independent variable of interest. Including everything else that happens to be within easy reach leads to inefficiency in the estimation, and one should refrain from interpreting the significance of these incidental covariates causally altogether (see the toy simulation below). On the other hand, if the objective is to comprehensively explain (account for) a certain phenomenon, then including as many variables as possible might be warranted, but then the significance of individual variables is of little interest.
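
To see how easily a garbage-can regression manufactures ‘findings’, here is a toy simulation (hypothetical data generated in Python, not from any actual study): the outcome is pure noise, yet some of the thirty irrelevant covariates will typically clear the 5% bar.

```python
# Garbage-can regression on pure noise: expect a few of the 30
# irrelevant covariates to come out 'significant' at the 5% level.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 200, 30
X = rng.normal(size=(n, k))   # 30 covariates, none related to the outcome
y = rng.normal(size=n)        # the outcome is pure noise

fit = sm.OLS(y, sm.add_constant(X)).fit()
n_sig = int((fit.pvalues[1:] < 0.05).sum())  # skip the constant
print(f"'significant' coefficients out of {k}: {n_sig}")  # typically 1-3
```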

The goal of research is important when choosing the research design and the analytic approach. Different standards apply to explanation, the discovery of causal effects, and prediction.

**Just one small example from my current work: a model with one dependent and one exogenous time-series variable in levels, with a lagged dependent variable included on the right-hand side of the equation, produces an R-squared of 0.93. The same model in first differences has an R-squared of 0.03, while the regression coefficient of the exogenous variable remains significant in both models. So in the first case we can ‘explain’ more than 90% of the variation by reference to the past values of the outcome. Does this amount to an explanation in any meaningful sense? I guess that depends on the context. Does it provide any leverage to the researcher to manipulate the outcome? Not at all.
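
The footnote’s point is easy to reproduce with simulated data. The sketch below is a toy illustration under assumed parameter values, not the actual series from my work.

```python
# With trending variables, a lagged dependent variable produces a
# sky-high R-squared in levels, while the same relationship in first
# differences 'explains' very little; the coefficient on x is
# typically significant in both specifications.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T = 200
x = np.cumsum(rng.normal(size=T))     # trending exogenous series
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.95 * y[t - 1] + 0.3 * x[t] + rng.normal()

# Levels, with a lagged dependent variable on the right-hand side:
levels = sm.OLS(y[1:], sm.add_constant(np.column_stack([y[:-1], x[1:]]))).fit()
# The same relationship in first differences:
diffs = sm.OLS(np.diff(y), sm.add_constant(np.diff(x))).fit()

print(f"R-squared in levels: {levels.rsquared:.2f}")       # typically > 0.9
print(f"R-squared in differences: {diffs.rsquared:.2f}")   # typically < 0.1
```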

Explanation and the quest for ‘significant’ relationships. Part I

The ultimate goal of social science is causal explanation*. The actual goal of most academic research is to discover significant relationships between variables. The two goals are supposed to be strongly related – by discovering (the) significant effects of exogenous (independent) variables, one accounts for the outcome of interest. In fact, the working assumption of the empiricist paradigm of social science research is that the two goals are essentially the same – explanation is the sum of the significant effects that we have discovered. Just look at what all the academic articles with ‘explanation’, ‘determinants’, and ‘causes’ in their titles do – they report significant effects, or associations, between variables.

The problem is that explanation and collecting significant associations are not the same. Of course they are not. The point is obvious to all uninitiated into the quantitative empiricist tradition of doing research, but seems to be lost on many of its practitioners. We could have discovered a significant determinant of X and still be miles (or even light-years) away from a convincing explanation of why and when X occurs. This is not because of the difficulties of causal identification: we could have satisfied all conditions for causal inference from observational data, and the problem would still remain. And it would not go away after we pay attention (as we should) to the fact that statistical significance is not the same as practical significance. Even the discovery of convincingly identified causal effects, large enough to be of practical rather than only statistical significance, does not amount to explanation. A successful explanation needs to account for the variation in X, and causal associations need not: they might be significant but not even make a visible dent in the unexplained variation in X. The difference I am talking about is partly akin to the difference between looking at the significance of individual regression coefficients and looking at the model fit as a whole (more on that will follow in Part II). The current standards of social science research tend to emphasize the former rather than the latter, which allows significant relationships to be sold as explanations.

The objection can be made that the discovery of causal effects is all we should aim for, and all we could hope for. Even if a causal relationship doesn’t account for large amounts of variation in the outcome of interest, it still makes a difference. After all, this is the approach taken in epidemiology, agricultural science and other fields (like beer production) where the statistical research paradigm has its origins. A pill might not treat all headaches, but if it has a positive and statistically significant effect, it will still help millions. But here is the trick: the quest for statistically significant relationships in epidemiology, agriculture, etc. is valuable because all these effects can be considered interventions; the researchers have control over the formula of the pill, the amount of pesticide, or the type of hops. In contrast, social science researchers all too often seek and discover significant relationships between an outcome and variables that couldn’t even remotely be considered interventions. So we end up with a pile of significant relationships which do not account for enough variation to count as a proper explanation, and which have no value as interventions because their manipulation is beyond our reach. To sum up, observational social science has borrowed an approach to causality which makes sense for experimental research, and has applied its standards (namely, statistical significance) to a context where the discovery of significant relationships is less valuable because the ‘treatments’ cannot be manipulated. Meanwhile, what should really count, explaining when, how and why a phenomenon happens, is relegated to the background in the false belief that somehow the quest for significant relationships is a substitute. It is like trying to discover the fundamental function of the lungs with epidemiological methods, and claiming success when you prove that cold air significantly reduces lung capacity. While the inference might still be valuable, it is no substitute for the original goal.

In Part II, I will discuss what needs to be changed, and what can be changed in the current practice of empirical social science research to address the problem outlined above.

*In my understanding, all explanation is causal. Hence, ‘causal explanation’ is a tautology. Hence, I am gonna drop the ‘causal’ part for the rest of the text.