The origins of the digital universe

Just finished Turing’s Cathedral – a fine and stimulating book about the origins of the computer, the interlinked history of the first computers and nuclear bombs, the role of John von Neumann in all that, the Institute of Advanced Studies (IAS) in Princeton, and much more. It is a very thoroughly researched volume based on archival materials, interviews, etc. Actually, if I have one complaint it is that it is too scrupulous in presenting the background of all primary, secondary and tertiary characters in the story of the computer and in documenting the development of the various buildings at the IAS. For that reason I found the first part of the book a bit tedious. But the later chapters in which the author allows his own ideas about the digital universe to roam more freely are truly inspired and inspiring. It was also quite fascinating to learn that one of the first uses of the digital computer, apart from calculating nuclear fusion processes and trying to predict the weather, has been to run what would now be called agent-based modeling (by Nils Baricelli). Here is my favorite passage from the book:

‘Books are strings of code. But they have mysterious properties – like strings of DNA. Somehow the author captures a fragment of the universe, unravels it into a one-dimensional sequence, squeezes it through a keyhole, and hopes that a three-dimensional  vision emerges in the reader’s mind. The translation is never exact.’ (p.312)

After Google Reader

As you might have heard already, Google slashes Reader. That’s terrible news since I have based a very large part of my internet experience on Google Reader: not only following blogs, but collecting links, catching up with general news, and even keeping up to date with academic journals. For some insight into why Google dumped the Reader read here. There is a good discussion of alternatives RSS platforms here. Let’s hope a new and better feeder will soon appear to fill the gap.

Postscript: There is a petition to save Google Reader which already has close to 100 000 signatures here.

Post-postscript: And I thought I am pissed…

 

 

Intelligence and Politics

“Intelligence can let you solve harder problems, but some problems are just resistant, and you get to a point that being smarter isn’t going to help you at all, and I think a lot of our problems are like that. Like in politics – it’s not like we’re saying that if only we had a politician who was slightly smarter all our problems would go away.”

Peter Norvig, director of research in Google, Guardian

Explanation and the quest for ‘significant’ relationships. Part I

The ultimate goal of social science is causal explanation*. The actual goal of most academic research is to discover significant relationships between variables. The two goals are supposed to be strongly related – by discovering (the) significant effects of exogenous (independent) variables, one accounts for the outcome of interest. In fact, the working assumption of the empiricist paradigm of social science research is that the two goals are essentially the same – explanation is the sum of the significant effects that we have discovered. Just look at what all the academic articles with ‘explanation’, ‘determinants’, and ’causes’ in their titles do – they report significant effects, or associations, between variables.

The problem is that explanation and collecting significant associations are not the same. Of course they are not. The point is obvious to all uninitiated into the quantitative empiricist tradition of doing research, but seems to be lost to many of its practitioners. We could have discovered a significant determinant of X, and still be miles (or even light-years) away from a convincing explanation of why and when X occurs. This is not because of the difficulties of causal identification – we could have satisfied all conditions for causal inference from observational data, but the problem still stays. And it would not go away after we pay attention (as we should) to the fact that statistical significance is not the same as practical significance. Even the discovery of convincingly-identified causal effects, large enough to be of practical rather than only statistical significance, does not amount to explanation. A successful explanation needs to account for the variation in X, and causal associations need not to – they might be significant but not even make a visible dent in the unexplained variation in X. The difference I am talking about is partly akin to the difference between looking at the significance of individual regression coefficients and looking at the model fit as a whole (more on that will follow in Part II). The current standards of social science research tend to emphasize the former rather than the later which allows for significant relationships to be sold as explanations.

The objection can be made that the discovery of causal effects is all we should aim for, and all we could hope for. Even if a causal relationship doesn’t account for large amounts of variation in the outcome of interest, it still makes a difference.  After all, this is the approach taken in epidemiology, agricultural sciences and other fields (like beer production) where the statistical research paradigm has its origins. A pill might not treat all headaches but if it has a positive and statistically-significant effect, it will still help millions. But here is the trick – the quest for statistically significant relationships in epidemiology, agriculture, etc. is valuable because all these effects can be considered as interventions – the researchers have control over the formula of the pill, or the amount of pesticide, or the type of hops. In contrast, social science researchers too often seek and discover significant relationships between an outcome and variables that couldn’t even remotely be considered as interventions. So we end up with a pile of significant relationships which do not account for enough variation to count as a proper explanation and they have no value as interventions as their manipulation is beyond our reach. To sum up, observational social science has borrowed an approach to causality which makes sense for experimental research, and applied its standards (namely, statistical significance) to a context where the discovery of significant relationships is less valuable because the ‘treatments’ cannot be manipulated. Meanwhile, what should really count – explaining when, how and why a phenomenon happens, is relegated to the background in the false belief that somehow the quest for significant relationships is a substitute. It is like trying to discover the fundamental function of the lungs with epidemiological methods, and claiming success when you prove that cold air reduces significantly lung capacity. While the inference might still be valuable, it is no substitue for the original goal.

In Part II, I will discuss what needs to be changed, and what can be changed in the current practice of empirical social science research to address the problem outlined above.

*In my understanding, all explanation is causal. Hence, ‘causal explanation’ is tautology. Hence, I am gonna drop the ‘causal’ part for the rest of the text.