When ‘just looking’ beats regression

In a draft paper currently under review I argue that the institutionalization of a common EU asylum policy has not led to a race to the bottom with respect to asylum applications, refugee status grants, and some other indicators. The graph below traces the number of asylum applications lodged in 29 European countries since 1997:

My conclusion is that there is no evidence in support of the theoretical expectation of a race to the bottom (an ever-declining rate of registered applications). One of the reviewers insists that I use a regression model to quantify the change and to estimate the uncertainty of the conclusion. While in general I couldn’t agree more that being open about the uncertainty of your inferences is a fundamental part of scientific practice, in this particular case I refused to fit a regression model and calculate standard errors or confidence intervals. Why?

In my opinion, just looking at the graph is convincing that there is no race to the bottom – application rates have gone down and then up again while the institutionalization of a common EU policy has only strengthened over the last decade. Calculating standard errors would be superficial because it is hard to think of the yearly averages as samples from some underlying population. Estimating a regression that would quantify the EU effect would only work if the model were good enough to capture the fundamental dynamics of asylum applications before isolating the EU effect, and there is no such model. But most importantly, I just didn’t feel that a regression coefficient or a standard error would improve on the inference you get by just looking at the graph: applications have been all over the place since the late 1990s, and you don’t need a confidence interval to see that! Still, the issue has bugged me ever since – after all, the reviewer was only asking for what would be the standard way of approaching an empirical question.

Then two days ago I read this blog post by William M. Briggs who (unlike me) is a professional statistician. After showing that by manipulating the start and end points of a time series you can get any regression coefficient you want, even with randomly generated data, he concludes: ‘The lesson is, of course, that straight lines should not be fit to time series.’ But here is the real punch line:

If we want to know if there has been a change from the start to the end dates, all we have to do is look! I’m tempted to add a dozen more exclamation points to that sentence, it is that important. We do not have to model what we can see. No statistical test is needed to say whether the data has changed. We can just look.

But what about hypothesis testing? We need a statistical test to refute a hypothesis, right? Let me quote some more:

It is true that you can look at the data and ponder a “null hypothesis” of “no change” and then fit a model to kill off this straw man. But why? If the model you fit is any good, it will be able to skillfully predict new data…. And if it’s a bad model, why clutter up the picture with spurious, misleading lines?

In the inimitable prose of Prof. Briggs, ‘if you want to claim that the data has gone up, down, did a swirl, or any other damn thing, just look at it!’
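Briggs’s demonstration is easy to reproduce. The sketch below is my own simulation, not his code: it fits ordinary least-squares lines to different windows of a driftless random walk, and the fitted ‘trend’ flips sign depending on which start and end points you happen to pick.

```python
# Fit straight lines to different windows of a pure random walk.
# The series has no trend by construction, yet the fitted slope
# changes sign depending on the chosen start and end points.
import numpy as np

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(1000))  # driftless random walk

slopes = []
for start in range(0, 900, 50):  # slide a 100-point window along the series
    end = start + 100
    x = np.arange(start, end)
    slope = np.polyfit(x, walk[start:end], 1)[0]  # OLS straight-line fit
    slopes.append(slope)

print(f"min slope: {min(slopes):+.3f}, max slope: {max(slopes):+.3f}")
# Both positive and negative 'trends' come out of trendless data,
# depending entirely on the window you choose to report.
```

Plotting the walk next to a few of these fitted lines makes the point even more vividly – which is, of course, exactly the point: just look.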
