Facebook does randomized experiments to study social interactions

Facebook has a Data Science Team. And here is what they do:

Eytan Bakshy […] wanted to learn whether our actions on Facebook are mainly influenced by those of our close friends, who are likely to have similar tastes. […] So he messed with how Facebook operated for a quarter of a billion users. Over a seven-week period, the 76 million links that those users shared with each other were logged. Then, on 219 million randomly chosen occasions, Facebook prevented someone from seeing a link shared by a friend. Hiding links this way created a control group so that Bakshy could assess how often people end up promoting the same links because they have similar information sources and interests  [link to source at Technology Review].
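The experimental logic here (randomly withholding some links so that a control group exists) can be sketched in a few lines of Python. The hide rate and the data layout below are invented for illustration; the article gives no such details:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def assign_exposure(user_link_pairs, hide_probability=0.1):
    """For each (user, link) occasion, randomly hide the link.

    The hidden occasions form the control group; comparing how often
    control vs. treatment users end up sharing the link anyway separates
    social influence from shared tastes. `hide_probability` is made up.
    """
    assignments = []
    for user, link in user_link_pairs:
        group = "control" if random.random() < hide_probability else "treatment"
        assignments.append((user, link, group))
    return assignments

occasions = [("alice", "link1"), ("bob", "link1"), ("carol", "link2")]
for user, link, group in assign_exposure(occasions):
    print(user, link, group)
```

The causal estimate then comes from comparing sharing rates between the two groups, which is exactly what observational data alone cannot give you.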

It must be great (and a great challenge) to have access to all the data Facebook has and to use it to answer questions that are relevant beyond the immediate business objectives of the company. In the words of the Data Science Team leader:

“The biggest challenges Facebook has to solve are the same challenges that social science has.” Those challenges include understanding why some ideas or fashions spread from a few individuals to become universal and others don’t, or to what extent a person’s future actions are a product of past communication with friends.

Cool! These statements might also make for a good discussion about the ethics of doing social science research inside and outside academia.

Google tries to find the funniest videos

Following my recent post on the project which tries to explain why some video clips go viral, here is a report on Google’s efforts to find the funniest videos:

You’d think the reasons for something being funny were beyond the reach of science – but Google’s brain-box researchers have managed to come up with a formula for working out which YouTube video clips are the funniest.

The Google researcher behind the project is quoted saying:

‘If a user uses an “loooooool” vs an “loool”, does it mean they were more amused? We designed features to quantify the degree of emphasis on words associated with amusement in viewer comments.’

Other factors taken into account are tags, descriptions, and ‘whether audible laughter can be heard in the background’. Ultimately, the algorithm gives a ranking of the funniest videos (with No No No No Cat on top, since you asked).
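The comment-emphasis feature the researcher describes is easy to imagine in code. Here is a toy Python version (Google’s actual feature set is not public, so the regex and scoring below are just my guess at the basic idea):

```python
import re

def laugh_emphasis(comment):
    """Score a comment by the longest run of 'o's in a lol-style token,
    so 'loooooool' counts as more amused than 'loool' or plain 'lol'."""
    runs = [len(m.group(1)) for m in re.finditer(r"l(o+)l", comment.lower())]
    return max(runs, default=0)

comments = ["lol", "loool", "loooooool nice", "meh"]
scores = [laugh_emphasis(c) for c in comments]  # [1, 3, 7, 0]
```

Averaging such scores over a video’s comments would give one numeric feature that could then be combined with tags, descriptions, and audible laughter.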

Now I usually have high respect for all things Google, but this ‘research’ at first appeared to be a total piece of junk. Of course, it turned out that this is just how it was reported by the Daily Mail (cited above), New Scientist, and countless other more or less reputable outlets.

Google’s new algorithm does not provide a normative ranking of the funniest videos ever based on some objective criteria; it is a predictive score of a video’s comedic potential. Google trained the algorithm on a bunch of videos (it’s unclear from the original source what the external ‘fun’ measure used for the training part was) in order to inductively extract features associated with a video being funny. Based on these features, the program can then score any possible video. But these scores are not normative measures, they are predictions. So No No No No Cat is not the funniest video ever [well, it might be, it’s pretty hilarious actually], it is Google’s safest bet that the video would be considered funny.

The story is worth mentioning not only because it exposes yet another case of gross misinterpretation of a scientific project in the news, but because it nicely illustrates the differences between measurement, prediction, and explanation. The newspapers have taken Google’s project to be an exercise in measurement. As explained above, the goal is actually predictive in nature. But even if the algorithm had a 100% success rate in identifying potentially funny videos, that would still not count as an explanation of what makes a video funny. Just think about it – would a boring video become funny if we just added funny tags, background laughter, and plenty of loools in the comments? Not really. In that respect Brent Coker’s approach, which I mentioned in a previous post, has real explanatory potential (although I doubt whether it has any explanatory power).

So, no need to panic, the formula for something being funny is as distant as ever.

P.S. In an ironic turn of events, now that No No No No Cat has gone viral, Google will never know whether the algorithm was very good, or whether everyone just wanted to see the video Google declared the funniest ever. Ah, the joys of social science research!

What makes a video go viral?

Internet Marketing expert Dr Brent Coker claims to have developed an algorithm that can predict which ad movies will go viral on YouTube. I don’t plan a career move to advertising but was nevertheless intrigued by the claim from a research methods & design perspective. Unfortunately, there is very little information available (yet?) and what information is available makes me a bit skeptical about the reliability of the conclusion. Still, Dr Coker’s approach might make for a nice discussion in the context of a Research Design course since it touches upon a question students can relate to, and raises various issues from operationalization to theory specification to theory testing.

In short, according to Dr Coker, “there are four elements that need to be in place for a branded movie to become viral: (1) congruency, (2) emotive strength, (3) network-involvement ratio, and (4) paired meme synergy”. Congruency is the consistency of the video’s theme with brand knowledge. Disgust and fear, for example, imply powerful emotive strength. The network-involvement ratio refers to how relevant the message is to the seeded network. The last element ‘paired meme synergy’ means that certain memes are effective when paired with certain other memes. “For example, impromptu entertainment acts appeared to work when paired with ‘Eyes Surprise’. When paired with ‘bubblegum nostalgia’, the … pair doesn’t work. Anticipation works with Voyeur, but not on its own. And so forth.”

As I said, there is not much information available on the research design, but from what I can gather, the predictive algorithm is based on an inductive approach: analyze movies that did go viral and see what their characteristics are. Such an approach would be OK for generating ideas, but one should be careful not to oversell the inductively identified “solution” as a predictive algorithm that has been properly tested. An obvious next step would be to see whether the “solution” predicts outside the sample it was derived from, and maybe Dr Coker is working on that stage now. I wonder, however, whether the rather flexible definitions of some of the predictive elements make testing the approach feasible even in principle. It seems hard to identify the ‘network-involvement ratio’, for example, prior to observing the outcome. The meme-pairing idea is interesting, but again: if there is no clear idea why certain memes should go together, there is a high risk of the analysis just playing catch-up with the data.
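The out-of-sample check I have in mind is standard: fit the rule on one set of videos, then see whether it still predicts on videos it never saw. A minimal Python sketch, with entirely made-up data (a single invented ‘emotive strength’ number standing in for Dr Coker’s four elements):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: each "video" is (feature, went_viral). Viral videos
# are drawn around a higher feature value; all numbers are invented.
videos = [(random.gauss(3 if viral else 1, 1.0), viral)
          for viral in [True] * 20 + [False] * 20]
random.shuffle(videos)
train, test = videos[:30], videos[30:]

# "Inductive" step: pick the cutoff that best separates the training set.
best_threshold, best_acc = 0.0, 0.0
for threshold in [x / 10 for x in range(50)]:
    acc = sum((feat > threshold) == viral for feat, viral in train) / len(train)
    if acc > best_acc:
        best_threshold, best_acc = threshold, acc

# The honest check: does the rule still work on videos it never saw?
test_acc = sum((feat > best_threshold) == viral for feat, viral in test) / len(test)
print(f"train accuracy {best_acc:.2f}, held-out accuracy {test_acc:.2f}")
```

If the held-out accuracy collapses relative to the training accuracy, the “solution” was just playing catch-up with the data it was derived from.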

For example, how would you score this awesome recent viral video (my take would be Disruption Destruction + Performance + Skill Bill + Simulation Trigger; for the list of possible memes see here)?

P.S. On a somewhat related note: The Atlantic has a feature on the rise of big data which says that Google runs “100-200 experiments on any given day, as they test new products and services, new algorithms and alternative designs”.