Music Network Visualization

Note: probably of interest only to the intersection of readers who are into niche music genres and those interested in network visualization.

My music interests have always been rather, hmm…, eclectic. Somehow IDM, ambient, darkwave, trip-hop, acid jazz, bossa nova, qawwali, Mali blues and other more or less obscure genres have managed to happily co-exist in my music collection. The sheer diversity always invited the question whether there is some structure to the collection, or whether each genre is an island of its own. Sounds like a job for network visualization!

Now, there are plenty of music network viz applications on the web. But they don’t show my collection, and just seem unsatisfactory for various reasons. So I decided to craft my own visualization using R and igraph.

As a first step, I collected, for each artist in my last.fm library, the artists that the site classifies as similar, so I piggyback on last.fm for the network similarity measures. I also get the most often used tag for each artist and the number of plays the artist has on the site. The rest is pretty straightforward, as can be seen from the code.
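For those who want to replicate the data-collection step, here is a minimal sketch of how the similar-artists lists could be pulled from the last.fm API (the artist.getSimilar method) with the jsonlite package. The API key is a placeholder, and the exact structure of the returned object is an assumption you would need to check against the API documentation.

require(jsonlite)
api.key<-"YOUR_API_KEY" #placeholder: request your own key from last.fm
artist<-"Portishead"
url<-paste0("http://ws.audioscrobbler.com/2.0/?method=artist.getsimilar&artist=",
            URLencode(artist, reserved=TRUE), "&api_key=", api.key, "&format=json")
similar<-fromJSON(url)$similarartists$artist #a data frame with the similar artists (assumed structure of the JSON response)
head(similar$name) #the names of the most similar artists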

# Load the igraph and foreign packages (install if needed)
require(igraph)
require(foreign)
lastfm<-read.csv("http://www.dimiter.eu/Data_files/lastfm_network_ad.csv", header=T,  encoding="UTF-8") #Load the dataset

lastfm$include<-ifelse(lastfm$Similar %in% lastfm$Artist, 1, 0) #Index the links between artists in the library
lastfm.network<-graph.data.frame(lastfm, directed=F) #Import as a graph

last.attr<-lastfm[-which(duplicated(lastfm$Artist)),c(5,3,4) ] #Create some attributes
V(lastfm.network)[1:106]$listeners<-last.attr[,2] #the first 106 vertices are the artists in the library; the rest are 'similar' artists only
V(lastfm.network)[107:length(V(lastfm.network))]$listeners<-NA
V(lastfm.network)[1:106]$tag<-last.attr[,3]
V(lastfm.network)[107:length(V(lastfm.network))]$tag<-NA #Attach the attributes to the artist from the library (only)
V(lastfm.network)$label.cex<-ifelse(V(lastfm.network)$listeners>1200000, 1.4, 
                                    ifelse(V(lastfm.network)$listeners>500000, 1.2,
                                           ifelse(V(lastfm.network)$listeners>100000, 1.1,
                                                  ifelse(V(lastfm.network)$listeners>50000, 1, 0.8)))) #Scale the size of the labels by the relative popularity of the artist
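# An equivalent, more compact way of doing the same binning (just a sketch): cut() with breaks that mirror
# the thresholds above; artists with an unknown number of listeners (NA) simply stay NA.
size.bins<-cut(V(lastfm.network)$listeners, breaks=c(-Inf, 50000, 100000, 500000, 1200000, Inf),
               labels=c(0.8, 1, 1.1, 1.2, 1.4))
V(lastfm.network)$label.cex<-as.numeric(as.character(size.bins))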

V(lastfm.network)$color<-"white" #Set the color of the dots
V(lastfm.network)$size<-0.1 #Set the size of the dots
V(lastfm.network)$label.color<-NA
V(lastfm.network)[1:106]$label.color<-"white" #Only the artists from the library should be in white, the rest are not needed

E(lastfm.network)[ include==0 ]$color<-"black" 
E(lastfm.network)[ include==1 ]$color<-"red" #Color edges between artists in the library red, the rest are not needed

fix(tkplot) #Manually add to the function an argument for the background color of the canvas and set it to black (bg=black)

tkplot(lastfm.network, vertex.label=V(lastfm.network)$name, layout=layout.fruchterman.reingold,
       canvas.width=1200, canvas.height=800) #Plot the graph and adjust as needed

I plot the network with the tkplot command, which allows for the manual adjustments necessary because many artist names end up on top of each other in the initial plot. Because the export options of tkplot are limited, I just took a screenshot (I know, I know, that’s kind of cheating ;-)), added the title in Photoshop and, voilà, it’s done!
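If you want to avoid the screenshot route, one possible workaround (a sketch, not what I did for the figure below) is to read the manually adjusted coordinates back from the Tk window with tkplot.getcoords() and feed them to the regular plot() function, which can write to a png device; the file name and dimensions below are just examples.

tkp.id<-tkplot(lastfm.network, vertex.label=V(lastfm.network)$name,
               layout=layout.fruchterman.reingold, canvas.width=1200, canvas.height=800)
# ...adjust the vertex positions by hand in the Tk window, then:
coords<-tkplot.getcoords(tkp.id) #read back the adjusted coordinates
png(filename="music_network.png", width=1200, height=800)
par(bg="black") #black background, as on the Tk canvas
plot(lastfm.network, layout=coords, vertex.label=V(lastfm.network)$name)
dev.off()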

[Figure: my music network; click to enlarge and explore]

Knowing intimately the artists in the graph, I can certify that the network definitely makes a lot of sense. I love the small clusters (Flying Lotus, Andy Stott, Extrawelt and Claro Intelecto [minimal/dub], or Anouar Brahem and Rabih Abou-Khalil [ethno jazz]) loosely connected to the rest of the network. And I love the fact that the boundary spanners are immediately obvious (e.g. Pink Martini between acid jazz and world music [what a stupid label by the way!], or Cesaria Evora between African and Caribbean music, or Portishead between brit-pop, trip-hop and darkwave, or Amon Tobin between trip-hop, electro and IDM). Even the different world music genres are close to each other but still unconnected. And somehow Banco de Gaia, the most ethno of all the electronica in the library, ended up closest to the world/ethno clusters. There are a few problems, like Depeche Mode, which gets pulled from opposite sides of the graph, but these are very few.

Altogether, I have to admit I feel like a teenage dream of mine has finally been realized. But I am aware the network is a rather personal thing (as it was meant to be), so I don’t expect many to get overly excited about it. Still, I would be glad to hear your comments or suggestions for extensions and improvements. And, if you were a good boy/girl during the year, I could also consider visualizing your last.fm network as a present for the new year!

Network visualization in R with the igraph package

In a previous post I showed a visualization of the organizational network of my department. Since several people asked for details on how the plot was produced, I will provide the code and some extensions below. The plot was done entirely in R (2.14.01) with the help of the igraph package. It is a great package, but I found the documentation somewhat difficult to use, so hopefully this post can serve as a helpful introduction to network visualization with R. Here we go:

# Load the igraph package (install if needed)

require(igraph)

# Data format. The data is in 'edges' format meaning that each row records a relationship (edge) between two people (vertices).
# Additional attributes can be included. Here is an example:
#	Supervisor	Examiner	Grade	Spec(ialization)
#	AA		BD		6	X	
#	BD		CA		8	Y
#	AA		DE		7	Y
#	...		...		...	...
# In this anonymized example, we have data on co-supervision with additional information about grades and specialization. 
# It is also possible to have the data in a matrix form (see the igraph documentation for details)

# Load the data. The data needs to be loaded as a table first: 

bsk<-read.table("http://www.dimiter.eu/Data_files/edgesdata3.txt", sep='\t', dec=',', header=T)#specify the path, separator(tab, comma, ...), decimal point symbol, etc.

# Transform the table into the required graph format:
bsk.network<-graph.data.frame(bsk, directed=F) #the 'directed' attribute specifies whether the edges are directed
# or equivalent irrespective of the position (1st vs 2nd column). For directed graphs use 'directed=T'

# Inspect the data:

V(bsk.network) #prints the list of vertices (people)
E(bsk.network) #prints the list of edges (relationships)
degree(bsk.network) #prints the number of edges per vertex (relationships per person)

# First try. We can plot the graph right away but the results will usually be unsatisfactory:
plot(bsk.network)

Here is the result:

Not very informative indeed. Let’s go on:

 
#Subset the data. If we want to exclude people who are in the network only tangentially (participate in one or two relationships only),
# we can exclude them by subsetting the graph on the basis of the 'degree':

bad.vs<-V(bsk.network)[degree(bsk.network)<3] #identify those vertices part of less than three edges
bsk.network<-delete.vertices(bsk.network, bad.vs) #exclude them from the graph

# Plot the data.Some details about the graph can be specified in advance.
# For example we can separate some vertices (people) by color:

V(bsk.network)$color<-ifelse(V(bsk.network)$name=='CA', 'blue', 'red') #useful for highlighting certain people. Works by matching the name attribute of the vertex to the one specified in the 'ifelse' expression

# We can also color the connecting edges differently depending on the 'grade': 

E(bsk.network)$color<-ifelse(E(bsk.network)$grade==9, "red", "grey")

# or depending on the different specialization ('spec'):

E(bsk.network)$color<-ifelse(E(bsk.network)$spec=='X', "red", ifelse(E(bsk.network)$spec=='Y', "blue", "grey"))

# Note: the example uses nested ifelse expressions which is in general a bad idea but does the job in this case
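# A cleaner alternative (just a sketch doing the same thing): a named lookup vector that maps each value
# of 'spec' to a color, with "grey" filled in for anything unmatched:
spec.colors<-c(X="red", Y="blue")
E(bsk.network)$color<-unname(spec.colors[as.character(E(bsk.network)$spec)])
E(bsk.network)$color[is.na(E(bsk.network)$color)]<-"grey"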
# Additional attributes like size can be further specified in an analogous manner, either in advance or when the plot function is called:

V(bsk.network)$size<-degree(bsk.network)/10 #here the size of the vertices is specified by the degree of the vertex, so that people supervising more get proportionally bigger dots. Getting the right scale takes some playing around with the parameters of the scale function (from the 'base' package)
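# One explicit way to get a reasonable scale (a sketch; the target range of 2 to 10 is arbitrary and can be adjusted):
deg<-degree(bsk.network)
V(bsk.network)$size<-2 + 8*(deg - min(deg))/(max(deg) - min(deg))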

# Note that if the same attribute is specified beforehand and inside the function, the former will be overridden.
# And finally the plot itself:
par(mai=c(0,0,1,0)) 			#this specifies the size of the margins. the default settings leave too much free space on all sides (if no axes are printed)
plot(bsk.network,				#the graph to be plotted
layout=layout.fruchterman.reingold,	# the layout method. see the igraph documentation for details
main='Organizational network example',	#specifies the title
vertex.label.dist=0.5,			#puts the name labels slightly off the dots
vertex.frame.color='blue', 		#the color of the border of the dots 
vertex.label.color='black',		#the color of the name labels
vertex.label.font=2,			#the font of the name labels
vertex.label=V(bsk.network)$name,		#specifies the labels of the vertices. in this case the 'name' attribute is used
vertex.label.cex=1			#specifies the size of the font of the labels. can also be made to vary
)

# Save and export the plot. The plot can be copied as a metafile to the clipboard, or it can be saved as a pdf or png (and other formats).
# For example, we can save it as a png:
png(filename="org_network.png", height=800, width=600) #call the png writer
#re-run the plot command from above here, so that the figure is drawn into the png device
dev.off() #don't forget to close the device
#And that's the end for now.

Here is the result:

Still not perfect, but much more informative and aesthetically pleasing.

Additional information can be found on this guide to igraph which is in development, the examples here, and the official CRAN documentation of the package. Especially useful is this list of the plot attributes that can be tweaked. The plots can also be adjusted interactively using the tkplot function instead of plot, but the options for saving the resulting figure are limited.

Have fun with your networks!

The hidden structure of (academic) organizations

All organizations have a ‘deep’ hidden structure based on the social interactions among their members, which might or might not coincide with the official formal one. University departments are no exception – if anything, the informal alliances, affinities, and allegiances within academic departments are only too visible and salient.

Network analysis provides one way of visualizing and exploring the ‘deep’ organizational structure. In order to learn how to visualize small networks with R, I collected data on the social interactions within my own department and plugged the dataset into R (the igraph package) to get the plot below. The figure shows the social network of my institute based on the co-supervision of student dissertations (each Master’s thesis has a supervisor who selects a so-called ‘second’ reader who reviews the draft, and the two supervisors examine the student during the defence). So each link between nodes (people) is based on one joint supervision of a student. The total number of links (edges) is 264, which covers (approximately) all dissertations defended over the last year. In this version of the graph, the people are represented only by numbers, but in the full version the actual names of the people are plotted, the links are directional, and additional info (like the grade of the thesis) can be incorporated.

Altogether, the organization appears surprisingly well-integrated. Most ‘outsiders’ and most weakly-connected ‘islands’ are either occasional external readers, or new colleagues being ‘socialized’ into the organization. Obviously, some people are more ‘central’ in the sense of connecting to a more diverse set of people, while others serve as boundary-spanners reaching to people who would otherwise remain unconnected to the core.  I find the figure intellectually and aesthetically pleasing (given that it is generated with two lines of code) and perhaps a more thorough analysis of the network can be useful in organizational management as well.

Facebook does randomized experiments to study social interactions

Facebook has a Data Science Team. And here is what they do:

Eytan Bakshy [...] wanted to learn whether our actions on Facebook are mainly influenced by those of our close friends, who are likely to have similar tastes. [...] So he messed with how Facebook operated for a quarter of a billion users. Over a seven-week period, the 76 million links that those users shared with each other were logged. Then, on 219 million randomly chosen occasions, Facebook prevented someone from seeing a link shared by a friend. Hiding links this way created a control group so that Bakshy could assess how often people end up promoting the same links because they have similar information sources and interests  [link to source at Technology Review].

It must be great (and a great challenge) to have access to all the data Facebook has and to use it to answer questions that are relevant not only for the immediate business objectives of the company. In the words of the Data Science Team leader:

“The biggest challenges Facebook has to solve are the same challenges that social science has.” Those challenges include understanding why some ideas or fashions spread from a few individuals to become universal and others don’t, or to what extent a person’s future actions are a product of past communication with friends.

Cool! These statements might make for a good discussion about the ethics of doing social science research inside and outside academia as well.

New tool for discourse network analysis

EJPR has just published an article introducing a new tool for ‘discourse network analysis’. Using the tool, you can measure and visualize political discourses and the networks of actors affiliated to each discourse. One can study the actor congruence networks (based on the number of statements actors share), concept congruence networks (based on whether statements are used by an actor in the same way) and trace the evolution of both over time.

Here is a graph taken from the paper which illustrates the actor congruence networks for the issue of software patents in the EU (click to enlarge):

The discourse network analysis tool is free and available from the website of Philip Leifeld, one of the co-authors of the article. I can’t wait to get my hands on the program and try it out for myself. The tool promises to be an interesting alternative to evolutionary factor analysis – another new method for studying policy frames and discourses that I recently discussed – with the added benefit of being able to present actors and frames in an integrated analysis.

Here is the abstract of the EJPR article (there are more resources at this website):

In 2005, the European Parliament rejected the directive ‘on the patentability of computer-implemented inventions’, which had been drafted and supported by the European Commission, the Council and well-organised industrial interests, with an overwhelming majority. In this unusual case, a coalition of opponents of software patents prevailed over a strong industry-led coalition. In this article, an explanation is developed based on political discourse showing that two stable and distinct discourse coalitions can be identified and measured over time. The apparently weak coalition of software patent opponents shows typical properties of a hegemonic discourse coalition. It presents itself as being more coherent, employs a better-integrated set of frames and dominates key economic arguments, while the proponents of software patents are not as well-organised. This configuration of the discourse gave leeway for an alternative course of political action by the European Parliament. The notion of discourse coalitions and related structural features of the discourse are operationalised by drawing on social network analysis. More specifically, discourse network analysis is introduced as a new methodology for the study of policy debates. The approach is capable of measuring empirical discourses both statically and in a longitudinal way, and is compatible with the policy network approach.

Predicting the votes of judges

Here is a short and interesting paper that uses an innovative approach to predict the votes of the US Supreme Court:

Successful attempts to predict judges’ votes shed light into how legal decisions are made and, ultimately, into the behavior and evolution of the judiciary. Here, we investigate to what extent it is possible to make predictions of a justice’s vote based on the other justices’ votes in the same case. For our predictions, we use models and methods that have been developed to uncover hidden associations between actors in complex social networks. We show that these methods are more accurate at predicting justice’s votes than forecasts made by legal experts and by algorithms that take into consideration the content of the cases. We argue that, within our framework, high predictability is a quantitative proxy for stable justice (and case) blocks, which probably reflect stable a priori attitudes toward the law. We find that U.S. Supreme Court justice votes are more predictable than one would expect from an ideal court composed of perfectly independent justices. Deviations from ideal behavior are most apparent in divided 5–4 decisions, where justice blocks seem to be most stable. Moreover, we find evidence that justice predictability decreased during the 50-year period spanning from the Warren Court to the Rehnquist Court, and that aggregate court predictability has been significantly lower during Democratic presidencies. More broadly, our results show that it is possible to use methods developed for the analysis of complex social networks to quantitatively investigate historical questions related to political decision-making.

While I have my reservations whether “trying to predict the behavior of judges, one can get insights into how legal decisions are truly made”, exercises in predicting outcomes are interesting in their own right. And this paper appears to hit the target: its predictive success rate is 83% vs. the less-than-70% success rate of existing approaches based on expert opinions and statistical models of case characteristics. Note however that each individual vote is predicted with information about how the other judges have voted on that same case which, if the votes are announced simultaneously, doesn’t provide you with any leverage in actually predicting the outcome of a case.

P.S. What is this penchant that the real scientific journals (e.g. PLoS) have for social science research based on agent-based modeling or network theory?

Concentration of control in the global economy

All conspiracy theorists know that the global economy is concentrated in the hands of a few. But even they will be blown away by this paper which maps the network of global corporate ownership and control. Here is the (somewhat understated) abstract:

“The structure of the control network of transnational corporations affects global market competition and financial stability. So far, only small national samples were studied and there was no appropriate methodology to assess control globally. We present the first investigation of the architecture of the international ownership network, along with the computation of the control held by each global player. We find that transnational corporations form a giant bow-tie structure and that a large portion of control flows to a small tightly-knit core of financial institutions. This core can be seen as an economic “super-entity” that raises new important issues both for researchers and policy makers.”  (Vitali, Glattfelder and Battiston)

Some of the findings:
- almost 40% of the economic value of transnational companies in the world is in the hands of a group of 147 tightly-interconnected companies “which has almost full control over itself” (p.6)
- “[N]etwork control is much more unequally distributed than wealth…[T]he top ranked actors hold a control ten times bigger than what could be expected based on their wealth” (p.6)
- 10 companies control 20% of the network; 50 companies control 40% of the network (!)
- 35 of these 50 companies belong to a strongly connected core, meaning that they are all “tied together in an extremely entangled web of control” through co-ownerships (p.32)
- 77% (463 006) of the firms in the entire network belong to a single connected component [formally, in a connected component all firms can reach each other along the paths of the network]. The second largest connected component has only 230 firms.  

Here is the map of the core of the core of the network itself; not very informative as such but beautiful nonetheless:  (Superconnected companies are red, very connected companies are yellow)


(Image: PLoS One, via New Scientist)
 
This is the first paper to include indirect and weighted control paths in constructing the global economy network and it introduces a new method for measuring control that is suitable for such complex networks. Although quite technical, the paper does a remarkable job of walking the reader step-by-step through the analysis. New Scientist has a less-technical presentation of the research here.

The implications of this work for the stability of the economy and competition should be quite obvious, but the authors (all from ETH Zurich) also explicitly discuss them in the paper. One can only hope that economic policy makers and politicians take note.