The Geographic Flow of Music
April 26, 2012 8:37 AM

In The Geographic Flow of Music (arxiv), researchers Conrad Lee and Pádraig Cunningham propose a method to use data from the last.fm API to track the world's listening habits by location and time, showing where shifts in musical tastes have originated and subsequently migrated. Results show music trends originating in smaller cities and flowing outward in unexpected ways, contradicting some assumptions in social science about larger cities being more efficient engines of (cultural) invention.
posted by Blazecock Pileon (13 comments total) 24 users marked this as a favorite

 
I look forward to a wealth of slick infographics about drone ambient.
posted by Theta States at 8:57 AM on April 26, 2012 [1 favorite]


Woah. That's a fascinating idea.
posted by delmoi at 9:16 AM on April 26, 2012


Results show music trends originating in smaller cities and flowing outward in unexpected ways, contradicting some assumptions in social science about larger cities being more efficient engines of (cultural) invention.

This seems confused. They are examining listening patterns, not music creation patterns, right? And even if they are looking at where someone uploaded tracks from how do we know the source of "cultural invention" was that same location?
posted by DU at 9:27 AM on April 26, 2012


More recently, detailed logs of digital communication have enabled these hypotheses to be tested on datasets that are much larger than was feasible only a decade earlier. For example, in [7], Bakshy et al. subject 250 million facebook users to a controlled experiment in order to measure the role that facebook friends play in influencing the diffusion of information...
Uh... did these people know this was going on, I wonder? Here is the paper (ref 7) if anyone wants to look at it.
finding that while a user’s most active relationships are individually the most influential, the overall effect of less active relationships in spreading novel information is stronger. Additional examples of recent significant work includes the worldwide spread of e-mail chain letters [8], the analysis of a massive worldwide instant messaging dataset [9], and the spread of information through the blogosphere [10], [11].
That's... somewhat disquieting. The paper itself (ref 9) is paywalled off, but here is the abstract:
We present a study of anonymized data capturing a month of high-level communication activities within the whole of the Microsoft Messenger instant-messaging system. We examine characteristics and patterns that emerge from the collective dynamics of large numbers of people, rather than the actions and characteristics of individuals. The dataset contains summary properties of 30 billion conversations among 240 million people. From the data, we construct a communication graph with 180 million nodes and 1.3 billion undirected edges, creating the largest social network constructed and analyzed to date. We report on multiple aspects of the dataset and synthesized graph. We find that the graph is well-connected and robust to node removal. We investigate on a planetary-scale the oft-cited report that people are separated by "six degrees of separation" and find that the average path length among Messenger users is 6.6. We find that people tend to communicate more with each other when they have similar age, language, and location, and that cross-gender conversations are both more frequent and of longer duration than conversations with the same gender.
If this is the stuff they're handing out to academics, imagine what these companies are looking at internally. You could gather far more information if you looked at the text of the conversations themselves. With Google's messenger or whatever, the chats are all stored 'in the cloud' along with the rest of your Gmail and presumably governed by the same privacy policy, so there's no reason why Google couldn't mine your conversations in order to better target ads across properties.
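
For anyone wondering what the "average path length" in that abstract means computationally: it is the mean number of hops on the shortest route between pairs of people in the graph. Here is a minimal Python sketch on a toy graph, using breadth-first search from every node. The function name `average_path_length` and the four-person example are hypothetical, and this is not a claim about how the Messenger study computed its figure; at 180 million nodes an exhaustive all-pairs search like this would not be practical.

```python
from collections import deque


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs of distinct nodes.

    `graph` maps each node to a list of its neighbors (undirected edges are
    listed in both directions). Every neighbor must also appear as a key.
    """
    total = 0
    pairs = 0
    for source in graph:
        # Breadth-first search gives shortest hop counts in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Toy undirected graph (hypothetical): a chain a - b - c - d
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(average_path_length(toy))
```

Unreachable pairs are simply left out of the average here, which is one of several possible conventions.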
posted by delmoi at 9:28 AM on April 26, 2012


Good paper, I think! Makes modest yet interesting claims, close to the data, which you could imagine testing on other datasets. I would have liked to see them compute their influence network using half the dataset and cross-check on the rest. And I quibble with

"The work on scaling laws in cities which we have summarized in this section is significant because it appears to have uncovered a universal law in the social sciences, one which can make quantitative predictions. Our point here is not to claim that our results contradict this law."

Hardly anyone would say there is anything like a "universal law" that can predict how musically influential a city should be based only on its population.
posted by escabeche at 9:29 AM on April 26, 2012


Wow, okay, I'm reading ref 7, The Role of Social Networks in Information Diffusion.

I was wondering how an experiment to analyze the data of 250 million people would pass review at an institution - but it turns out the researchers were Facebook employees, not academics (except one person at the University of Michigan). So there was no opt-in.
The experimental population consists of a random sample of all Facebook users who visited the site between August 14th to October 4th 2010, and had at least one friend sharing a link. At the time of the experiment, there were approximately 500 million Facebook users logging in at least once a month. Our sample consists of approximately 253 million of these users. All Facebook users report their age and gender, and a user's country of residence can be inferred from the IP address with which she accesses the site. In our sample, the median and average age of subjects is 26 and 29.3, respectively. Subjects originate from 236 countries and territories, 44 of which have one million or more subjects.
Not only that, the researchers actually manipulated people's feeds, actively censoring certain URLs:
Subject-URL pairs are randomly assigned at the time of display to either the no feed or the feed condition. Stories that contain links to a URL assigned to the no feed condition for the subject are never displayed in the subject's feed. Those assigned to the feed condition are not removed from the feed, and appear in the subject's feed as normal (Figure 2). Pairs are deterministically assigned to a condition at the time of display, so any subsequent share of the same URL by any of a subject's friends is also assigned to the same condition.
Supposedly only about 1% of URLs were blocked. But still.
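
For the curious, one common way to get an assignment that looks random but always comes out the same for a given subject-URL pair is to hash the pair. The excerpt doesn't say how this was actually implemented, so the Python below is only a sketch under that assumption; `assign_condition`, the ID format, and the 1% no-feed rate are made up for illustration.

```python
import hashlib

# Assumption: roughly the 1% figure mentioned above, applied per subject-URL pair.
NO_FEED_RATE = 0.01


def assign_condition(subject_id: str, url: str, no_feed_rate: float = NO_FEED_RATE) -> str:
    """Map a (subject, URL) pair to 'feed' or 'no feed', the same way every time."""
    # Join with a separator byte so ("ab", "c") and ("a", "bc") hash differently.
    key = f"{subject_id}\x00{url}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "no feed" if bucket < no_feed_rate else "feed"


# Hypothetical usage: two friends sharing the same link at different times
# lead to the same lookup, so the subject sees a consistent condition.
first_share = assign_condition("user-42", "http://example.com/story")
later_share = assign_condition("user-42", "http://example.com/story")
print(first_share == later_share)
```

Because the condition is a pure function of the pair, nothing has to be stored per user: any later share of the same URL by any friend recomputes the same answer, which matches the behavior the quoted passage describes.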
posted by delmoi at 9:42 AM on April 26, 2012 [1 favorite]


Interesting. I would like to point out that in their Fig. 1 diagram they include Bogotá under México. They must surely mean Mexico City. Right?
posted by elmono at 10:02 AM on April 26, 2012


This seems confused. They are examining listening patterns, not music creation patterns, right? And even if they are looking at where someone uploaded tracks from how do we know the source of "cultural invention" was that same location?

That does seem like an issue with regards to their desire to make claims about where and how cultural innovation takes place, but figuring out a relationship between the geographic location of a new musical act and the geographic location of the first uploader is a sticky enough topic to merit its own study. It seems like a harder project to get data for, too. The nice thing, though, is that the findings from this project are nicely modular. Right now it seems reasonable to interpret this data with the assumption that the first uploaders of an act belong to the same community as that act, but you could easily swap that assumption out for whatever conclusion our hypothetical study of uploaders generates.

What I would really love is if we had some machine learning methods that were smart enough to pick out (to a pretty detailed degree) the common musical features of a given corpus. Then we could do some clustering analysis and see the big picture of how musical styles mutate in transmission. Hot damn but that data would be a musicological Holy Grail...
posted by invitapriore at 10:32 AM on April 26, 2012


invitapriore, have you seen the Million Song Dataset?
posted by escabeche at 10:35 AM on April 26, 2012 [1 favorite]


No, but thank you for making my day.
posted by invitapriore at 10:40 AM on April 26, 2012


figuring out a relationship between the geographic location of a new musical act and the geographic location of the first uploader is a sticky enough topic

That's a good point but not what I meant. I mean scenarios like the following: "I live in Peoria and on a recent trip to Chicago I was inspired to write this song" or "I was watching those folks on TV and I thought I could do something like that, so here it is".
posted by DU at 10:41 AM on April 26, 2012


Oh, I see. Well, I'm personally inclined to downplay the relevance of scenarios like that on the assumption that such individual events would be less predictive of a musician's stylistic preferences than the aggregate of their life experience, but we're getting into kind of fuzzy territory. I do think you're onto something with the notion of copycat acts, which this data can't say much about without a more granular genre scheme.
posted by invitapriore at 10:57 AM on April 26, 2012


The more general problem is to what extent people are culturally members of their geographic communities. TV and internet have basically destroyed that concept.
posted by DU at 11:11 AM on April 26, 2012



