Buzzing about network graphs
December 4, 2011 2:07 AM

A hive plot (slides) is a beautiful and compelling way to visualize multiple, complex networks, without resorting to "hairball" graphs that are often difficult to qualitatively compare and contrast.

Hive plots were conceived by Martin Krzywinski, the primary author of the Circos software package, used to represent genomic and other data that render well in circular form.

To make your own hive plots, take a look at Krzywinski's linnet library, or if you like R, there's HiveR.
posted by Blazecock Pileon (14 comments total) 34 users marked this as a favorite
 
Farewell to hairballs!

This is the first time my cat has ever shown any interest in Metafilter.
posted by twoleftfeet at 2:50 AM on December 4, 2011


I have a feeling your cat understands this at least as well as I do.
posted by Segundus at 3:14 AM on December 4, 2011 [6 favorites]


Beautiful website. I like the Tribble photo. Now can someone please explain them to me?
posted by arcticseal at 3:44 AM on December 4, 2011


All I was able to glean from this, being of the arts/hum type, was "The Lying Hairball", which rather amused me (and my cat).

I'm also familiar with the Trickster Hairball: it's when you hear hurking in the middle of the night, are too lazy to get out of bed to investigate, and you can't find it before you leave for work in the morning.
posted by zinful at 4:34 AM on December 4, 2011


"Critically, there is no aesthetic magic sauce added to the layout. If the layout shows a pattern, you can be sure it is due to structure in the underlying data and not on the layout algorithm's interpretation of how the data should best be shown."

This is never true for any visualization algorithm.

Overall, I have to say I'm failing to grasp why I would prefer this visualization to the more customary ones. It certainly makes no attempt to show community structure (i.e., to find small subsets of the network which are unusually highly connected). Now, that might not be what you're interested in -- but it's what a lot of people who work on networks are interested in, and it's exactly what "hairball" methods are trying to capture; so it's sort of weird to criticize hairballs when you're not even trying to do the thing they're doing.

The hive plot is good at giving a picture of degree distribution -- but you could also, you know, just make a good old bar graph of the degree distribution!
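For comparison, the "good old bar graph" of degree distribution takes only a few lines. A minimal Python sketch, assuming edges are treated as undirected; the edge list is made up for illustration and the function names are hypothetical, not from any library:

```python
from collections import Counter

def degree_distribution(edge_list):
    """Return {degree: number of nodes with that degree}, counting edges as undirected."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    # Count how many nodes share each degree value, sorted by degree.
    return dict(sorted(Counter(degree.values()).items()))

def text_bar_chart(dist):
    """Render one row of '#' characters per degree value."""
    return "\n".join(f"degree {d}: {'#' * n}" for d, n in dist.items())

# Hypothetical toy network, for illustration only.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(text_bar_chart(degree_distribution(toy_edges)))
```

For the toy list this shows one node of degree 1, three of degree 2 and one of degree 3. Nodes with no edges never appear, since degrees are collected from the edge list.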

And if your data set doesn't have a pre-determined division of the nodes into a few separate classes, I don't see what they're doing other than forcing the nodes onto a line segment and ordering them by degree, which seems like a terrible idea. And lots of times (as in social networks) you really don't have a division of the nodes; there's only "one kind of thing" in your network.

This is the first time I've seen this, so it's certainly possible I've missed something important about it.
posted by escabeche at 6:14 AM on December 4, 2011 [5 favorites]


arcticseal: Sure, it's new to me but I'll give it a go.

A graph in this context means a sort of network, where A links to B, B links to C and D, E links to F and G, and G links back to, say, B. There's a picture of a graph on Wikipedia. The letters are vertices, or nodes; the links are called edges, or lines.
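The small example graph just described can be written down as an adjacency list, one common way of storing a graph. A minimal Python sketch, assuming the links are one-way (directed); `adjacency_list` is a hypothetical helper name:

```python
# Directed edges from the example: A->B, B->C, B->D, E->F, E->G, G->B.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "F"), ("E", "G"), ("G", "B")]

def adjacency_list(edge_list):
    """Map each node (vertex) to the sorted list of nodes it links to."""
    adj = {}
    for src, dst in edge_list:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])  # link targets are nodes too, even with no outgoing edges
    return {node: sorted(targets) for node, targets in sorted(adj.items())}

for node, targets in adjacency_list(edges).items():
    print(node, "->", ", ".join(targets) or "(no outgoing links)")
```

The seven letters are the vertices and the six pairs are the edges; C, D and F only ever receive links.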

You can represent the whole Internet--all the servers and routers--as such a graph, or all the links between pages on the web, and lots of structures in genomics are represented this way. Org charts, too. There are applications for this data almost everywhere. So there's a desire to see what all of those links look like; a whole set of approaches for visualizing them has grown up, and lots of tools offer the functionality (like trusty old Graphviz, or Gephi).

Once you get above a certain number of nodes and links (in my experience more than a few hundred, which in data terms is unbelievably small) it gets to be really painful to visualize and you end up with a "hairball" that looks cool--but the ratio of signal to noise is pretty low (esp. if they get above 20,000 nodes or so; then it's really hard to make these work without doing a lot of tricks).
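One way to see why the hairball degrades so quickly: an undirected graph on n nodes can have up to n(n-1)/2 edges, so the room for overlapping lines grows roughly with the square of the node count. Real networks are usually far sparser than this upper bound, and the inputs below (300 standing in for "a few hundred", and 20,000) are only illustrative:

```latex
\begin{aligned}
E_{\max}(n) &= \binom{n}{2} = \frac{n(n-1)}{2} \\
E_{\max}(300) &= \frac{300 \cdot 299}{2} = 44{,}850 \\
E_{\max}(20{,}000) &= \frac{20{,}000 \cdot 19{,}999}{2} = 199{,}990{,}000
\end{aligned}
```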

So there's all sorts of energy being directed into forcing these graphs to be faster and bigger--the Associated Press, for example, has a Knight Foundation-funded project to visualize the relationships between thousands, or millions, of documents--but the results are still pretty blurry compared to, say, a bar chart.

Back to hive plots: This person Martin Krzywinski, who is a professional data visualizer (and does it for high stakes--he works in a cancer research lab) apparently got fed up with the hairballs after years of making them, and said something like: Look, what we're looking for is more specific; I frankly don't care as much about the nodes themselves but I do care about the number and kinds of connections between different types of nodes and these hairballs kind of show that with color and clustering and so forth but come on. And that is a very fair thing to say about hairballs.

His reasoning seems to be: Rather than make a hairball with a zillion dots all connected, just give up and group the nodes by some criteria and put them along a line, thus basically forgetting that there are individual nodes there at all, then show the connections between different nodes by drawing curved lines between them--and we'll be able to see how intense those connections are. Instead of looking for the smaller hairballs inside the larger hairball we just look for dark or colored patches where there are lots of curved lines connecting between the straight lines and go "communication is happening there, I wonder what that means?"
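The layout step described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not Krzywinski's linnet code or the HiveR API: every node is already assigned to one of three evenly spaced axes, its distance along the axis is a linear function of its degree, and drawing the curved edges is left to a plotting library. The tally of edges per axis pair is a rough stand-in for the dark patches where many curves run between two axes. `hive_layout` and its parameters are hypothetical:

```python
import math
from collections import Counter

def hive_layout(node_axis, edges, n_axes=3, inner=1.0, outer=10.0):
    """Place each node on axis node_axis[node], at a radius scaled linearly by its degree.

    Returns ({node: (x, y)}, Counter mapping each unordered axis pair to its edge count).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    lo = min(degree[n] for n in node_axis)
    hi = max(degree[n] for n in node_axis)
    span = (hi - lo) or 1  # avoid division by zero when every degree is equal

    positions = {}
    for node, axis in node_axis.items():
        angle = 2 * math.pi * axis / n_axes  # axes evenly spaced around the centre
        r = inner + (outer - inner) * (degree[node] - lo) / span
        positions[node] = (r * math.cos(angle), r * math.sin(angle))

    # Each edge becomes one curve between two axis positions; count curves per axis pair.
    pair_counts = Counter(tuple(sorted((node_axis[u], node_axis[v]))) for u, v in edges)
    return positions, pair_counts

# Hypothetical example: two nodes on axis 0, two on axis 1, one on axis 2.
node_axis = {"a1": 0, "a2": 0, "b1": 1, "b2": 1, "c1": 2}
edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("b1", "c1")]
positions, pair_counts = hive_layout(node_axis, edges)
print(pair_counts)
```

Here axis pair (0, 1) carries three edges and (1, 2) carries one. Because positions depend only on axis membership and degree, the same data always yields the same picture; the catch is that you need a meaningful way to assign nodes to axes in the first place.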

Krzywinski is arguing his case pretty hard, with far more rhetorical force and more humor than you usually see when someone is promoting a new data visualization (it reminds me of when people got excited about sparklines). Which is fun. If you read their slideshow, the criteria for creating the lines are spelled out a little more clearly--it's less of a manifesto. I think that might address some of escabeche's concerns--there's definitely a lot of reasoning behind this and some very specific goals (and whether it is drastically better than "good old bar graphs of degree distribution," which are pretty useful, probably depends on a number of factors related to the data set).
posted by ftrain at 7:18 AM on December 4, 2011 [5 favorites]


I actually think the best, and probably the clearest, way to show relationship data is with an adjacency matrix. If you order the items well, it's actually pretty easy to see clusters and sub-groups.

So for example here is a matrix I did for a bunch of websites (I think about 1500?). You can see closely associated clusters as large blocks along the diagonal, and you can also see the interaction between blocks by looking at the area that's horizontally aligned with one block and vertically aligned with another. Here's a zoomed in version that shows the labels.
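The effect of ordering on an adjacency matrix is easy to reproduce. A minimal Python sketch with a made-up six-node network of two tight groups joined by one link; the grouping is supplied by hand here, whereas on real data you would need some clustering or reordering step first (the thread doesn't say which method produced the website matrix). Function names are hypothetical:

```python
def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 matrix whose rows and columns follow the given node order."""
    index = {n: i for i, n in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        m[index[u]][index[v]] = 1
        m[index[v]][index[u]] = 1
    return m

def render(matrix):
    """One text row per node: '#' where a link exists, '.' where it doesn't."""
    return "\n".join("".join("#" if x else "." for x in row) for row in matrix)

# Hypothetical data: groups {a, b, c} and {x, y, z}, bridged by the single edge c-x.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("x", "y"), ("x", "z"), ("y", "z"), ("c", "x")]

print(render(adjacency_matrix(["a", "x", "b", "y", "c", "z"], edges)))  # interleaved: blocks hidden
print()
print(render(adjacency_matrix(["a", "b", "c", "x", "y", "z"], edges)))  # grouped: blocks on the diagonal
```

In the grouped ordering the two triangles show up as dense 3×3 blocks along the diagonal, and the lone '#' pair outside them marks the bridge. Checking whether two particular nodes are linked is a single lookup, m[i][j].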
"Critically, there is no aesthetic magic sauce added to the layout. If the layout shows a pattern, you can be sure it is due to structure in the underlying data and not on the layout algorithm's interpretation of how the data should best be shown."
"This is never true for any visualization algorithm."
Yeah, I don't think there's any magical way to find structure in just an unordered graph. In the examples you can see 'macro' links between different clusters, I guess, but that's probably only because they're already grouped spatially. If you just had the nodes in random order, I don't think anything would pop out for large graphs.
posted by delmoi at 7:32 AM on December 4, 2011 [1 favorite]


Also, adjacency matrices are really easy to read for smaller data sets as well, IMO. I mean, if you actually need to know the links between individual nodes (if you were building an electrical circuit, or something), you can look them up in detail without worrying about your eye getting lost trying to follow an arc.
posted by delmoi at 7:35 AM on December 4, 2011


But here's the thing: unless they start teaching this kind of visualization in undergrad, I'm going to have to spend 5-6 slides explaining how this thing works before I can even start to present my data. In a typical 20 minute presentation, I've just lost a quarter of my allotted time (and audience attention). The hairball network, for all its faults, is at least intuitive and universal.
posted by reformedjerk at 9:35 AM on December 4, 2011 [1 favorite]


For large networks, the hive plots didn't reveal any extra information over the hairballs. They just look a bit nicer. I'm with Mr adjacency matrix.
posted by FrereKhan at 12:47 PM on December 4, 2011


ftrain - thanks, that's the explanation I needed.
posted by arcticseal at 7:00 PM on December 4, 2011


Neat! It is taking us a while to get out of line-printer/flatland mode (not a surprise to Ted Nelson I'd guess).

I keep expecting to see something similar happening to representations of data structures, even whole disciplines (complete with zooms into actual content), but I guess we're still mostly in fascinated-by-blips-on-a-CRT mode.
posted by Twang at 7:21 PM on December 4, 2011


I love the idea, but the page authors come across as total jerks.
posted by Theta States at 9:30 AM on December 5, 2011


"But here's the thing: unless they start teaching this kind of visualization in undergrad, I'm going to have to spend 5-6 slides explaining how this thing works before I can even start to present my data. In a typical 20 minute presentation, I've just lost a quarter of my allotted time (and audience attention). The hairball network, for all its faults, is at least intuitive and universal."

+1.

This is an elegant concept that may work extremely well for advanced network analysis by scientists, but out of context it's hard to translate into something the rest of us can understand.

I've been convinced for a long time that the most powerful new data tools aren't visual; they're interactive. Data never exists in a vacuum; it always needs human intervention to make sense of it, and the very best tools make room for sorting, filtering and other manipulations in the appropriate context. Social network analysis tools like Gephi are very advanced examples of this, where output requires significant wrangling to be human-readable - or, what escabeche said.

Gorgeous as this is, I doubt that the true value of this approach can be realized unless it can be coerced into a conceptual model which can be used to navigate those pesky hairballs in ways that make abstractive sense. I don't know if that's possible, but I'd love to see it perform if it were.
posted by Elizabeth the Thirteenth at 10:43 AM on December 5, 2011

