<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Neo4j traverses depths of 1000 levels and beyond at millisecond speed</title>
	<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed/</link>
	<description>Comments on MetaFilter post Neo4j traverses depths of 1000 levels and beyond at millisecond speed</description>
	<pubDate>Mon, 15 Jun 2009 09:02:20 -0800</pubDate>
	<lastBuildDate>Mon, 15 Jun 2009 09:02:20 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Neo4j traverses depths of 1000 levels and beyond at millisecond speed</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed</link>	
		<description>&lt;a href="http://highscalability.com/neo4j-graph-database-kicks-buttox"&gt;Graph databases&lt;/a&gt; - &lt;a href=&quot;http://gigaom.com/2007/08/10/data-20-how-the-web-disrupts-our-relational-database-world/&quot;&gt;data 2.0&lt;/a&gt; for &lt;a href=&quot;http://www.readwriteweb.com/archives/social_graph_tim_berners-lee.php&quot;&gt;Web 3.0&lt;/a&gt;?</description>
		<guid isPermaLink="false">post:www.metafilter.com,2009:site.82478</guid>
		<pubDate>Mon, 15 Jun 2009 08:50:25 -0800</pubDate>
		<dc:creator>dabitch</dc:creator>		<category>social</category>		<category>networks</category>		<category>graph</category>		<category>database</category>
	</item>	<item>
		<title>By: DU</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2606886</link>	
		<description>The basic idea (&quot;relationships as first class citizens&quot;) seems like a good one. Have to try it out. The biggest problem I expect is that &quot;rigid tables and columns&quot; make for easy programming. Graph traversal is a little less linear, and in any case, when your data and relationships change, your code presumably needs to change too.

(And why the FPP link to talk about the thing rather than &lt;a href=&quot;http://neo4j.org/&quot;&gt;the thing itself&lt;/a&gt;?)</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2606886</guid>
		<pubDate>Mon, 15 Jun 2009 09:02:20 -0800</pubDate>
		<dc:creator>DU</dc:creator>
	</item>	<item>
		<title>By: dabitch</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2606887</link>	
		<description>&lt;small&gt;yes, that was silly of me; I meant to add that but messed up. Thanks for the link to the thing itself, DU&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2606887</guid>
		<pubDate>Mon, 15 Jun 2009 09:03:49 -0800</pubDate>
		<dc:creator>dabitch</dc:creator>
	</item>	<item>
		<title>By: tksh</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607063</link>	
		<description>neo4j is neat. Its API feels a bit rough, but it&apos;s very easy to use and to get going with if you have a bit of a graph theory background.

Where it makes sense, the graphing model is actually easier to use than the standard tables and columns approach.  Especially in Java, there&apos;s a lot of framework and scaffolding just for the mechanism of using SQL or calling stored procedures.

With neo4j&apos;s API, at least, it&apos;s much more straightforward: you create a new node, set whatever attributes you want on it, and create (typed) relationships with other nodes you have. That&apos;s it! No persistence layer or data access objects to go through or maintain. And because it&apos;s a graph model, related nodes are just an edge traversal away (one more method call).
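
For readers who haven&apos;t seen this style of API, here is a minimal in-memory sketch of the workflow described above: create a node, set properties, add typed relationships, follow them. It is not Neo4j&apos;s actual API. The &lt;code&gt;Graph&lt;/code&gt;, &lt;code&gt;Node&lt;/code&gt;, &lt;code&gt;create_node&lt;/code&gt;, &lt;code&gt;relate&lt;/code&gt; and &lt;code&gt;neighbours&lt;/code&gt; names are hypothetical, and nothing here is persisted.

```python
from collections import defaultdict


class Node:
    """A node holding arbitrary properties (hypothetical class, not Neo4j's)."""

    def __init__(self, node_id):
        self.id = node_id
        self.props = {}
        self.out = defaultdict(list)  # relationship type -> list of target nodes

    def relate(self, rel_type, other):
        """Create a typed, directed relationship from this node to other."""
        self.out[rel_type].append(other)

    def neighbours(self, rel_type):
        """Follow all outgoing relationships of one type: one call per hop."""
        return list(self.out.get(rel_type, []))


class Graph:
    """Hands out nodes; there is no separate persistence or mapping layer here."""

    def __init__(self):
        self.nodes = []

    def create_node(self, **props):
        node = Node(len(self.nodes))
        node.props.update(props)
        self.nodes.append(node)
        return node


g = Graph()
alice = g.create_node(name="Alice")
bob = g.create_node(name="Bob")
carol = g.create_node(name="Carol")
alice.relate("KNOWS", bob)
bob.relate("KNOWS", carol)

# Friends of friends: two hops, two method calls, no join.
fof = [n.props["name"] for f in alice.neighbours("KNOWS") for n in f.neighbours("KNOWS")]
print(fof)  # ['Carol']
```

Each hop is a lookup on the node object itself, which is the sense in which related nodes are &quot;just an edge traversal away&quot;. A real graph database would add persistence, transactions and indexing on top of this idea.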

This said, it&apos;s more of a niche for where the data naturally fits a graph model, compared with the general fit that relational modeling has become. There&apos;s also a lot more experience with RDBMSes in general, so I&apos;m cautious about the idea that it will become big.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607063</guid>
		<pubDate>Mon, 15 Jun 2009 10:28:56 -0800</pubDate>
		<dc:creator>tksh</dc:creator>
	</item>	<item>
		<title>By: outlier</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607065</link>	
		<description>I was introduced to this idea a few years back, in the context of a few biology researchers thinking about bio-data, how it was deeply inter-related and (in most circumstances) scattered across various and disparate databases. It took me a while to grasp their direction, but the idea was to port or wrap databases into a graph layer so that users, AI and data-mining algorithms could explore the mass of inter-connected data with ease.

It was a galvanizing thought. As DU said, the biggest barrier is existing habit. Everyone is used to tables and columns and SQL. There&apos;s a lot of experience and expertise tied up in that, regardless of how unnatural expressing some ideas in that form may be.  But if object databases are finally getting a foothold, maybe graph databases will.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607065</guid>
		<pubDate>Mon, 15 Jun 2009 10:30:42 -0800</pubDate>
		<dc:creator>outlier</dc:creator>
	</item>	<item>
		<title>By: rokusan</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607095</link>	
		<description>&lt;i&gt;First there was the Internet, then the Web, and now the Graph - which Sir Tim labeled (somewhat tongue in cheek) the Giant Global Graph!&lt;/i&gt;

Okay, because &quot;WWW&quot; wasn&apos;t annoying enough to say, now we can say &quot;GGG?&quot;

Just what we need, a planet of nerds who all sound like &lt;a href=&quot;http://www.youtube.com/watch?v=Tfi8fT9oHkQ&quot;&gt;Quagmire&lt;/a&gt;.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607095</guid>
		<pubDate>Mon, 15 Jun 2009 10:46:29 -0800</pubDate>
		<dc:creator>rokusan</dc:creator>
	</item>	<item>
		<title>By: Nelson</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607222</link>	
		<description>Wow, neat! Are there any good papers / web pages / blog posts where people talk about using Neo4j on a real problem, with hundreds of megabytes of data, and talks about how it works out? For example, building a Neo4j database of the relationships of MetaFilter posters. I&apos;d love to know how it works. Slide 30 of &lt;a href=&quot;http://www.slideshare.net/emileifrem/neo4j-presentation-at-qcon-sf-2008-presentation&quot;&gt;this presentation&lt;/a&gt; claims several billion nodes on a single JVM, but I&apos;d like to see more detail.

The other thing that presentation claims (slide 33) is &quot;No O/R impedance mismatch&quot;. Which would be awesome, because mapping relational data into objects in Java or Python or Ruby is fundamentally naughty and awful. I&apos;ve often wondered why people haven&apos;t picked up object database research again and made something that worked for journeyman hackers as well as MySQL does.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607222</guid>
		<pubDate>Mon, 15 Jun 2009 11:50:21 -0800</pubDate>
		<dc:creator>Nelson</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607223</link>	
		<description>&lt;a href=&quot;http://en.wikipedia.org/wiki/Prolog&quot;&gt;Everything Old is New Again&lt;/a&gt;

Obviously these ideas have been around for a long time, probably (I would guess) longer than relational databases.

The real problem though is: How do you query?  Or rather How do you query in linear time? When you create an &lt;a href=&quot;http://en.wikipedia.org/wiki/Ontology_(information_science)&quot;&gt;Ontology&lt;/a&gt; (which is what those are actually called.  It would be reassuring if the authors of the software actually knew what they were creating) you&apos;re essentially storing a series of logical predicates.  An edge relationship in a graph is equivalent to a logical statement like A x B where A and B are objects and x is a relationship.
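
Here is a minimal Python sketch of the &quot;A x B&quot; view. It assumes edges stored as (subject, relation, object) triples with made-up family data. &lt;code&gt;match&lt;/code&gt; is a hypothetical helper that answers a query by pattern matching, with &lt;code&gt;None&lt;/code&gt; as a wildcard:

```python
# Each edge "A x B" is a (subject, relation, object) triple;
# ("Sue", "mother", "Ann") reads "the mother of Sue is Ann".
triples = {
    ("Sue", "mother", "Ann"),
    ("Jane", "mother", "Beth"),
    ("Ann", "mother", "Carla"),
    ("Beth", "mother", "Carla"),
}


def match(s=None, r=None, o=None):
    """Return the triples that fit the pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (r is None or t[1] == r)
        and (o is None or t[2] == o)
    )


print(match(s="Sue", r="mother"))    # [('Sue', 'mother', 'Ann')]
print(match(r="mother", o="Carla"))  # [('Ann', 'mother', 'Carla'), ('Beth', 'mother', 'Carla')]
```

The helper scans every triple, so each query is linear in the number of stored facts. Real triple stores typically keep indexes over the subject, relation and object positions to avoid that.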

But a lot of the questions you want to ask about those kinds of graphs are NP-complete.

The title of this post is &quot;Neo4j traverses depths of 1,000 levels and beyond at millisecond speed&quot;, but it doesn&apos;t tell you the breadth of the search. For example, let&apos;s say it&apos;s only traveling along one edge at a time. That means we get 1&lt;sup&gt;1000&lt;/sup&gt; = 1 path to explore. As an illustration, what if we asked a question like &quot;how many steps backward do we have to go before the mother of the mother of ... (and so on) of Sue and the mother of the mother of ... (and so on) of Jane are the same person?&quot; We only need to follow the &quot;mother&quot; link.

But on the other hand, what if we want to know &quot;how many steps back do you need to go before Sue and Jane share any common ancestor?&quot; In that case, at each step there are two branches. That means rather than 1&lt;sup&gt;1000&lt;/sup&gt; you get 2&lt;sup&gt;1000&lt;/sup&gt; steps. 2&lt;sup&gt;1000&lt;/sup&gt; 10&lt;sup&gt;301&lt;/sup&gt; or 1 followed by 301 zeros.
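
Here is a minimal sketch of the common-ancestor question as a breadth-first search that climbs one generation at a time from both people. It assumes a hypothetical &lt;code&gt;parents&lt;/code&gt; dict mapping each person to their known parents (two branches per step, as above). The &lt;code&gt;seen_a&lt;/code&gt; / &lt;code&gt;seen_b&lt;/code&gt; sets ensure that an ancestor reachable along several paths is expanded only once.

```python
def _expand(frontier, seen, parents):
    """Return the not-yet-seen parents of everyone in frontier, marking them seen."""
    next_frontier = []
    for person in frontier:
        for parent in parents.get(person, []):
            if parent not in seen:
                seen.add(parent)
                next_frontier.append(parent)
    return next_frontier


def generations_to_common_ancestor(parents, a, b, max_depth=1000):
    """Smallest k such that a and b share an ancestor at most k generations back.

    Each person counts as their own ancestor at depth 0. parents maps a person
    to a list of their known parents (hypothetical data). Returns None if no
    shared ancestor is found within max_depth generations.
    """
    seen_a, seen_b = {a}, {b}
    frontier_a, frontier_b = [a], [b]
    if not seen_a.isdisjoint(seen_b):
        return 0  # a and b are the same person
    for depth in range(1, max_depth + 1):
        # Climb one generation on each side.
        frontier_a = _expand(frontier_a, seen_a, parents)
        frontier_b = _expand(frontier_b, seen_b, parents)
        if not seen_a.isdisjoint(seen_b):
            return depth
        if not frontier_a and not frontier_b:
            return None  # both family trees are exhausted
    return None


parents = {
    "Sue": ["Ann", "Bob"],
    "Jane": ["Cat", "Dan"],
    "Ann": ["Eve", "Fred"],
    "Cat": ["Eve", "Gus"],
}
print(generations_to_common_ancestor(parents, "Sue", "Jane"))  # 2 (Eve)
```

With the seen sets, the work is bounded by the number of distinct people and parent links in the data. Without them, the search would re-expand a shared ancestor once per path that reaches it.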

The thing about a relational database is that it makes the sensible queries easy to do. The challenge has never been about storing Ontologies, but rather doing anything with them once you have the data.

--

And of course you can store a graph in a regular relational database as well.  All you need is one table for the nodes and one table for the edges.  Or you could even just have a single table with A, x, and B but that wouldn&apos;t be normalized.  It would take literally 10 minutes to implement the &lt;i&gt;storage&lt;/i&gt; system for this in a normal relational database system like MySQL, writing a few APIs might take an hour or so.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607223</guid>
		<pubDate>Mon, 15 Jun 2009 11:50:23 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: pwnguin</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607225</link>	
		<description>&lt;a href=&quot;http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607063&quot;&gt;tksh&lt;/a&gt;: &quot;&lt;i&gt;This said, it&apos;s more of a niche for where the data naturally fits a graph model, compared with the general fit that relational modeling has become.&lt;/i&gt;&quot;

There are plenty of places where relational databases aren&apos;t a good fit compared to a graph model.  Take threaded comment systems for example.  The standard way of forcing a tree into a row system is to add a parent field into every row. So every comment is given a parent, or null if it&apos;s a top level comment.  

I have no idea how you quickly rebuild that query into a tree.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607225</guid>
		<pubDate>Mon, 15 Jun 2009 11:52:13 -0800</pubDate>
		<dc:creator>pwnguin</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607232</link>	
		<description>(oops, that should say 2&lt;sup&gt;1000&lt;/sup&gt; &lt;b&gt;is&lt;/b&gt; roughly 10&lt;sup&gt;301&lt;/sup&gt;, or about 1 followed by 301 zeros)</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607232</guid>
		<pubDate>Mon, 15 Jun 2009 11:54:59 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607236</link>	
		<description>&lt;i&gt;I have no idea how you quickly rebuild that query into a tree.&lt;/i&gt;

It&apos;s not that hard. If the data is ordered so that children always come after parents, then all you have to do is build a map in memory as you read in the child nodes.  Each time you read a child node, you look up the parent in your map and attach it. 
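
Here is a minimal Python sketch of that map-based approach. It assumes hypothetical &lt;code&gt;(id, parent_id, text)&lt;/code&gt; rows that arrive parents-first, with &lt;code&gt;None&lt;/code&gt; as the parent of a top-level comment:

```python
def build_tree(rows):
    """Rebuild a comment tree from (id, parent_id, text) rows.

    Assumes every parent row appears before its children (e.g. rows sorted by
    id when ids increase over time); top-level comments have parent_id None.
    """
    by_id = {}  # id -> node dict, filled in as rows are read
    roots = []
    for comment_id, parent_id, text in rows:
        node = {"id": comment_id, "text": text, "children": []}
        by_id[comment_id] = node
        if parent_id is None:
            roots.append(node)
        else:
            # One dictionary lookup attaches the child to its parent.
            by_id[parent_id]["children"].append(node)
    return roots


rows = [
    (1, None, "FPP"),
    (2, 1, "reply to FPP"),
    (3, 2, "reply to reply"),
    (4, None, "second thread"),
]
tree = build_tree(rows)
print([n["id"] for n in tree])                         # [1, 4]
print(tree[0]["children"][0]["children"][0]["text"])   # reply to reply
```

Each row costs one dictionary insert and at most one lookup, so the rebuild is linear in the number of comments.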

If the data isn&apos;t ordered that way, you have to create placeholders for nodes that haven&apos;t been read in yet. But it still isn&apos;t very difficult.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607236</guid>
		<pubDate>Mon, 15 Jun 2009 11:57:26 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607241</link>	
		<description>And as I said, every graph can be represented as a relational database, simply by having a &quot;nodes&quot; and &quot;edges&quot; table.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607241</guid>
		<pubDate>Mon, 15 Jun 2009 11:58:27 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: bottlebrushtree</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607251</link>	
		<description>It is dual-licensed under the AGPL 3.0 and a commercial license, so be prepared to open up all your code or spend some money.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607251</guid>
		<pubDate>Mon, 15 Jun 2009 12:02:20 -0800</pubDate>
		<dc:creator>bottlebrushtree</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607267</link>	
		<description>And of course every relational database can be stored as a graph. Each cell in the database table has a relation to a table, a row, and a column.

So for example a users table with &quot;ID, First name, last name, username, date of birth, signup date&quot;

could be represented by the following predicates:
&lt;blockquote&gt;ID &lt;i&gt;is a&lt;/i&gt; [User]
ID &lt;i&gt;firstname is&lt;/i&gt; &quot;Mary&quot;
ID &lt;i&gt;lastname is&lt;/i&gt; &quot;Parker&quot;
ID &lt;i&gt;username is&lt;/i&gt; &quot;Mparker&quot;
ID &lt;i&gt;was born&lt;/i&gt; &quot;5/31/1988&quot;
ID &lt;i&gt;signed up&lt;/i&gt; &quot;4/17/2004&quot;&lt;/blockquote&gt;Graphs and relational databases are &lt;i&gt;isomorphic&lt;/i&gt;: any graph can be represented as a relational database, and any relational database can be represented as a graph. So a graph storage engine doesn&apos;t actually do anything for you that an RDBMS doesn&apos;t. In theory it could speed up some algorithms by a &lt;i&gt;constant factor&lt;/i&gt;, but it can&apos;t make the order of growth of any algorithm faster.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607267</guid>
		<pubDate>Mon, 15 Jun 2009 12:12:44 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: peterneubauer</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607294</link>	
		<description>No one is saying that graph databases are the answer to ALL the data problems. As delmoi mentions, graphs and tables are two forms of normalization of information.
Graph databases are providing persisting and operation means for the former, as there are trends that are easier implemented with graphs:
- the Giant Global Graph (Berners Lee), LinkedData, RDF and Triples etc.
- Social graphs and operations like shortest paths
- GIS and geodetic models and again shortest path algos
- Semantic trading
- your average domain model
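
For the &quot;social graphs and shortest paths&quot; item, here is a minimal textbook breadth-first search over an unweighted, undirected friendship graph stored as a plain Python dict. The data and the &lt;code&gt;shortest_path&lt;/code&gt; name are hypothetical. This is neither Neo4j&apos;s API nor its own path-finding implementation.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None.

    graph maps each node to an iterable of its neighbours (unweighted edges).
    """
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}
print(shortest_path(friends, "ann", "eve"))  # ['ann', 'bob', 'dan', 'eve']
```

BFS finds a path with the fewest hops because it visits nodes in order of distance from the start. Weighted graphs (e.g. road networks in GIS) would need Dijkstra&apos;s algorithm instead.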

Of course, RDBMSes certainly have their place and, above all, have heavy tooling and experience drawn from 35 years of practice. All honor to that, but for bleeding-edge and, increasingly, ordinary mashup applications out there, the world of data is getting more and more complex, ad hoc, semi-structured and clustered. That is where new kinds of databases (graph databases, key-value stores, DHTs, etc.) are starting to kick in.

see http://highscalability.com/neo4j-graph-database-kicks-buttox#comment-5615 for a good discussion of relational vs graph models.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607294</guid>
		<pubDate>Mon, 15 Jun 2009 12:34:48 -0800</pubDate>
		<dc:creator>peterneubauer</dc:creator>
	</item>	<item>
		<title>By: jouke</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607306</link>	
		<description>Oracle has &lt;a href=&quot;http://www.itk.ilstu.edu/docs/oracle/server.101/b10759/queries003.htm#i2053935&quot;&gt;hierarchical queries&lt;/a&gt;, if only for classic organisational hierarchy data. I have no idea how such a query fares performance-wise compared to native hierarchical storage.
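
Oracle&apos;s &lt;code&gt;CONNECT BY&lt;/code&gt; is Oracle syntax. Standard SQL asks the same kind of question with a recursive common table expression (&lt;code&gt;WITH RECURSIVE&lt;/code&gt;). Here is a minimal sketch using Python&apos;s built-in &lt;code&gt;sqlite3&lt;/code&gt; module (SQLite has supported recursive CTEs since version 3.8.3). It uses a hypothetical &lt;code&gt;staff&lt;/code&gt; table in which each row stores its manager&apos;s id, and lists everyone under one manager together with their depth:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES (1, 'CEO', NULL);
    INSERT INTO staff VALUES (2, 'CTO', 1);
    INSERT INTO staff VALUES (3, 'Dev lead', 2);
    INSERT INTO staff VALUES (4, 'Developer', 3);
    INSERT INTO staff VALUES (5, 'CFO', 1);
""")

# Walk down the hierarchy from one manager: the anchor row is that manager,
# and the recursive step joins each reached row to the rows naming it as manager.
query = """
    WITH RECURSIVE reports(id, name, depth) AS (
        SELECT id, name, 0 FROM staff WHERE id = ?
        UNION ALL
        SELECT s.id, s.name, r.depth + 1
        FROM staff AS s JOIN reports AS r ON s.manager_id = r.id
    )
    SELECT name, depth FROM reports ORDER BY depth
"""
rows = conn.execute(query, (2,)).fetchall()  # everyone under the CTO
for name, depth in rows:
    print("  " * depth + name)
```

Conceptually, each round of the recursion joins the rows found in the previous round back to &lt;code&gt;staff&lt;/code&gt;, one level of the hierarchy at a time.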

Hierarchical storage has another modern manifestation in XML storage, e.g. using eXist, in the so-called XRX architecture (XForms, REST, XQuery).
Never having worked on an XRX system, I do wonder what happens to the relations that conceptually exist between data referenced in different XML documents in storage. There&apos;s a reason the hierarchical databases of yore were supplanted by relational databases.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607306</guid>
		<pubDate>Mon, 15 Jun 2009 12:43:48 -0800</pubDate>
		<dc:creator>jouke</dc:creator>
	</item>	<item>
		<title>By: Artw</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607308</link>	
		<description>&lt;i&gt;The real problem though is: How do you query?&lt;/i&gt;

Wolfram Alpha!</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607308</guid>
		<pubDate>Mon, 15 Jun 2009 12:45:43 -0800</pubDate>
		<dc:creator>Artw</dc:creator>
	</item>	<item>
		<title>By: TMcGregor</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607312</link>	
		<description>It is impressive to see how things have evolved in this domain. In 1969 the creation of the Internet linked computers together. Then the Web in 1989 linked collections of webpages. More recently, online social initiatives have linked people together with enormous amounts of related data. It&apos;s too early to tell if this &apos;Graph&apos; will really be the next step, but it is indeed interesting and in the right direction.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607312</guid>
		<pubDate>Mon, 15 Jun 2009 12:47:25 -0800</pubDate>
		<dc:creator>TMcGregor</dc:creator>
	</item>	<item>
		<title>By: peterneubauer</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607313</link>	
		<description>Disclaimer: I am part of the Neo4j project.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607313</guid>
		<pubDate>Mon, 15 Jun 2009 12:48:00 -0800</pubDate>
		<dc:creator>peterneubauer</dc:creator>
	</item>	<item>
		<title>By: tksh</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607361</link>	
		<description>&lt;em&gt;There are plenty of places where relational databases aren&apos;t a good fit compared to a graph model.&lt;/em&gt;

I&apos;m not convinced overall that there are &lt;em&gt;that&lt;/em&gt; many situations where a graph model fits better than a relational model. As delmoi pointed out above, the representations are isomorphic, so it&apos;s the characteristics of your model and your design experience that determine how you store your data. And right now, there is a lot more relational theory experience than graph theory experience, or even knowledge.

But yes, when it fits, I find thinking in terms of nodes and edges much more intuitive than trying to contort my mental model to fit tables and columns.

&lt;small&gt;peterneubauer: since we have your attention, can I ask about the neo4j API design? Why is node creation defined only on NeoService? It would make more sense to have the create and delete calls all on Node, e.g. &lt;code&gt;Node.createNode( RelationshipType )&lt;/code&gt; that returns a Node.&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607361</guid>
		<pubDate>Mon, 15 Jun 2009 13:27:43 -0800</pubDate>
		<dc:creator>tksh</dc:creator>
	</item>	<item>
		<title>By: boo_radley</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607371</link>	
		<description>&lt;a href=&quot;http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607225&quot;&gt;pwnguin&lt;/a&gt;: &quot;&lt;i&gt;I have no idea how you quickly rebuild that query into a tree.&lt;/i&gt;&quot;

Jouke mentioned Hierarchical Queries in Oracle. I know that SQL Server has them as well as of 2008 (perhaps 2005 as well), and there were various procs to do the same thing in earlier versions -- I think the T-SQL cookbook has a good example.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607371</guid>
		<pubDate>Mon, 15 Jun 2009 13:34:23 -0800</pubDate>
		<dc:creator>boo_radley</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607420</link>	
		<description>&lt;i&gt;Graph databases are providing persisting and operation means for the former, as there are trends that are easier implemented with graphs:&lt;/i&gt;

Eh. The representation is the easy part. If you have trouble dealing representing your data, you&apos;re probably not that bright. The real question of representation comes in when you&apos;re dealing with algorithms: some algorithms have particular data access patterns, and if you can optimize for those patterns you can speed things up a lot (but never beyond the theoretical bounds of the algorithm).

Find some examples where Neo4j is faster then a traditional database design, and it might be interesting. 

Right now this sound like a half-baked project from someone who never learned much &lt;i&gt;real&lt;/i&gt; CS.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607420</guid>
		<pubDate>Mon, 15 Jun 2009 14:00:02 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: weston</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607693</link>	
		<description>&lt;i&gt;How do you query?&lt;/i&gt;

Prolog. At least, I wish.

&lt;i&gt;Or rather How do you query in linear time?&lt;/i&gt;

Oh. That.

&lt;i&gt;When you create an Ontology (which is what those are actually called. It would be reassuring if the authors of the software actually knew what they were creating) you&apos;re essentially storing a series of logical predicates.&lt;/i&gt;

Yes. This is one reason that I sometimes wish I had Prolog instead of SQL.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607693</guid>
		<pubDate>Mon, 15 Jun 2009 16:49:44 -0800</pubDate>
		<dc:creator>weston</dc:creator>
	</item>	<item>
		<title>By: anomie</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607827</link>	
		<description>&lt;em&gt;Find some examples where Neo4j is faster then a traditional database design, and it might be interesting. &lt;/em&gt;

I don&apos;t think the creator is claiming any major CS breakthroughs here - only to alleviate some traditional pain-points many folks working with real-world data on RDBMSes encounter &lt;em&gt;all the time&lt;/em&gt;. Joins perform poorly, and semi-structured data in general is a nightmare to work with.

This is why we see so many DHTs and other key-value stores cropping up all over the place. RDBMSes can handle these problems &lt;em&gt;in theory&lt;/em&gt;, but in practice they are difficult to scale and easy to get wrong at the worst times.

Something like Neo4j seems like it could strike a nice balance for relational, semi-structured data. Many folks are perfectly happy giving up the schema, but relational integrity is very important to maintain. DHTs scale well, but they make joins even &lt;em&gt;more&lt;/em&gt; expensive.

&lt;em&gt;Right now this sound like a half-baked project from someone who never learned much &lt;b&gt;real&lt;/b&gt; CS.&lt;/em&gt;

This is the exact sort of bile that turns many people off of academia.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607827</guid>
		<pubDate>Mon, 15 Jun 2009 18:31:12 -0800</pubDate>
		<dc:creator>anomie</dc:creator>
	</item>	<item>
		<title>By: emileifrem</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2607867</link>	
		<description>Hi,

So I&apos;m part of the Neo4j crew and registered just now for this thread. It&apos;s running late but hopefully I can answer your questions. If not, I&apos;ll be back tomorrow as well and make a second pass.

@Nelson: &lt;em&gt;Are there any good papers / web pages / blog posts where people talk about using Neo4j on a real problem, with hundreds of megabytes of data, and talks about how it works out?&lt;/em&gt;

Well, there&apos;s a limited &lt;a href=&quot;http://neotechnology.com/customers&quot;&gt;customer list&lt;/a&gt; over on our corporate site, but unfortunately it doesn&apos;t contain any real architectural descriptions. :( Many of our customers deal with several orders of magnitude more than what you&apos;re mentioning though so I think you&apos;ll be safe. :)

@delmoi: &lt;em&gt;The real problem though is: How do you query? Or rather How do you query in linear time?&lt;/em&gt;

Maybe you can look at the query code examples outlined for example &lt;a href=&quot;http://dist.neo4j.org/basic-neo4j-code-examples-2008-05-08.pdf&quot;&gt;here&lt;/a&gt;. Does that answer your question about how you query a graph db? We also support declarative pattern-matching queries through SPARQL.

As a side note, queries in &quot;linear time&quot; as you describe them are only usable with trivial data sets. (A query in O(n) is the equivalent of an RDBMS table scan or traversing a linked list, i.e. unusable with real data.) What we want is sub-linear time, like indexed lookups, which typically run in O(log n) or better. You seem very interested in theory and in throwing around CS lingo, so maybe that&apos;s interesting for you to know.
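
Here is a minimal Python sketch that makes the O(n) versus O(log n) contrast concrete. &lt;code&gt;scan_lookup&lt;/code&gt; inspects rows one by one, like a table scan. &lt;code&gt;index_lookup&lt;/code&gt; does a binary search over sorted data (the same logic as the standard library&apos;s &lt;code&gt;bisect.bisect_left&lt;/code&gt;) as a stand-in for an ordered index. Both function names are hypothetical, and real database indexes are usually B-trees rather than sorted arrays.

```python
def scan_lookup(rows, key):
    """Linear search: may inspect every row, like a table scan."""
    steps = 0
    for row in rows:
        steps += 1
        if row == key:
            return True, steps
    return False, steps


def index_lookup(sorted_rows, key):
    """Binary search over sorted data: about log2(n) probes, like an ordered index."""
    lo, hi, steps = 0, len(sorted_rows), 0
    while hi > lo:
        steps += 1
        mid = (lo + hi) // 2
        if key > sorted_rows[mid]:
            lo = mid + 1
        else:
            hi = mid
    found = lo != len(sorted_rows) and sorted_rows[lo] == key
    return found, steps


rows = list(range(0, 2_000_000, 2))  # one million even numbers, already sorted
print(scan_lookup(rows, 1_999_998))   # (True, 1000000)
print(index_lookup(rows, 1_999_998))  # (True, 20)
```

On a million sorted rows, the scan needs a million comparisons to reach the last value. The binary search needs about 20 probes, since log&lt;sub&gt;2&lt;/sub&gt; of a million is just under 20.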

@delmoi: &lt;em&gt;When you create an Ontology (which is what those are actually called. It would be reassuring if the authors of the software actually knew what they were creating)&lt;/em&gt;

Actually, that&apos;s not an ontology. You&apos;re confusing meta level and instance level. An ontology is a specification of a conceptualization of a domain, i.e. a meta description of the domain&apos;s entities and how they are related. You *can* store that meta layer in a graph db. But in a large majority of the cases what you end up storing in a graph database is the instances.

(Btw, OWL is a language for expressing ontologies, and we support creating a meta model in Neo4j from an OWL file via the &lt;a href=&quot;http://components.neo4j.org/owl2neo/&quot;&gt;owl2neo&lt;/a&gt; component. It&apos;s unfortunately currently very poorly documented.)

@delmoi: &lt;em&gt;&quot;Neo4j traverses depths of 1,000 levels and beyond at millisecond speed&quot; but it doesn&apos;t tell you the breadth of the search&lt;/em&gt;

So with *warm caches*, it typically takes 1 microsecond to go from node A to node B via relationship R. I.e. we can traverse 1000 &quot;links&quot; in around 1ms. Obviously exact details are hardware dependent.

@delmoi: &lt;em&gt;And of course you can store a graph in a regular relational database as well.&lt;/em&gt;

Yes, but that still doesn&apos;t mean relational databases can replace graph databases for these use cases. Why?

From a programming model perspective: please see the comments section of the linked article, which discusses this &lt;a href=&quot;http://highscalability.com/neo4j-graph-database-kicks-buttox#comment-5615&quot;&gt;at length&lt;/a&gt;.

From a performance perspective: see for example &lt;a href=&quot;http://news.ycombinator.com/item?id=548867&quot;&gt;this comment&lt;/a&gt; from a crew who tried to implement their graph-heavy system with an RDBMS but ended up having to write their own graph db. I wish you were right, but you&apos;re not. You just can&apos;t get good enough performance on graph traversals in an RDBMS for a lot of use cases. That&apos;s why LinkedIn, Facebook, Google, etc. have all decided NOT to use relational dbs *for their graph stuff* and to implement their own systems from scratch.

@tksh: &lt;em&gt;Can I ask about the neo4j API design?&lt;/em&gt;

For API discussions, please join the mailing list at https://lists.neo4j.org. It&apos;s very easy to get our attention!

@delmoi: &lt;em&gt;Eh. The representation is the easy part. If you have trouble dealing representing your data, you&apos;re probably not that bright.&lt;/em&gt;

Choice of representation is an incredibly important part of creating a usable and maintainable system. Here&apos;s a good &lt;a href=&quot;http://www.fluidinfo.com/terry/2008/02/13/the-power-of-representation-adding-powers-of-two/&quot;&gt;post&lt;/a&gt; that elaborates on this a bit.

@delmoi: &lt;em&gt;Find some examples where Neo4j is faster then a traditional database design, and it might be interesting.&lt;/em&gt;

Well, we haven&apos;t published any benchmarks at this point. Mainly because we all know that it&apos;s so easy to prove anything with benchmarks and we encourage our customers to actually benchmark their specific use cases. But people keep asking for it so maybe we should just throw one together soon.

@delmoi: &lt;em&gt;Right now this sound like a half-baked project from someone who never learned much real CS.&lt;/em&gt;

Haha, oh come on. Our team has had more than our fair share of CS, on all levels (BS to PhD). Theory is fun and our CS studies taught us like 10% of what&apos;s required to build a real, transactional, enterprise strength graph database. Which is a good start. The other 90%, we picked up working hands on with these issues for the last 9 years.

But rather than trying to convince you it&apos;s not a &quot;half-baked project,&quot; why don&apos;t you &lt;a href=&quot;http://wiki.neo4j.org&quot;&gt;browse around&lt;/a&gt; and, even better, actually &lt;a href=&quot;http://neo4j.org/download&quot;&gt;download&lt;/a&gt; and check it out? Would love some feedback on the actual product.

(Man, that was a long comment!)

-EE</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2607867</guid>
		<pubDate>Mon, 15 Jun 2009 19:02:58 -0800</pubDate>
		<dc:creator>emileifrem</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2608276</link>	
		<description>&lt;i&gt;This is the exact sort of bile that turns many people off of academia.&lt;/i&gt;

Well, the solution to the problem isn&apos;t to reinvent wheels that were studied and given up on decades ago. Sometimes a solution that wasn&apos;t practical years ago becomes practical later on due to changes in hardware, but being totally ignorant of what came before and what was learned isn&apos;t exactly inspiring. I mean, the whole purpose of &apos;academia&apos; is to learn from what people have done in the past.

There&apos;s nothing wrong with doing new things, but doing old things and calling it new is annoying.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2608276</guid>
		<pubDate>Tue, 16 Jun 2009 02:57:03 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2608294</link>	
		<description>&lt;blockquote&gt;&lt;blockquote&gt;@delmoi: The real problem though is: How do you query? Or rather How do you query in linear time?&lt;/blockquote&gt;&lt;i&gt;Maybe you can look at the query code examples outlined for example here. Does that answer your question about how you query a graph db? We also support declarative pattern-matching queries through SPARQL.&lt;/i&gt;&lt;/blockquote&gt;The question was &quot;how do you query in linear time&quot; so the answer can&apos;t be &quot;here&apos;s how you query in exponential time&quot;, which is how long a breadth first search actually takes, which is what you used in your example. (Technically, a BFS is linear in the total number of nodes and edges in the graph, but exponential in terms of the depth of the search and polynomial in terms of the connectedness of the vertexes in the graph) 

I wasn&apos;t asking how the query syntax worked, I was asking what the point was in doing queries that would take so long or have such high orders of growth, or if you had an example of some queries that you could do using your system.

&lt;blockquote&gt;&lt;i&gt;As a side note, queries in &quot;linear time&quot; as you speak about are only usable with trivial data sets. (A query in O(n) is the equivalent of an RDBMS table scan or traversing a linked list, i.e. unusable with real data.) What we want is sub-linear times, like indexed lookups which typically run in O(log n) or better. You seem very interested in theory and throwing around CS lingo so maybe that&apos;s interesting for you to know.&lt;/i&gt;&lt;/blockquote&gt;

Heh, I actually meant constant time or near-constant time (such as logarithmic time). I&apos;ve actually implemented databases that return results in O(log n) time. I don&apos;t mean &quot;implemented&quot; as in using off-the-shelf databases and setting up the indexes, but writing them from scratch, using red-black trees stored in memory-mapped files. I did that not because I was having trouble mapping my data to a relational database, but because I wanted something optimized for exactly what I was doing, with guaranteed smooth performance: never hitting any kind of wall or stalling every once in a while to do something like resizing a hash table.

The problem is, if you tried to run a query like that on data from Facebook, it would basically never return unless you set a maximum depth (or unless you happened to luck out and that person was nearby). And even if you did set a maximum depth, the query would still take forever, depending on how connected the graph is.

Being able to traverse 1,000 edges in a millisecond is nice, but again, why would you want to? Give me an example of an algorithm or use case where this works better than an RDBMS.

Having implemented non-RDBMS storage systems, I know they are out there, but the one I created was very application-specific. What are the &lt;i&gt;general&lt;/i&gt; use cases for something like this that have a reasonable time complexity with large datasets? Graph reachability isn&apos;t a good example because its time complexity is huge.

&lt;blockquote&gt;&lt;i&gt;Actually, that&apos;s not an ontology. You&apos;re confusing meta level and instance level.&lt;/i&gt;&lt;/blockquote&gt;

No, I&apos;m not. You&apos;re the one who&apos;s confused here. You could try actually reading &lt;a href=&quot;http://en.wikipedia.org/wiki/Ontology_(computer_science)#Ontology_components&quot;&gt;the Wikipedia article I linked to&lt;/a&gt;, which lists &quot;Individuals: instances or objects (the basic or &quot;ground level&quot; objects)&quot; as the one of the things that are commonly included in an Ontology (and no, I didn&apos;t just change it; you can check the page history :)</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2608294</guid>
		<pubDate>Tue, 16 Jun 2009 03:40:41 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: emileifrem</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2608424</link>	
		<description>@delmoi: &lt;em&gt;I wasn&apos;t asking how the query syntax worked, I was asking what the point was in doing queries that would take so long or have such high orders of growth, or if you had an example of some queries that you could do using your system&lt;/em&gt;

Ah, OK. So how about this scenario: let&apos;s say we have 1M people, and every person has an average of 50 friends. We want to pick two people at random and figure out whether they&apos;re connected through friends of friends. Everyone is supposedly connected within six degrees, so let&apos;s limit the depth to four.

On various randomized graphs with 1M people and 25M friendship relationships, we execute that query in an average of ~2 ms with warm caches. That particular pathExists() implementation visits at most 2*(avg rels)^(d/2) relationships, where d is the depth, so in this case 2*50^2 = 5000 rels. If our rule-of-thumb approximation of 1000 rels/ms holds, the query should always terminate in 5 ms or less, and that has been the upper bound in practice as well.
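
The pathExists() code itself isn&apos;t shown here, so the following is only a sketch of the kind of search that bound implies, assuming a bidirectional breadth-first search: one search starts from each person, the two take turns expanding one hop, and the query stops as soon as they meet. The Python function path_exists and the dict-of-lists graph are hypothetical stand-ins, not Neo4j&apos;s API:

```python
def path_exists(adj, a, b, max_depth):
    """Bidirectional breadth-first search (hypothetical helper, not Neo4j's API).

    Returns True if a and b are joined by a path of at most max_depth hops.
    adj maps each person to a list of friends; friendships are stored in both
    directions, so the graph is treated as undirected.
    """
    if a == b:
        return True
    seen_a, seen_b = {a}, {b}
    frontier_a, frontier_b = {a}, {b}
    depth = 0  # hops covered so far by the two searches combined
    while max_depth > depth and frontier_a and frontier_b:
        next_frontier = set()
        for person in frontier_a:
            for friend in adj.get(person, ()):
                if friend in seen_b:
                    return True  # the two searches have met
                if friend not in seen_a:
                    seen_a.add(friend)
                    next_frontier.add(friend)
        frontier_a = next_frontier
        depth += 1
        # Swap roles so the sides take turns; each ends up about max_depth / 2 deep.
        frontier_a, frontier_b = frontier_b, frontier_a
        seen_a, seen_b = seen_b, seen_a
    return False


# A hypothetical six-person chain: 0 - 1 - 2 - 3 - 4 - 5
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(path_exists(chain, 0, 4, 4))  # True: 0 and 4 are four hops apart
print(path_exists(chain, 0, 4, 3))  # False: no path of three hops or fewer
```

Because each side only goes about d/2 hops deep, the work grows roughly like 2*(avg rels)^(d/2) rather than (avg rels)^d.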

Neo4j will perform equally well (~2 ms) on a graph of 10M people and 250M friendship relationships. And if you can squeeze anywhere near that out of a relational database at that scale and beyond, I guarantee you a job at any large social networking site.

(This example is from my QCon SF 2008 presentation, which is on &lt;a href=&quot;http://www.slideshare.net/emileifrem/neo4j-presentation-at-qcon-sf-2008-presentation&quot;&gt;SlideShare&lt;/a&gt;, but unfortunately I see now that the performance numbers were lost in the SlideShare-ification.)

Incidentally, this is not a theoretical use case. It was given to us as part of a tech evaluation with a large social networking site as a representative example of their usage patterns.

@delmoi: &lt;em&gt;[wikipedia] lists &quot;Individuals: instances or objects (the basic or &quot;ground level&quot; objects)&quot; as the one of the things that are commonly included in an Ontology&lt;/em&gt;

Ah, right, my bad. Ontology is a very overloaded word, and I&apos;ve found that in practice there are two types of ontologies (in CS, obviously; not even touching philosophy here): small-scale knowledge-representation ontologies (for instance, the stereotypical wine ontology), where they sometimes add instances as well as types (for example &lt;a href=&quot;http://iswc2004.semanticweb.org/demos/32/WineRDFS.PNG&quot;&gt;&quot;Elyse Zinfandel&quot;&lt;/a&gt; [an instance] intermixed with Zinfandel and Winery [types]). We never work with those, as we&apos;re not in the knowledge-representation business (some of our partners are, and we bring them in to help our customers with that part if they need it).

The other type is the strict schema ontology, which we sometimes use to represent the domain of a system. See an example &lt;a href=&quot;http://www.cybergeo.eu/docannexe/image/8322/img-1.jpg&quot;&gt;here&lt;/a&gt;.

I tend to call the first &quot;knowledge bases&quot; and the second &quot;ontologies,&quot; as per the famous Noy/McGuinness introductory paper (now at &lt;a href=&quot;http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html&quot;&gt;Protege&lt;/a&gt;): &lt;em&gt;&quot;An ontology together with a set of individual instances of classes constitutes a knowledge base.&quot;&lt;/em&gt; That&apos;s their definition, and they also go on to say &lt;em&gt;&quot;In reality, there is a fine line where the ontology ends and the knowledge base begins.&quot;&lt;/em&gt; That&apos;s the one that has stuck with me.

Back to your original point (&quot;it&apos;s an ontology so it&apos;d be reassuring if they called it that&quot;): I&apos;ve actually found that if I want to scare fellow programmers away from graph databases, a good way is to start talking about ontologies and knowledge bases. Add some description logic and reification theory to that and we&apos;ve got ourselves a killer image, but one that would drive most people away. The semweb crowd likes it, so I use it with them. But since my goal is adoption by real-world, industrial applications, I usually stick with the simpler concepts of graph databases. YMMV.

-EE</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2608424</guid>
		<pubDate>Tue, 16 Jun 2009 06:00:11 -0800</pubDate>
		<dc:creator>emileifrem</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2608453</link>	
		<description>&lt;i&gt;On various randomized graphs with 1M people and 25M friendship relationships we execute that query in an avg of ~2ms with warm caches. That particular pathExists() implementation visits an upper bound of 2*(avg rel)^d/2 where d is depth, so in this case 5000 rels.&lt;/i&gt;

Sure, but if you wanted a depth of 5 rather than a depth of 4, you would be talking about 250,000 visits (2*50^3, rounding d/2 up), or a quarter of a second. And for a depth of 7, you&apos;re looking at 12.5 million visits (2*50^4), or 12.5 seconds, and so on.
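
These figures can be reproduced with a few lines of Python, assuming the 2*(avg rels)^(d/2) bound with d/2 rounded up for odd depths, 50 friends per person, and the rule of thumb of 1000 relationships per millisecond quoted earlier in the thread:

```python
import math

RELS_PER_MS = 1000  # rule-of-thumb traversal speed quoted earlier in the thread


def visit_bound(avg_rels, depth):
    """Upper bound on relationships visited when two searches meet in the middle.

    Each side goes depth / 2 hops, rounded up for odd depths.
    """
    return 2 * avg_rels ** math.ceil(depth / 2)


for d in (4, 5, 7):
    visits = visit_bound(50, d)
    print(d, visits, visits / RELS_PER_MS, "ms")
# prints:
# 4 5000 5.0 ms
# 5 250000 250.0 ms
# 7 12500000 12500.0 ms
```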

You can also do that search with a single SQL query. Let&apos;s say you have a users table with people, and a friends table that holds a many-to-many relationship between them (just like the graph example above).

If you want to query connectedness in 4 steps, you can do something like:

Select user1.id, user2.id from
&amp;nbsp;&amp;nbsp;&amp;nbsp;    users as user1
&amp;nbsp;&amp;nbsp;&amp;nbsp;    inner join friends on user1.id = friends.in
&amp;nbsp;&amp;nbsp;&amp;nbsp;    inner join users as inneruser1 on inneruser1.id = friends.out,

&amp;nbsp;&amp;nbsp;&amp;nbsp;    users as user2
&amp;nbsp;&amp;nbsp;&amp;nbsp;    inner join friends as friends2 on user2.id = friends2.in
&amp;nbsp;&amp;nbsp;&amp;nbsp;    inner join users as inneruser2 on inneruser2.id = friends2.out

&amp;nbsp;&amp;nbsp;&amp;nbsp;    where inneruser2.id = inneruser1.id and user1.id = @target

I think (but I haven&apos;t tried it, obviously) that query should get a list of everyone who is steps away from @target. You might need to tweak it.

Another option would be to get a list of everyone who is two steps away from user1 and from user2, and then check for overlap in memory. That would require sending a lot more data from the database, so if the DB is on another machine, it would waste bandwidth, but I&apos;m not even sure your solution works over the network (does it?).
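
A minimal Python sketch of that in-memory option, assuming the friend rows have already been fetched into a dict that maps each user id to a list of friend ids (the function names are made up for illustration, and no particular database driver is implied):

```python
def within_two_hops(adj, person):
    """Everyone at most two friendship hops away from person, including person."""
    reached = {person}
    for friend in adj.get(person, ()):
        reached.add(friend)
        reached.update(adj.get(friend, ()))
    return reached


def connected_within_four(adj, user1, user2):
    # Compare the two neighborhoods in memory: anyone in both sets links
    # user1 and user2 by a path of at most 2 + 2 = 4 hops.
    return not within_two_hops(adj, user1).isdisjoint(within_two_hops(adj, user2))


# Hypothetical friend rows already fetched into memory: a six-person chain.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(connected_within_four(chain, 0, 4))  # True: both reach person 2
print(connected_within_four(chain, 0, 5))  # False: five hops apart
```

This is the same meet-in-the-middle idea as the SQL query, applied one hop further on each side, with the comparison moved out of the database.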

The problem of course is that as you add more and more levels, you need to do more and more joins. But that isn&apos;t a real problem because those queries are totally intractable anyway. I guess if you had a really sparse graph (fewer than 2 connections on average), it might be helpful.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2608453</guid>
		<pubDate>Tue, 16 Jun 2009 06:23:45 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2608456</link>	
		<description>&lt;i&gt;that query should get a list of everyone who is steps away from @target.&lt;/i&gt;

Er, I mean &lt;i&gt;four&lt;/i&gt; steps away, obviously.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2608456</guid>
		<pubDate>Tue, 16 Jun 2009 06:25:35 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: tksh</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2608485</link>	
		<description>&lt;small&gt;emileifrem: Hi.  Sorry to interject but can you check your mefi mail?  Upper right corner, tiny envelope icon.&lt;/small&gt;</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2608485</guid>
		<pubDate>Tue, 16 Jun 2009 06:47:18 -0800</pubDate>
		<dc:creator>tksh</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2608526</link>	
		<description>Hmm, actually I just realized my example above would only find people up to 2 steps apart, not 4.  You would need one more level of joins to get to 4 steps. 

But the point is if you have a fixed depth, you can do it with a single SQL query.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2608526</guid>
		<pubDate>Tue, 16 Jun 2009 07:18:57 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: Nelson</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2608641</link>	
		<description>Thanks for joining us, Emile! Appreciate you coming in and giving more pointers to stuff.  Those performance numbers you mention (&quot;randomized graphs with 1M people and 25M friendship relationships&quot;) are very interesting. If you have the time and permission to turn that tech evaluation you did into a public white paper, I think a lot of people would learn from it.

delmoi: &lt;i&gt;The problem of course is that as you add more and more levels, you need to do more and more joins. But that isn&apos;t a real problem because those queries are totally intractable anyway.&lt;/i&gt;

I think the promise of Neo4j is that in their representation, those queries are not intractable. Sure, you can model a graph in a relational database, but as you contemplate inner joining your giant Friendships table with itself 4 times to work on the set of people 4 links away from someone, you realize it&apos;s going to not work very well. That&apos;s why every big social network site I know of either avoids doing large graph calculations on its network or builds a custom datastore.

I&apos;m excited to see that Neo4j is trying to provide a datastore better suited for this task. And respectfully, delmoi, despite your awesome skillz with red-black trees and mmap() I suggest it might take you a few weeks to evaluate whether Neo4j has accomplished something new and useful or not. I don&apos;t have the time or interest to do that evaluation myself, so I don&apos;t know how well it works, but that&apos;s hardly a reason for me to dismiss it out of hand on Metafilter.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2608641</guid>
		<pubDate>Tue, 16 Jun 2009 08:53:16 -0800</pubDate>
		<dc:creator>Nelson</dc:creator>
	</item>	<item>
		<title>By: delmoi</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2609082</link>	
		<description>&lt;i&gt;I think the promise of Neo4j is that in their representation, those queries are not intractable.&lt;/i&gt;

Uh, no, the problems are intractable no matter how you represent your data; that&apos;s just a mathematical fact. emileifrem confirmed this when he gave the time complexity of his breadth-first search algorithm, by the way.

&lt;i&gt;but as you contemplate inner joining your giant Friendships table with itself 4 times to work on the set of people 4 links away from someone, you realize it&apos;s going to not work very well.&lt;/i&gt;

Of course not, but the same is true of a breadth-first search in any representation. The point I was making is that the graph database doesn&apos;t solve the &quot;you need lots of queries to do this&quot; problem, because it can actually be done with just one query. If you can eliminate extra separate queries, you can really boost performance, but that hasn&apos;t happened here.

&lt;i&gt;I suggest it might take you a few weeks to evaluate whether Neo4j has accomplished something new and useful or not.&lt;/i&gt;

I actually have better things to do with my time. If they&apos;ve got a better use case that shows something you can&apos;t do easily or quickly in a relational database, then it would be more interesting.

I&apos;m hardly a huge fanboi for relational databases, but if you want to do something new, it would be helpful to have a clear idea of what drawback you&apos;re correcting, preferably with a real-world use case.</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2609082</guid>
		<pubDate>Tue, 16 Jun 2009 14:05:44 -0800</pubDate>
		<dc:creator>delmoi</dc:creator>
	</item>	<item>
		<title>By: emileifrem</title>
		<link>http://www.metafilter.com/82478/Neo4j-traverses-depths-of-1000-levels-and-beyond-at-millisecond-speed#2610787</link>	
		<description>Sorry to drop by so late, but running very low on cycles here. Some quick comments:

@Nelson: Thanks for your kind comments. I agree we should take time to make a white paper out of that PoC. So much to do and so little time. :)

@delmoi: I think the step-by-step evolution of your SQL is converging on one of my two main points: the difficulty of writing a query like this in SQL. I&apos;d say you&apos;re probably 50 lines short of a query that returns true/false depending on whether an arbitrary-length path exists between two people, parameterized with max depth and taking directed relationships into account. Developing it is a nightmare, and maintaining it is even worse.

My second point is that the performance will be horrible. With the data set size I quoted (10M persons, 250M follows rels), I think a query with depth 4 will probably take hours to execute.

There&apos;s a reason why all the big guys have implemented their own data stores for these operations. And it&apos;s not because they don&apos;t know their RDBMS-fu. They certainly have both the money and the know-how to pour the best the world has to offer in terms of RDBMS design &amp;amp; optimization into their solutions. No, it&apos;s because they&apos;ve pushed and pushed and pushed their relational dbs to the limit and beyond, and in pure desperation had to implement their own brand-new system from scratch in order to solve their real-world problems.

When we look at those problems, the ones they can&apos;t solve using an RDBMS, we see that a graph db handles them really well. Not indefinitely, but up to the limits that make sense for them (they don&apos;t want to check depth 6, because then they might as well just hard-code it to always return true). And a graph db solves them well enough that you can use an off-the-shelf Neo4j to run them. In production. I&apos;m not sure it can get any clearer or more real-world than that. (And we haven&apos;t even discussed all the development-time benefits of a graph db in terms of schema-free modeling, little impedance mismatch, etc.)

But if you can write a query like the one described above that executes anywhere near Neo4j&apos;s 2 ms (actually, even if you get it below one minute), I would *love* to see your SQL and E/R model. I&apos;m not saying this rhetorically; I&apos;m very serious: if you think you can do it, please do try it out and let me know (first name @ neotechnology.com) how it worked out.

If Neo4j isn&apos;t at least 1,000,000 times faster, then I&apos;ll owe you a beer. :)

-EE</description>
		<guid isPermaLink="false">comment:www.metafilter.com,2009:site.82478-2610787</guid>
		<pubDate>Wed, 17 Jun 2009 18:01:14 -0800</pubDate>
		<dc:creator>emileifrem</dc:creator>
	</item>
	</channel>
</rss>
