For my social news stream application I am heavily thinking about the right software to support my backend. After I designed a database model in MySQL I talked back to Jonas and he suggested to search for a better suiting technology. A little bit of research brought me to a Graph database called Neo4j.
After I downloaded the opensource java libs and got it running with eclipse and maven I did some testing with my Metalcon data set. And I have been very satisfied and the whole project looks very promesing to me. I exported 4 relations from my MySQL Database.
- UserUserFriend containing all the friendship requests
- UserProfileVisit containing the profiles a user visited
- UserMessage containing the messages between users
- UserComment containing the profile comments between users
These relations obviously form a graph on my data set. Reading the several 100’000 lines of data and put them into the graph data structure and building a search index on the nodes only took several seconds runtime. But I was even more impressed by the speed with which it was possible to traverse the graph!
Receiving the shortest path between two users of length 4 only took me 150 milliseconds. Doing a full bredthfirst search on a different heavily connected graph with 290’000 edges only took 2.7 seconds which means that neo4j is capable of traversing about 100’000 edges per second.
Now I will have to look more carefully to my usecase. Obviously I want to have edges that are labled with timestamps and retrieve them in orderd lists. Adding key value pairs to the edges and including and index is possible which makes me optimisitic that I will be able to solve a lot of my queries of interest in an efficiant manner.
Unfortunately I am batteling around with Google Webtoolkit and Eclipse and Neo4j which I want to combine for the new metlcon version but I even asked the neo4j mailinglist with an very emberassing question and the guys from neotechnology have been very kind and helpful (even thogh I still couldn’t manage to get it running) I will post an article here as soon as I know how to set everything up.
In General I am still a huge fan of relational databases but for a usecase of social networks I see why graph data bases seem to be the more sophisticated technology. I am pretty sure that I could not have perfomed so well using MySQL.
What is your experience with graph data bases and especially neo4j?