Data structure for Social news streams on Graph data bases – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany

UPDATE: look at http://www.rene-pickhardt.de/graphity for a more scientific survey and evaluation of this data structure.
Ok you guys did not hear much from me most recently. I was on vaccation and then on summer school and I worked on my first scientific poster and on a talk which will hopefully ontribute to my PhD thesis. Well at least I can now share some ressources which include my poster and the slides from my talk. But let me first show you two pictures.

social network graph — The standard graph of a social network. You see several people and attached to them content items identified by numbers which are supposed to be time stamps

data structure for a social news stream — My model of a social network graph. The ego network changed from a star topology to a list topology and each ego network has a certain edge type which is modeled by edge color here. This graph stores exactly the same information as the standard model but makes retrieval of news streams much faster

Poster

Feel free to download and look at my first poster with the Title: a model for social news streams and time indices on graph data bases
You will probably not see so many things in it without the slides from my talk. So let me explain some things. I was looking into the data structures to model social news streams. Basically there is the approach of normalized or denormalized relational data bases which I call the twitter approach for the reason that Twitter is doing something similar with FlockDB
I also looked into the case of saving the news stream as a flat file for every user in which the events from his friends are saved for every user. For some reason I thought I had picked up somewhere that facebook is running on such a system. But right now I can’t find the resource anymore. If you can, please tell me! Anyway while studying these different approaches I realized that the flat file approach even though it seems to be primitive makes perfect sense. It scales to infinity and is very fast for reading! Even though I can’t find the resource anymore I will still call this approach the Facebook approach.
I was now wondering how you would store a social news stream in a graph data base like neo4j in a way that you get some nice properties. More specifically I wanted to combine the advantages of both the facebook and the twitter approach and try to get rid of the downfalls. And guess what! To me this seems actually possible on graph data bases. The key Idea is to store the social network and content items created by the users not only in a star topology but also in a list topology ordered by time of occuring events. The crucial part is to maintain this topology which is actually possible in O(1) while Updates occure to the graph.

Talk

As mentioned together with this poster I was giving a talk social news streams and time indices on social network graphs. Please feel free to download the slides. Unfortunally I improved the example while making the poster so that some pictures are not consistant with those from the poster! If I find the time I will not only update the slides but also give the talk as a video lecture on youtube! I think this will be helpful to spread the idea!

Future Work

I need to publish all these results in a good coference or journal
relevance filtering and recommendations which is the problem I am really interested in.
Implementing this stuff for my social network (see blog post)

Open Questions

Is it possible in neo4j to specify edgetypes (Relationship types) on runtime rather than compiletime.
If so: Is the time of accessing them O(1) with respect to the node degree?
If not: is there a graph data base that is capable of doing this?

Discussion

Anyway it is great to see how much more insight you get when thinking of a problem in a scientific way and not only implement it right away!

20 Comments

Social news streams – a possible PhD research topic? says:

September 5, 2011 at 1:44 pm

[…] http://www.rene-pickhardt.de/data-structure-for-social-news-streams-on-graph-data-bases/ […]

Social news streams – a possible PhD research topic? says:

September 5, 2011 at 1:44 pm

[…] http://www.rene-pickhardt.de/data-structure-for-social-news-streams-on-graph-data-bases/ […]

José Devezas says:

September 6, 2011 at 10:48 am

Regarding your question “Is it possible in neo4j to specify edgetypes (Relationship types) on runtime rather than compiletime.”, yes, this is possible in Neo4j, by using DynamicRelationshipType (http://api.neo4j.org/1.4/org/neo4j/graphdb/DynamicRelationshipType.html).
Also, I’m checking your slides and since you refer a “smart graph traversal” to choose the most recent edges I think you might be interested in using Pipes or Gremlin, by Tinkerpop (http://www.tinkerpop.com/). This can use an underlying Neo4j graph DB and makes graph traversals a breeze.
Good luck with the PhD, maybe I’ll see you at a conference sometime. 🙂

José Devezas says:

September 6, 2011 at 10:48 am

Rene says:
September 6, 2011 at 12:52 pm

Reply

Hey José,
thanks for your very helpful (!) hints. I should have known the dynamic relationship type since I already used it (but in a totally different usecase and also in a very static way)
I tried to grasp a quick look into the source code to figure out how fast choosing an edge at a given node is but couldn’t find the right spot yet. But I think now that I know that it works with DynamicrelationsShipTypes figuring this out is just a matter of time and more accurate reading the source code which is an interesting thing to do anyway (-:
As for the “smart graph traversal”: After changing from star topology to list toplogy I don’t need a smart graph traversal any more I will just use “top-k n-way merge” I saw Gremlin before but didn’t use it a lot. It might be fun though to see if I can implement “top-k n-way merge” with the help of gremlin (-:
As I saw from your website you’re also a PhD student. So I wish you good luck too and hopefully we’ll see at some conference some time (:
1. Peter Ehrlich says:
  February 12, 2012 at 10:59 pm
  
  Reply
  
  Hello!
  Thanks for writing this up, it’s very good to think about.
  So this currently structures things as two dimensional list: one dimension traversing people, and the second the branches of their post. This means that to get a single list ordered by time, all branches must be queried and then merged.
  What if one were to design a system slightly differently: Have each branch (of feed items) as it is now, but then rather than keeping a linked list of people for every follower, keeping a linked list of people’s feed items directly.
  Thus when a post is made, it is added both to the top of its own branch, and to the top of of every follower’s feed.
  Cheers
  –Peter
  1. Rene says:
    February 13, 2012 at 6:11 pm
    
    Reply
    
    That would certainly work. The only problem is that this would not respect changes in the friendship graph. So if new friends come or go the list that you suggest would have to be updated.
    Also in your case we would not really need a graph data base but could just use flat files (single index for every user) in order to store all the news updates.

Amazed by neo4j, gwt and my apache tomcat webserver says:

September 15, 2011 at 10:56 am

[…] reading papers I am currently implementing the infrastructure of my social news stream for the new metalcon version. For the very first time I was really using neo4j on a remote […]

Amazed by neo4j, gwt and my apache tomcat webserver says:

September 15, 2011 at 10:56 am

[…] reading papers I am currently implementing the infrastructure of my social news stream for the new metalcon version. For the very first time I was really using neo4j on a remote […]

neo4j based social news feed demo on wikipedia graph running says:

September 21, 2011 at 11:12 pm

[…] finally demonstrate the neo4j and gwt system that I have been blogging about over the last weeks here and here. But please find the demo under the following […]

neo4j based social news feed demo on wikipedia graph running says:

September 21, 2011 at 11:12 pm

[…] finally demonstrate the neo4j and gwt system that I have been blogging about over the last weeks here and here. But please find the demo under the following […]

11 lessons learnt after my first scientific paper was submitted says:

November 2, 2011 at 1:59 am

[…] become clearer when you really dig into it. It is amazing how all the practical runtimes of my graph data base index for social news feeds – let’s call it GRAPHITY – matched the theoretical runtimes. But while evaluating […]

11 lessons learnt after my first scientific paper was submitted says:

November 2, 2011 at 1:59 am

Graphity: An efficient Graph Model for Retrieving the Top-k News Feeds for users in social networks says:

February 5, 2012 at 12:27 pm

[…] Graphity works is explained in my first presentation and my poster which I both already talked about in an older blog post. But you can also watch this presentation to get an idea of it and learn about the evaluation […]

Graphity: An efficient Graph Model for Retrieving the Top-k News Feeds for users in social networks says:

February 5, 2012 at 12:27 pm

Typology Oberseminar talk and Speed up of retrieval by a factor of 1000 says:

August 16, 2012 at 12:55 pm

[…] see from the slides we are planning to submit our results to SIGIR conference. So one year after my first blogpost on graphity which devoloped in a full paper for socialcom2012 (graphity blog post and blog post for source […]

Typology Oberseminar talk and Speed up of retrieval by a factor of 1000 says:

August 16, 2012 at 12:55 pm

Popular Posts

What are the 57 signals google uses to filter search results?

Graphity: An efficient Graph Model for Retrieving the Top-k News Feeds for users in social networks

Algorithmic Information Filter from Eli Pariser’s TED Talks

Time lines and news streams: Neo4j is 377 times faster than MySQL

20 Comments

Leave a Reply Cancel reply

Poster

Talk

Future Work

Open Questions

Discussion

You may also like...

Popular Posts

20 Comments

Leave a Reply Cancel reply