My graph index Graphity was, as mentioned in another blog post, accepted at SocialCom 2012. After explaining how it works and sharing the source code, I now want to share some information about the history of submissions, the reviews and their quality, the actions we took, and so on. So if you are a coder, hacker, geek or whatever, just skip this post and dig into the source code of Graphity. If you are a researcher, you might enjoy learning from my experiences.
2011 April: Defining the research question
While I was wondering about possible research topics, many people told me that running a social network like metalcon and seeing so much data would be a treasure trove for research questions. Even though at that time our goal was just to keep metalcon running in its current state, I started thinking about what research questions could be asked. For most questions we simply did not collect good data. I realized that if metalcon should really give rise to research questions, I would have to reimplement big parts of it.
Since metalcon also suffered from slow page loading times, I decided to recode the project efficiently. I knew that running metalcon on a graph database like neo4j would be a good idea. That is why I started wondering about how to implement a news feed system efficiently and in a scalable manner on top of a graph database.
==> A research question was formulated.
2011 Mid August: A solution to the problem was found
After three months of hard thinking and trying out different database designs, I had a meeting with my potential supervisor Prof. Dr. Steffen Staab. He would have loved to see some preliminary results. While summing everything up for the meeting, the solution, or rather the idea of Graphity, suddenly became clear before my eyes.
In late August I talked about the ideas in our research seminar (Oberseminar) and presented a poster at the European Summer School in Information Retrieval. I wondered where to submit this work.
Steffen thought that the idea was pretty smart and could be published in a top venue like VLDB, which he suggested I choose. He said it would be my first scientific work, and VLDB has a very short reviewing cycle of less than one month that could provide me with fast feedback. A piece of advice which I did not understand and also did not follow. Well, sometimes you have to learn the hard way…
Being confident about the quality of the index, also due to Steffen's positive feedback and his plans for VLDB, my own plan was rather to submit to the WWW conference. I thought social networks would be a relevant topic for that conference. Steffen agreed but pointed out that he was a program chair for it, which would mean that submissions from his own institute would not be the best idea.
Even though Graphity was not implemented or evaluated yet and no related work had been read, we decided that the paper should be submitted to the SIGMOD conference, with a deadline only two months away.
By the way, having these results and starting this hard work got me funding for three years, starting in October 2012!
2011 November: Submission to SIGMOD
After two months of very hard work, especially together with Jonas Kunze, a physicist from metalcon from whom I really learnt a lot about setting up software and evaluation frameworks, the paper was submitted to SIGMOD. Meanwhile I tried to get an overview of the related work, which at that time was the hardest part: I really did not know where to start, since my research question came from a very practical and relevant use case and did not emerge from reading many papers.
Anyway, we finished our submission just in time, together with all the experiments and a text about Graphity that I would still call decent. (You can have the final copy of the SIGMOD submission on request.)
2011 Mid November: Blog post with the best content from the paper
I talked to my now-supervisor Steffen and asked him what he thought about already blogging the results of Graphity, trying to gain some awareness in the community. He was a little skeptical, since blogs cannot really be cited and the paper was not published yet. I argued that blogs are becoming more and more important in research: even if a blog post is not a publication in the scientific sense, it is still a form of publication and gains visibility. Steffen agreed to try this out, and I have to say the idea was perfect. I put the core results on my blog and received very positive feedback from people working at LinkedIn, Microsoft and from other hackers. I also realized that the problem was currently unpublished and unsolved, and relevant to many people.
Interestingly, one of my co-authors tried to persuade me to publish the SIGMOD version of the paper as a technical report. I did not get the point of doing so (he said something like "then it is officially published and no one can steal the idea…"). Up to today I do not see the point of doing this, other than manipulating one's official publication count…
2012 February: Reject from SIGMOD
After all the strong feedback on my blog and via mail, the reviews from the SIGMOD conference were quite disappointing. Every single reviewer highly valued the idea and the relevance of the problem. But they criticized the evaluation and the related work.
- For the evaluation, they criticized that using a memory disk distorts the results, and that the distribution and scaling behaviour cannot be established from theoretical and empirical arguments about some big-O bounds alone.
- As for the related work, they were missing some papers, especially from the database community, and as I said, related work was definitely one of the weaknesses of the paper.
- Other feedback was on our notation and the naming of some baselines.
The most disappointing part of all the feedback was the following quote:
W1: I cannot find any strong and novel technical contribution in their algorithms. Their proposed method is too simple and straightforward. It is simply an application of using doubly linked list to graph structures.
To answer this at least once: yes, the algorithm is simple and only based on doubly linked lists. BUT it is in the best complexity class one can imagine for the problem, it scales perfectly, and this was proven. How can one throw away an answer because it is too simple, if it is the best possible answer? I sense that this is one of the biggest differences between a mathematician and a computer scientist. In particular, the same reviewer pointed out that the problem was relevant and important.
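For readers wondering what "an application of using doubly linked list to graph structures" means here, the following is a minimal, simplified sketch of the idea: every reader keeps the accounts she follows in a list ordered by the recency of their latest post, so a new post only has to move its author to the head of each follower's list. This is not the actual neo4j-based implementation from the repository, and all class and method names are made up for illustration.

```java
import java.util.*;

/**
 * Hedged sketch of the idea behind Graphity (NOT the neo4j implementation):
 * every reader keeps the users she follows in a list ordered by the recency
 * of their latest post, so a new post only has to move its author to the
 * head of each follower's list. Names are made up for illustration.
 */
public class GraphitySketch {

    /** A single status update of some author. */
    record Post(String author, long timestamp, String content) {}

    /** Per author: posts in reverse chronological order (newest first). */
    private final Map<String, Deque<Post>> postsByAuthor = new HashMap<>();

    /** Per reader: followees ordered by recency of their latest post. */
    private final Map<String, LinkedList<String>> egoList = new HashMap<>();

    /** Reverse edges so a new post can update every follower's ego list. */
    private final Map<String, Set<String>> followersOf = new HashMap<>();

    public void follow(String reader, String followee) {
        egoList.computeIfAbsent(reader, r -> new LinkedList<>()).addLast(followee);
        followersOf.computeIfAbsent(followee, f -> new HashSet<>()).add(reader);
    }

    public void post(String author, long timestamp, String content) {
        postsByAuthor.computeIfAbsent(author, a -> new ArrayDeque<>())
                     .addFirst(new Post(author, timestamp, content));
        // Move the author to the head of each follower's ego list. In the real
        // index this is a constant-time pointer change per follower; here
        // LinkedList.remove is O(n) and only used for brevity.
        for (String follower : followersOf.getOrDefault(author, Set.of())) {
            LinkedList<String> list = egoList.get(follower);
            list.remove(author);
            list.addFirst(author);
        }
    }

    /**
     * Return the k newest posts of the reader's followees. The real index runs
     * an early-terminating top-k merge over the followees' post lists,
     * exploiting that the ego list is ordered by recency; this sketch simply
     * collects everything and sorts, purely for brevity.
     */
    public List<Post> newsStream(String reader, int k) {
        List<Post> result = new ArrayList<>();
        for (String followee : egoList.getOrDefault(reader, new LinkedList<>())) {
            result.addAll(postsByAuthor.getOrDefault(followee, new ArrayDeque<>()));
        }
        result.sort(Comparator.comparingLong(Post::timestamp).reversed());
        return result.subList(0, Math.min(k, result.size()));
    }

    public static void main(String[] args) {
        GraphitySketch g = new GraphitySketch();
        g.follow("alice", "bob");
        g.follow("alice", "carol");
        g.post("bob", 1, "first");
        g.post("carol", 2, "second");
        g.newsStream("alice", 2).forEach(p ->
            System.out.println(p.timestamp() + " " + p.author() + ": " + p.content()));
    }
}
```

The point of the defense above is exactly this: the structure is nothing fancier than linked lists kept in the right order, yet that ordering is what keeps both writing and reading cheap.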
Having this feedback, we came to the conclusion that the database community might have been the wrong community for the problem. Again, we would have figured this out much faster had we submitted to VLDB as Steffen had suggested.
2012 February: Resubmission to Hypertext
The submission deadline for the Hypertext conference was only one week after we received the feedback. Since one week was not really much time, we decided not to incorporate any of the feedback and simply see what another community would say. My supervisor Steffen was not really convinced of this idea, but we argued that the notification from the Hypertext conference was only one month later and that this quick additional feedback might be helpful.
2012 late March: Reject from Hypertext
The reviews from the Hypertext conference were rather strange. One strong accept, although that reviewer said he was not an expert in the field. One strong reject from a person who did not understand the correctness of our top-k n-way merge, which was not even core to the paper (a small sketch of such a merge follows at the end of this section). And one borderline from a reviewer who had some really interesting comments on our terminology and the related work.
Overall the paper was accepted as a poster presentation. We decided that this was not sufficient for our work, so we withdrew our submission.
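Since the correctness of the top-k n-way merge apparently caused confusion, here is a small, self-contained sketch of what such a merge does (not our exact implementation; the names are again made up for illustration): given n lists that are each already sorted from newest to oldest, it repeatedly takes the globally newest remaining head element until k items have been emitted.

```java
import java.util.*;

/** Hedged sketch of a top-k n-way merge over already sorted lists. */
public class TopKMergeSketch {

    /** A timestamped item; a higher timestamp means newer. */
    record Item(long timestamp, String payload) {}

    /** Cursor into one of the n input lists. */
    private record Cursor(List<Item> list, int index) {
        Item head() { return list.get(index); }
    }

    /**
     * Merge n lists, each sorted newest-first, and return the k newest items.
     * A max-heap on the current head of every list guarantees that the item
     * popped in each round is the globally newest remaining one, which is
     * exactly the correctness argument the merge relies on.
     */
    static List<Item> topK(List<List<Item>> lists, int k) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingLong((Cursor c) -> c.head().timestamp()).reversed());
        for (List<Item> list : lists) {
            if (!list.isEmpty()) heap.add(new Cursor(list, 0));
        }
        List<Item> result = new ArrayList<>();
        while (result.size() < k && !heap.isEmpty()) {
            Cursor c = heap.poll();               // list with the newest head
            result.add(c.head());
            if (c.index() + 1 < c.list().size())  // advance that list's cursor
                heap.add(new Cursor(c.list(), c.index() + 1));
        }
        return result;                            // k newest items, newest first
    }

    public static void main(String[] args) {
        List<Item> a = List.of(new Item(9, "a1"), new Item(4, "a2"), new Item(1, "a3"));
        List<Item> b = List.of(new Item(8, "b1"), new Item(2, "b2"));
        List<Item> c = List.of(new Item(7, "c1"));
        topK(List.of(a, b, c), 4).forEach(i ->
            System.out.println(i.timestamp() + " " + i.payload()));
        // prints: 9 a1, 8 b1, 7 c1, 4 a2
    }
}
```

The heap never holds more than n cursors, so answering a request costs O(k log n) once the sorted per-list inputs are in place, which is why the simplicity of the approach does not hurt its complexity.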
2012 May: Resubmission to SocialCom
Discussions arose about which conference we should target next. We were sure that the style of our evaluation was not suited to the database community. On the other hand, graph databases alone would not be sufficient for the semantic web conferences, and social is such a hyped topic right now that we really were not sure.
Steffen would have liked to submit the paper to ISWC, since this is an important community for our institute. I suggested SocialCom, since the core of our paper really lies in social networking. All reviews so far had valued the problem as important and relevant for social networks, and people basically liked the idea.
We tried to find related work for the baselines we used and figured out that for our strongest baseline we could not find any. Three days before the SocialCom deadline, Steffen argued that we should drop the name of the paper and not call it Graphity anymore. He suggested we instead sell the work as two new indices for social news streams, which I thought was reasonable, since the baseline to Graphity really makes sense to use and was not presented in any related work. We could also enhance the paper with some feedback from my blog and the neo4j mailing list. The only downside of this strategy was that changing the title and the storyline would mean a complete rewrite of the paper, something I was not too keen on doing within three days. My co-author and I were convinced that we should stick to the changes from working in the feedback and to our argument for the baseline without related work.
We talked to Steffen again. He said he left the final decision to us but would strongly recommend changing the storyline of the paper and dropping the name. Even though I was not 100% convinced, and my co-author also did not want to rewrite the paper, I decided to follow Steffen's experience. We rewrote the storyline.
2012 July: Accept at SocialCom
The paper was accepted as a full paper at SocialCom, so following Steffen's advice was a good idea. Not being dragged down by bad reviews from the wrong community and staying convinced that the work had some strong results also turned out to be a good thing. I am sure that posting the preliminary results on the blog was really worthwhile: in this way we could integrate openly accessible feedback from the community into our third submission. I am excited to see how much I will have learnt from this experience by the time of later submissions.
I will publish the paper on the blog as soon as the camera-ready version is done! Subscribe to my newsletter, RSS or Twitter if you don't want to miss the final version of the paper!