java – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany
https://www.rene-pickhardt.de
Extract knowledge from your data and be ahead of your competition
Tue, 17 Jul 2018 12:12:43 +0000

GWT + database connection in Servlet ContextListener – Auto Complete Video Tutorial Part 5
https://www.rene-pickhardt.de/gwt-database-connection-in-servlet-contextlistener-auto-complete-video-tutorial-part-5/
Mon, 24 Jun 2013 11:44:47 +0000

Finally we have all the basics needed for building an autocomplete service, and now comes the juicy part. From now on we look at how to make it fast and robust. In the current approach we open a new database connection for every HTTP request. It takes quite some time to lock the database (at least when using neo4j in embedded mode) and then to run the query, without any opportunity to benefit from the caching strategy of the database.
In this tutorial I will introduce you to the concept of a ContextListener. Roughly speaking, it gives us a way of storing objects in the global memory of the Java servlet (the ServletContext) as key-value pairs. Once we understand this, the roadmap is very clear: we can store objects like database connections or search indices in the memory of our web server. As far as I currently understand, this could also be used to implement some server-side caching. I have not yet done any benchmarking to test how fast retrieving objects from the context is in Tomcat. Also, this method of caching does not scale horizontally as well as memcached does.
Anyway, have fun learning about the context listener.
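
A minimal sketch of the idea, written in plain Java so that it runs without a servlet container. In a real web application the listener would implement `javax.servlet.ServletContextListener`, and the key-value store would be the container's `ServletContext` with its `setAttribute` and `getAttribute` methods. The classes `FakeDatabase`, `AppContext`, `DatabaseContextListener` and `AutocompleteHandler` as well as the key `"db"` are made-up names for illustration only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for an expensive resource such as an embedded database (hypothetical).
class FakeDatabase {
    static int opened = 0;   // counts how often a "connection" was opened
    boolean closed = false;

    FakeDatabase() { opened++; }

    String query(String prefix) { return "results for " + prefix; }

    void shutdown() { closed = true; }
}

// Stand-in for the ServletContext: a global key-value store shared by all requests.
class AppContext {
    private final Map<String, Object> attributes = new ConcurrentHashMap<String, Object>();

    void setAttribute(String key, Object value) { attributes.put(key, value); }

    Object getAttribute(String key) { return attributes.get(key); }

    void removeAttribute(String key) { attributes.remove(key); }
}

// Mirrors the two callbacks of a context listener: one at startup, one at shutdown.
class DatabaseContextListener {
    static final String DB_KEY = "db";

    void contextInitialized(AppContext ctx) {
        // open the resource exactly once and publish it under a well-known key
        ctx.setAttribute(DB_KEY, new FakeDatabase());
    }

    void contextDestroyed(AppContext ctx) {
        FakeDatabase db = (FakeDatabase) ctx.getAttribute(DB_KEY);
        if (db != null) {
            db.shutdown();
            ctx.removeAttribute(DB_KEY);
        }
    }
}

// What a request handler would do: look the shared object up instead of reopening it.
class AutocompleteHandler {
    static String handle(AppContext ctx, String prefix) {
        FakeDatabase db = (FakeDatabase) ctx.getAttribute(DatabaseContextListener.DB_KEY);
        return db.query(prefix);
    }
}
```

With this structure every request reuses the one object created at startup, which is the point of moving the database connection out of the request handling code.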

If you have any suggestions, comments or thoughts, or even know of some solid benchmarks about caching using the ServletContext (I did a quick web search for a few minutes and didn’t find any), feel free to contact me and discuss this!

Analyzing the final and intermediate results of the iversity MOOC Fellowship online voting
https://www.rene-pickhardt.de/analyzing-the-final-and-intermediate-results-of-the-iversity-mooc-fellowship-online-voting/
Thu, 23 May 2013 23:07:24 +0000

As written before, Steffen and I participated in the online voting for the MOOC fellowship. Today the competition finished, and I would like to say thank you to everyone who participated in the voting, in particular to the 435 people who supported our course. I never imagined that so many people would be interested in our course!
The voting period ran from May 1st until today. During this period the user interface of the iversity website changed several times, providing different kinds of information about the voting to us users. Since I observed a drastic change in the rankings on May 9th, and since the process and scores have not been very transparent, I decided on that very day to collect some data about the rankings. I have already done some quick analysis of the data and found some interesting facts, but I am running out of time right now to conduct an extensive data analysis. So I am sharing the data set with the public:
http://rene-pickhardt.de/mooc.tar.bz2 (33MB)
If you download the archive and extract it, you will find a folder for every hour after May 9th. Every folder contains 26 HTML files representing the ranking of the courses at that time and a transaction log of the HTTP requests that were made to download those 26 files. There are 26 HTML files because 10 courses were displayed per page and 255 courses participated.
During the data collection my web server had 2 or 3 short downtimes, so some data points may be missing.
I already wrote a “dirty hack” and pushed it to GitHub; it also extracts the interesting information from the downloaded HTML files.

  1. There is a file rank.tsv (334 kb) that lists the ranking of every course on an hourly basis.
  2. There is a file vote.tsv (113 kb) that contains, for every course on an hourly basis (between May 20th and today), the number of votes the course acquired. The period covered by vote.tsv is so short because the votes were only available in the HTML files during this time.

Skimming the data by eye, there are already some facts that make me very curious for a deeper data analysis:

  1. Some courses gained several hundred votes within a short period of time (usually only 2 or 3 hours), whereas most courses (especially those gaining such a large number of votes) often stayed far below 1000 votes in total.
  2. It is also interesting to see how much variation has been going on in the last couple of days.
  3. I haven’t crawled the view counts of the courses’ YouTube videos, and even now, after observing the following, I have not taken a snapshot of them. Still, it is interesting that there is such a large difference in conversion rate. The top courses in particular seem to have many more votes than views of their application video, whereas some really high-class and outstanding applications, like the ones from Christian Spannagel (Math) or Oliver Vornberger (Algorithms and data structures), have two or three times as many views on YouTube as votes. In fact they have about the same number of YouTube views as the top-voted courses.
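
The first observation, large gains within 2 or 3 hours, could be checked mechanically. The following is only a sketch under assumptions: it expects the hourly cumulative vote counts of one course as an `int` array and does not read vote.tsv, whose exact column layout is not described here. The class name `VoteSpikes`, the window and the threshold are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class VoteSpikes {
    // Returns every hour index i at which the cumulative vote count has grown by at
    // least `threshold` votes compared with `window` hours earlier.
    static List<Integer> findSpikes(int[] cumulativeVotes, int window, int threshold) {
        List<Integer> spikes = new ArrayList<Integer>();
        for (int i = window; i < cumulativeVotes.length; i++) {
            int gain = cumulativeVotes[i] - cumulativeVotes[i - window];
            if (gain >= threshold) {
                spikes.add(i);
            }
        }
        return spikes;
    }
}
```

Calling `VoteSpikes.findSpikes(votes, 2, 200)` would list the hours at which a course gained 200 or more votes within two hours; the values 2 and 200 are arbitrary choices, not taken from the data.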

I am pretty sure there are some more interesting facts, and maybe someone else has collected a better data set covering the complete period of time and including YouTube snapshots as well as Facebook and Twitter mentions.
Since I have been asked several times already: here are the final rankings to download and also as a table in the blog post:

Rank Course name Number of votes
1 sectio chirurgica anatomie interaktiv 8013
2 internationales agrarmanagement 2 7557
3 ingenieurmathematik fur jedermann 2669
4 harry potter and issues in international politics 2510
5 online surgery 2365
6 l3t s mooc der offene online kurs uber das lernen und lehren mit technologien 2270
7 design 101 or design basics 2 2216
8 einfuhrung in das sozial und gesundheitswesen sozialraume entdecken und entwickeln 2124
9 changeprojekte planen nachhaltige entwicklung durch social entrepreneurship 2083
10 social work open online course swooc14 2059
11 understanding sustainability environmental problems collective action and institutions 1912
12 the dance of functional programming languaging with haskell and python 1730
13 zyklenbasierte grundung systematische entwicklung von geschaftskonzepten 1698
14 a virtual living lab course for sustainable housing and lifestyle 1682
15 family politics domestic life revolution and dictatorships between 1900 1950 1476
16 h2o extrem 1307
17 dark matter in galaxies the last mystery 1261
18 algorithmen und datenstrukturen 1207
19 psychology of judgment and decision making 1168
20 the future of storytelling 1164
21 web engineering 1152
22 die autoritat der wissenschaften eine einfuhrung in das wissenschaftstheoretische denken 2 1143
23 magic and logic of music a comprehensive course on the foundations of music and its place in life 1138
24 nmooc nachhaltigkeit fur alle 1130
25 sovereign bond pricing 1115
26 soziale arbeit eine einfuhrung 1034
27 mathematische denk und arbeitsweisen in geometrie und arithmetik 1016
28 social entrepreneurship wir machen gesellschaftlichen wandel moglich 1010
29 molecular gastronomy an experimental lecture about food food processing and a bit of physiology 984
30 fundamentals of remote sensing for earth observation 920
31 kompetenzkurs ernahrungswissenschaft 891
32 erfolgreich studieren 879
33 deciphering ancient texts in the digital age 868
34 qualitative methods 861
35 karl der grosse pater europae 855
36 who am i mind consciousness and body between science and philosophy 837
37 programmieren mit java 835
38 systemisches projektmanagement 811
39 lernen ist sexy 764
40 modelling and simulation using matlab one mooc more brains an interdisciplinary course not just for experts 760
41 suchmaschinen verstehen 712
42 hands on course on embedded computing systems with raspberry pi 679
43 introduction to mixed methods and doing research online 676
44 game ai 649
45 game theory and experimental economic research 633
46 cooperative innovation 613
47 blue engineering ingenieurinnen und ingenieure mit sozialer und okologischer verantwortung 612
48 my car the unkown technical being 612
49 gesundheit ein besonderes gut eine multidisziplinare erkundung des deutschen gesundheitssystems 608
50 teaching english as a foreign language tefl part i pronunciation 597
51 wie kann lesen gelernt gelehrt und gefordert werden lesesozialisation lesedidaktik und leseforderung vom grundschulunterricht bis zur erwachsenenbildung 593
52 the european dream 576
53 education of the present what is the future of education 570
54 faszination kristalle und symmetrie 561
55 italy today a girlfriend in a coma a walk through today s italy 557
56 dna from structure to therapy 556
57 grundlagen der mensch computer interaktion 549
58 malnutrition in developing countries 548
59 marketing als strategischer erfolgsfaktor von der produktinnovation bis zur kundenbindung 540
60 environmental ethics for scientists 540
61 stem cells in biology and medicine 528
62 praxiswissen fur den kunstlerischen alltagsdschungel 509
63 physikvision 506
64 high five evidence based practice 505
65 future climate water 484
66 diversity and communication challenges for integration and mobility 477
67 social entrepreneurship 469
68 die kunst des argumentierens 466
69 der hont feat mit dem farat wek wie kinder schreiben und lesen lernen 455
70 antikrastination moocen gegen chronisches aufschieben 454
71 exercise for a healthier life 454
72 the startup source code 438
73 web science 435
74 medizinische immunologie 433
75 governance in and through human rights 431
76 europe in the world law and policy aspects of the eu in global governance 419
77 komplexe welt strukturen selbstorganisation und chaos 419
78 mooc basics of surgery want to become a real surgeon 416
79 statistical data analysis for the humanities 414
80 business math r edux 406
81 analyzing behavioral dynamics non linear approaches to social and cognitive sciences 402
82 space technology 397
83 der erzahler materialitat und virtualitat vom mittelalter bis zur gegenwart 396
84 kriminologie 395
85 von e mail skype und xing kommunikation fuhrung und berufliche zusammenarbeit im netz 394
86 wissenschaft erzahlen das phanomen der grenze 392
87 nachhaltige entwicklung 389
88 die nachste gesellschaft gesellschaft unter bedingungen der elektrizitat des computers und des internets 388
89 die grundrechte 376
90 medienbildung und mediendidaktik grundbegriffe und praxis 368
91 bubbles everywhere speculative bubbles in financial markets and in everyday life 364
92 the heart of creativity 363
93 physik und weltraum 358
94 sim suchmaschinenimplementierung als mooc 354
95 order of magnitude physics from atomic nuclei to the universe 350
96 entwurfsmethodik eingebetteter systeme 343
97 monte carlo methods in finance 335
98 texte professionell mit latex erstellen 331
99 wissenschaftlich arbeiten wissenschaftlich schreiben 330
100 e x cite join the game of social research 330
101 forschungsmethoden 323
102 complex problem solving 321
103 programmieren lernen mit effekt 317
104 molecular devices and machines 317
105 wie man erfolgreich ein startup aufbaut 315
106 grundlagen der prozeduralen und objektorientierten programmierung 314
107 introduction to disability studies 314
108 eu2c the european union explained by two partners cologne and cife 313
109 the english language a linguistic introduction 2 311
110 allgemeine betriebswirtschaftslehre 293
111 interaction design open design 293
112 how we learn nowadays possibilities and difficulties 288
113 foundations of educational technology 288
114 projektmanagement und designbasiertes lernen 281
115 human rights 278
116 kompetenz des horens technische gehorbildung 278
117 it infrastructure management 276
118 a media history in 10 artefacts 274
119 introduction to the practice of statistics and regression 271
120 what is a good society introduction to social philosophy 268
121 modellierungsmethoden in der wirtschaftsinformatik 265
122 objektorientierte programmierung von web anwendungen von anfang an 262
123 intercultural diversity networking vielfalt interkulturell vernetzen 260
124 foundations of entrepreneurship 259
125 business communication for impact and results 257
126 gamification 257
127 creativity and design in innovation management 256
128 mechanik i 252
129 global virtual project management 252
130 digital signal processing for everyone 249
131 kompetenzen fur klimaschutz anpassung 248
132 digital economy and social innovation 246
133 synthetic biology 245
134 english phonetics and phonology 245
135 leibspeisen nahrung im wandel der zeiten molekule brot kase fleisch schokolade und andere lebensmittel 243
136 critical decision making in the contemporary globalized world 238
137 einfuhrung in die allgemeine betriebswirtschaftslehre schwerpunkt organisation personalmanagement und unternehmensfuhrung 236
138 didaktisches design 235
139 an invitation to complex analysis 235
140 grundlagen der programmierung teil 1 234
141 allgemein und viszeralchirurgie 233
142 mathematik 1 fur ingenieure 231
143 consumption and identity you are what you buy 231
144 vampire fictions 230
145 grundlagen der anasthesiologie 228
146 marketing strategy and brand management 227
147 political economy an introduction 225
148 gesundheit 221
149 object oriented databases 219
150 lebenswelten perspektiven fur menschen mit demenz 217
151 applications of graphs to real life problems 210
152 introduction to epidemiology epimooc 207
153 network security 207
154 global civics 207
155 wissenschaftliches arbeiten 204
156 annaherungen an zukunfte wie lassen sich mogliche wahrscheinliche und wunschbare zukunfte bestimmen 202
157 einstieg wissenschaft 200
158 engineering english 199
159 das erklaren erklaren wie infografik klart erklart und wissen vermittelt 198
160 betriebswirtschaftliche und rechtliche grundlagen fur das nonprofit management 192
161 art and mathematics 191
162 vom phanomen zum modell mathematische modellierung von natur und alltag an ausgewahlten beispielen 190
163 design interaktiver medien technische grundlagen 189
164 business englisch 187
165 erziehung sehen analysieren gestalten 184
166 basic clinical research methods 184
167 ordinary differential equations and laplace transforms 180
168 mathematische logik 179
169 die geburt der materie in der evolution des universums 179
170 innovationsmanagement von kleinen und mittelstandischen unternehmen kmu 176
171 introduction to qualitative methods in the social sciences 175
172 advert retard wirkung industrieller interessen auf rationale arzneimitteltherapie 175
173 animation beyond the bouncing ball 174
174 entropie einfuhrung in die physikalische chemie 172
175 edufutur education for a sustainable future 165
176 social network effects on everyday life 164
177 pharmaskills for africa 163
178 nachhaltige energiewirtschaft 162
179 qualitat in der fruhpadagogik auf den anfang kommt es an 158
180 dementias 157
181 beyond armed confrontation multidisciplinary approaches and challenges from colombia s conflict 154
182 investition und finanzierung 150
183 praxis des wissensmanagements 149
184 gutenberg to google the social construction of the communciations revolution 145
185 value innovation and blue oceans 145
186 kontrapunkt 144
187 shakespeare s politics 142
188 jetzt erst recht wissen schaffen uber recht 141
189 rechtliche probleme von sozialen netzwerken 138
190 augmented tuesday suppers 137
191 positive padagogik 137
192 digital storytelling mit bewegenden bildern erzahlen 136
193 wirtschaftsethik 134
194 energieeffizientes bauen 134
195 advising startups 133
196 urban design and communication 133
197 bildungsreform 2 0 132
198 mooc management basics 130
199 healthy teeth a life long course of preventive dentistry 129
200 digitales tourismus marketing 127
201 the arctic game the struggle for control over the melting ice 127
202 disease mechanisms 127
203 special operations from raids to drones 125
204 introduction to geospatial technology 120
205 social media marketing strategy smms 119
206 korpusbasierte analyse sprechsprachlichen problemlosungsverhaltens 116
207 introduction to marketing 115
208 creative coding 114
209 mooc meets 3d 110
210 unternehmenswert die einzig sinnvolle spitzenkennzahl fur unternehmen 110
211 forming behaviour gestaltung und konzeption von web applications 109
212 technology demonstration 108
213 lebensmittelmikrobiologie und hygiene 105
214 estudi erfolgreich studieren mit dem internet 105
215 moderne geldtheorie eine paische perspektive 103
216 kollektive intelligenz 103
217 geschichte der optischen medien 100
218 alter und soziale arbeit 99
219 semantik eine theorie visueller kommunikation 97
220 erziehung und beratung in familie und schule 96
221 foreign language learning in indian context 95
222 bildgebende verfahren 92
223 applied biology 92
224 bildung in der wissensgesellschaft gerechtigkeit 92
225 standortmanagement 92
226 europe a solution from history 90
227 methodology of research in international law 90
228 when african americans came to paris 90
229 contemporary architecture 89
230 past recent encounters turkey and germany 88
231 wars to end all wars 83
232 online learning management systems 82
233 software applications 81
234 business in germany 78
235 requirements engineering 77
236 anything relationship management xrm 77
237 global standards and local practices 76
238 prodima professionalisation of disaster medicine and management 75
239 cytology with a virtual correlative light and electron microscope 75
240 the organisation of innovation 75
241 sensors for all 75
242 diagnostik in der beruflichen bildung 73
243 scientific working 71
244 escience saxony lectures 71
245 internet marketing strategy how to gain influence and spread your message online 69
246 grundlagen des e business 69
247 principles of public health 64
248 methods for shear wave velocity measurements in urban areas 64
249 democracy in america 64
250 building typology studies gebaudelehre 63
251 multi media based learning environments at the interface of science and practice hamburg university of applied sciences prof dr andrea berger klein 61
252 math mooc challenge 60
253 the value of the social 58
254 dienstleistungsmanagement und informationssysteme 57
255 ict integration in education systems e readiness e integration e transformation 56
Building an Autocompletion on GWT screencast Part 2: Invoking The Remote Procedure Call
https://www.rene-pickhardt.de/building-an-autocompletion-on-gwt-screencast-part-2-invoking-the-remote-procedure-call/
Tue, 12 Mar 2013 07:25:00 +0000

Hey everyone, after posting my first screencast in this series, which reviewed the basic process for creating remote procedure calls in GWT, we are now finally starting with the real tutorial for building an autocomplete service.
This tutorial (again hosted on Wikipedia) covers the basic user interface, meaning

  • how to integrate a SuggestBox instead of a text field into the GWT Starter project
  • how to set up the necessary parts (extending a SuggestOracle) to fire a remote procedure call that requests suggestions once the user has typed something
  • how to override the necessary methods of the SuggestOracle class
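
The flow described in the list can be sketched in plain Java without GWT on the classpath. In GWT itself the method to override is `SuggestOracle.requestSuggestions(Request, Callback)`, and the answer is handed back through `Callback.onSuggestionsReady(Request, Response)`. Everything below (`SuggestionCallback`, `SuggestionService`, `SimpleSuggestOracle`, the term list) is a simplified stand-in with made-up names, not the code from the screencast.

```java
import java.util.ArrayList;
import java.util.List;

// Callback invoked once suggestions are available (mirrors the asynchronous RPC style).
interface SuggestionCallback {
    void onSuggestionsReady(String query, List<String> suggestions);
}

// Stand-in for the server side of the remote procedure call.
class SuggestionService {
    private final List<String> terms;

    SuggestionService(List<String> terms) { this.terms = terms; }

    // Returns at most `limit` terms that start with the given prefix, in list order.
    List<String> suggest(String prefix, int limit) {
        List<String> result = new ArrayList<String>();
        for (String term : terms) {
            if (result.size() >= limit) break;
            if (term.startsWith(prefix)) result.add(term);
        }
        return result;
    }
}

// Stand-in for the oracle: forwards non-empty input to the service and passes the
// answer on to the callback; empty input yields an empty suggestion list.
class SimpleSuggestOracle {
    private final SuggestionService service;

    SimpleSuggestOracle(SuggestionService service) { this.service = service; }

    void requestSuggestions(String query, int limit, SuggestionCallback callback) {
        if (query == null || query.isEmpty()) {
            callback.onSuggestionsReady("", new ArrayList<String>());
            return;
        }
        callback.onSuggestionsReady(query, service.suggest(query, limit));
    }
}
```

In the real application the call to the service is asynchronous, so the callback fires later when the server response arrives; here it is invoked immediately to keep the sketch self-contained.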

So here we go with the second part of the screencast, which you can of course download directly from Wikipedia:

Feel free to ask questions, give comments and improve the screencast!

Building an Autocompletion on GWT screencast Part 1: Getting Warm – Reviewing remote procedure calls
https://www.rene-pickhardt.de/building-an-autocompletion-on-gwt-screencast-part-1-getting-warm-reviewing-remote-procedure-calls/
Tue, 19 Feb 2013 09:11:29 +0000

Quite a while ago I promised to create some screencasts on how to build a (personalized) autocompletion in GWT. Even though the screencasts were recorded quite some time ago, I had to wait with publishing them for various reasons.
Finally the time has come to go public with the first video. I really do start from scratch, so the first video might be a little bit boring, since I am only reviewing the remote procedure calls of GWT.
A little note: the video is hosted on Wikipedia! I think it is important to spread knowledge under a Creative Commons licence, and the YouTubes, Vimeos, … of this world are rather trying to create vendor lock-in. So if the embedded player does not work well for you, you can go directly to Wikipedia for a fullscreen version or a direct download of the video.

Another note: I did not publish the source code! This has a pretty simple reason (and yes, you can call me crazy): if you really want to learn something, copying and pasting code doesn’t help you to get a full understanding. Doing it step by step, e.g. watching the screencasts and reproducing the steps, is the way to go.
As always I am open to suggestions and feedback, but please keep in mind that the entire course of videos is already recorded.

Get the full neo4j power by using the Core Java API for traversing your Graph data base instead of Cypher Query Language
https://www.rene-pickhardt.de/get-the-full-neo4j-power-by-using-the-core-java-api-for-traversing-your-graph-data-base-instead-of-cypher-query-language/
Tue, 06 Nov 2012 11:55:02 +0000

As I said yesterday, I have been busy over the last months producing content, so here you go. For Related Work we are most likely to use neo4j as the core database. This makes sense since we are basically building some kind of social network. Most queries that we need to answer while offering the service or during data mining have a friend of a friend structure.
For some of the queries we do counting or aggregations, so I was wondering what the most efficient way of querying a neo4j database is. So I ran a benchmark, with quite surprising results.
Just a quick remark: we used a database consisting of papers and authors extracted from arxiv.org, one of the biggest preprint sites on the web. The data set is available for download and for reproducing the benchmark results at http://blog.related-work.net/data/
The database as a neo4j file is 2GB (zipped). The schema looks pretty much like this:

 Paper1  <--[ref]-->  Paper2
   |                    |
   |[author]            |[author]
   v                    v
 Author1              Author2

For the benchmark we were trying to find coauthors, which is basically a friend of a friend query following the author relationship (or a breadth-first search of depth 2).
As we know, there are basically 3 ways of communicating with the neo4j database:

Java Core API

Here you work on the node and relationship objects within Java. Once you have fixed an author node, formulating a query looks pretty much like this.

// Assumes `author` is the start node and `resCnt` an int counter declared earlier.
for (Relationship rel : author.getRelationships(RelationshipTypes.AUTHOROF)) {
    // first hop: author -> paper
    Node paper = rel.getOtherNode(author);
    for (Relationship coAuthorRel : paper.getRelationships(RelationshipTypes.AUTHOROF)) {
        // second hop: paper -> coauthor
        Node coAuthor = coAuthorRel.getOtherNode(paper);
        if (coAuthor.getId() == author.getId()) continue; // skip the start author
        resCnt++;
    }
}

We see that the code can easily look very confusing (once queries get more complicated). On the other hand, one can easily combine several similar traversals into one big query, which makes readability worse but increases performance.
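
To make the two-hop structure of the query explicit without a neo4j installation, here is a self-contained sketch on plain Java maps. It is not the benchmark code: `CoAuthorCount`, `papersByAuthor` and `authorsByPaper` are hypothetical names, and the maps merely stand in for the AUTHOROF relationships.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

class CoAuthorCount {
    // Walks author -> paper -> author, like the nested loops over the AUTHOROF
    // relationships. A coauthor who shares two papers with `author` is counted twice,
    // because each path of length 2 is counted once.
    static int countCoAuthors(String author,
                              Map<String, List<String>> papersByAuthor,
                              Map<String, List<String>> authorsByPaper) {
        int resCnt = 0;
        List<String> papers = papersByAuthor.getOrDefault(author, Collections.<String>emptyList());
        for (String paper : papers) {
            List<String> authors = authorsByPaper.getOrDefault(paper, Collections.<String>emptyList());
            for (String coAuthor : authors) {
                if (coAuthor.equals(author)) continue; // skip the start author
                resCnt++;
            }
        }
        return resCnt;
    }
}
```

Several similar traversals can share the outer loop over the papers, which is the kind of combination mentioned above: one pass over the first hop, several counters updated in the inner loop.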

Traverser Framework

The Traverser Framework ships with the Java API and I really like the idea of it. I think it makes the meaning of a query really easy to understand, and in my opinion it helps a lot with the readability of the code.

Traversal t = new Traversal();
// breadth-first, following AUTHOROF relationships, keeping only paths of depth 2
for (Path p : t.description().breadthFirst()
        .relationships(RelationshipTypes.AUTHOROF)
        .evaluator(Evaluators.atDepth(2))
        .uniqueness(Uniqueness.NONE)
        .traverse(author)) {
    Node coAuthor = p.endNode();
    resCnt++;
}

Especially if you have a lot of similar queries or queries that are refinements of other queries you can save them and extend them using the Traverser Framework. What a cool technique.

Cypher Query Language

And then there is the Cypher Query Language, an interface pushed a lot by neo4j. If you look at the query you can totally understand why. It is a really beautiful language that is close to SQL (looking at Stack Overflow it is actually frightening how many people are trying to answer FOAF queries using MySQL) but still emphasizes the graph-like structure.

ExecutionEngine engine = new ExecutionEngine(graphDB);
// start at the author node and match two AUTHOROF hops
String query = "START author=node(" + author.getId()
        + ") MATCH author-[:" + RelationshipTypes.AUTHOROF.name()
        + "]-()-[:" + RelationshipTypes.AUTHOROF.name()
        + "]- coAuthor RETURN coAuthor";
ExecutionResult result = engine.execute(query);
scala.collection.Iterator<Node> it = result.columnAs("coAuthor");
while (it.hasNext()) {
    Node coAuthor = it.next();
    resCnt++;
}

I was always wondering about the performance of this query language. Writing a query language is a very complex task, and the more expressive the language is, the harder it is to achieve good performance (the same holds true for SPARQL in the semantic web). And let’s just point out that Cypher is quite expressive.

What were the results?

All queries were executed 11 times; the first run was thrown away since it warms up the neo4j caches. The values are averages over the other 10 executions.
  • The Core API is able to answer about 2000 friend of a friend queries per second (I have to admit, on a very sparse network).
  • The Traverser Framework is about 25% slower than the Core API.
  • Worst is Cypher, which is slower by at least one order of magnitude and only able to answer about 100 FOAF-like queries per second.
  • I was shocked, so I talked with Andres Taylor from neo4j, who mainly works on Cypher. He asked me which neo4j version I used, and I said it was 1.7. He told me I should check out 1.9, since Cypher has become more performant. So I ran the benchmarks on neo4j 1.8 and neo4j 1.9; unfortunately Cypher became slower in the newer neo4j releases.

    One can see that the Core API outperforms Cypher by an order of magnitude and the Traverser Framework by about 25%. In newer neo4j versions the Core API became faster and Cypher became slower.
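
The measurement procedure described above (one discarded warm-up run followed by averaged runs) can be sketched as a small helper. This is not the benchmark code linked below; `WarmupBenchmark` and its method names are made up, and the task to measure is passed in as a `Runnable`.

```java
class WarmupBenchmark {
    // Runs `task` once without measuring (warm-up), then `runs` more times, and
    // returns the average duration of the measured runs in nanoseconds.
    static double averageNanos(Runnable task, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        task.run(); // warm-up run, result thrown away
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return (double) total / runs;
    }

    // Converts an average duration per query into queries per second.
    static double queriesPerSecond(double averageNanosPerQuery) {
        return 1000000000.0 / averageNanosPerQuery;
    }
}
```

Calling `averageNanos(task, 10)` executes the task 11 times in total, matching the setup above. For orientation, an average of 0.5 milliseconds per query corresponds to 2000 queries per second.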

    Quotes from Andres Taylor:

    Cypher is just over a year old. Since we are very constrained on developers, we have had to be very picky about what we work on the focus in this first phase has been to explore the language, and learn about how our users use the query language, and to expand the feature set to a reasonable level

    I believe that Cypher is our future API. I know you can very easily outperform Cypher by handwriting queries. like every language ever created, in the beginning you can always do better than the compiler by writing by hand but eventually,the compiler catches up

    Conclusion:

    So far I have only used the Java Core API when working with neo4j, and I will continue to do so.
    If you are in a high-speed scenario (I believe every web application is one) you should really think about switching to the neo4j Java Core API for writing your queries. It might not look as nice as Cypher or the Traverser Framework, but the gain in speed pays off.
    Also, I personally like the amount of control that you have when traversing the core yourself.
    Additionally, I will soon post an article on why scripting languages like PHP, Python or Ruby aren’t suitable for building web applications anyway. So switching to the Core API makes sense for several reasons.
    The complete source code of the benchmark can be found at https://github.com/renepickhardt/related-work.net/blob/master/RelatedWork/src/net/relatedwork/server/neo4jHelper/benchmarks/FriendOfAFriendQueryBenchmark.java (commit: 0d73a2e6fc41177f3249f773f7e96278c1b56610)
    The detailed results can be found in this spreadsheet.

    Typology Oberseminar talk and Speed up of retrieval by a factor of 1000
    https://www.rene-pickhardt.de/typology-oberseminar-talk-and-speed-up-of-retrieval-by-a-factor-of-1000/
    Thu, 16 Aug 2012 11:39:25 +0000

    Almost 2 months ago I gave a talk about Typology in our Oberseminar. Update: Download slides. Most readers of my blog will already know the project, which was initially implemented by my students Till and Paul. I am just about to share some slides with you. On the one hand they explain how the system works, and on the other hand they give an overview of the related work.
    As you can see from the slides, we are planning to submit our results to the SIGIR conference. So one year after my first blog post on graphity, which developed into a full paper for socialcom2012 (graphity blog post and blog post for source code), there is this still informal Typology blog post with the slides of the Oberseminar talk, and 3 months are left until our SIGIR submission. I expect that this time the submission will not be such a hassle as graphity, since I should have learnt some lessons and also have a good student who is helping me with the implementation of all the tests.
    Additionally, I have finally uploaded some source code to GitHub that makes the Typology retrieval algorithm pretty fast. There are still some issues with this code, since it lowers the quality of the predictions a little bit. Also, the index has to be built first. Last but not least, the original SuggestTree code did not save the weights of the items to be suggested. I need those weights in the aggregation phase. Since I did not want to extend the original code, I placed the weights at the end of the suggested items. This is a little inefficient.
    The main reason the new algorithm speeds up retrieval is that Typology otherwise has to sort all outgoing edges of a node. This is rather slow, especially if one only needs the top k elements. Since neo4j as a graph database does not provide indices for this kind of data, I was forced to look for another way to presort the data. Additionally, if a prefix is known one does not have to look at all outgoing edges. I found the SuggestTree class by Nicolai Diethelm, which solves the problem very well and leads to this great speed. The index is not persistent yet, and it also needs quite some memory. On the other hand, a separate suggest tree is built for every node. This means that the index can easily be distributed over several machines, allowing for horizontal scaling!
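
To illustrate why sorting all outgoing edges is wasteful when only the top k are needed, here is a small sketch that selects the k heaviest edges with a bounded min-heap instead of a full sort. This is not the SuggestTree implementation and not the Typology code; `TopK` and the edge map are made-up names, and a presorted index goes further by avoiding even this single pass at query time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopK {
    // Returns the targets of the k edges with the highest weight, heaviest first.
    // The heap never holds more than k entries, so the cost is O(n log k) rather
    // than the O(n log n) of sorting all n outgoing edges.
    static List<String> topK(Map<String, Integer> weightedEdges, int k) {
        List<String> result = new ArrayList<String>();
        if (k <= 0) return result;
        Comparator<Map.Entry<String, Integer>> byWeight = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<Map.Entry<String, Integer>>(k, byWeight);
        for (Map.Entry<String, Integer> edge : weightedEdges.entrySet()) {
            if (heap.size() < k) {
                heap.add(edge);
            } else if (edge.getValue() > heap.peek().getValue()) {
                heap.poll();      // drop the currently lightest of the kept edges
                heap.add(edge);
            }
        }
        List<Map.Entry<String, Integer>> kept = new ArrayList<Map.Entry<String, Integer>>(heap);
        kept.sort(byWeight.reversed()); // only k elements are sorted here
        for (Map.Entry<String, Integer> edge : kept) {
            result.add(edge.getKey());
        }
        return result;
    }
}
```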
    Anyway, the old algorithm was only able to handle about 20 requests per second, and now we are at something like 14 k requests per second, and as I mentioned there is still a little room for more (:
    I hope indices like this will be standard in neo4j soon. This would widen the range of applications that could make good use of neo4j.
    As always I am happy about any suggestions, and I am looking forward to doing the complete evaluation and paper writing for Typology.

    Neo4j Graph Database vs MySQL
    https://www.rene-pickhardt.de/neo4j-graph-database-vs-mysql/
    Thu, 05 May 2011 21:36:32 +0000

    For my social news stream application I am thinking hard about the right software to support my backend. After I designed a database model in MySQL, I talked to Jonas again and he suggested looking for a better-suited technology. A little bit of research brought me to a graph database called Neo4j.
    After I downloaded the open source Java libs and got them running with Eclipse and Maven, I did some testing with my Metalcon data set. I have been very satisfied, and the whole project looks very promising to me. I exported 4 relations from my MySQL database.

    1. UserUserFriend containing all the friendship requests
    2. UserProfileVisit containing the profiles a user visited
    3. UserMessage containing the messages between users
    4. UserComment containing the profile comments between users

    These relations obviously form a graph on my data set. Reading the several 100’000 lines of data, putting them into the graph data structure and building a search index on the nodes took only several seconds of runtime. But I was even more impressed by the speed with which it was possible to traverse the graph!
    Retrieving a shortest path of length 4 between two users took only 150 milliseconds. Doing a full breadth-first search on a different, heavily connected graph with 290’000 edges took only 2.7 seconds, which means that neo4j is capable of traversing about 100’000 edges per second.
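
For readers who want to see what such a shortest path query does, here is a plain Java sketch of a breadth-first search on an adjacency map. It does not use the neo4j API and is not the code used for the measurements above; `BreadthFirstSearch` and the integer user ids are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class BreadthFirstSearch {
    // Returns the number of edges on a shortest path from `start` to `goal`,
    // or -1 if `goal` cannot be reached.
    static int shortestPathLength(Map<Integer, List<Integer>> neighbours, int start, int goal) {
        Map<Integer, Integer> distance = new HashMap<Integer, Integer>();
        Queue<Integer> queue = new ArrayDeque<Integer>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.remove();
            if (node == goal) return distance.get(node);
            List<Integer> next = neighbours.getOrDefault(node, Collections.<Integer>emptyList());
            for (int neighbour : next) {
                if (!distance.containsKey(neighbour)) { // visit every node only once
                    distance.put(neighbour, distance.get(node) + 1);
                    queue.add(neighbour);
                }
            }
        }
        return -1;
    }
}
```

Because every node is visited at most once and every edge is looked at a bounded number of times, the running time grows linearly with the number of edges, which is why edges per second is a natural unit for the traversal speed quoted above.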
    Now I will have to look more carefully at my use case. Obviously I want to have edges that are labeled with timestamps and to retrieve them as ordered lists. Adding key-value pairs to the edges and including an index is possible, which makes me optimistic that I will be able to answer a lot of my queries of interest in an efficient manner.
    Unfortunately I am battling with Google Web Toolkit, Eclipse and Neo4j, which I want to combine for the new Metalcon version. I even asked the neo4j mailing list a very embarrassing question, and the guys from Neo Technology have been very kind and helpful (even though I still couldn’t manage to get it running). I will post an article here as soon as I know how to set everything up.
    In general I am still a huge fan of relational databases, but for the use case of social networks I see why graph databases seem to be the more sophisticated technology. I am pretty sure that I could not have achieved this performance using MySQL.
    What is your experience with graph databases, and especially neo4j?
