open data – Data Science, Data Analytics and Machine Learning Consulting in Koblenz Germany https://www.rene-pickhardt.de Extract knowledge from your data and be ahead of your competition Tue, 17 Jul 2018 12:12:43 +0000 en-US hourly 1 https://wordpress.org/?v=4.9.6 Analyzing the final and intermediate results of the iversity MOOC Fellowship online voting https://www.rene-pickhardt.de/analyzing-the-final-and-intermediate-results-of-the-iversity-mooc-fellowship-online-voting/ https://www.rene-pickhardt.de/analyzing-the-final-and-intermediate-results-of-the-iversity-mooc-fellowship-online-voting/#comments Thu, 23 May 2013 23:07:24 +0000 http://www.rene-pickhardt.de/?p=1609 As written before, Steffen and I participated in the online voting for the MOOC fellowship. Today the competition finished and I would like to say thank you to everyone who participated in the voting, in particular to the 435 people supporting our course. I never imagined that so many people would be interested in our course!
The voting period ran from May 1st until today. During this period the user interface of the iversity website changed several times, providing different kinds of information about the voting to us users. Since I observed a drastic change in rankings on May 9th, and since the process and scores have not been very transparent, I decided on that very day to collect some data about the rankings. I have already done some quick analysis on the data and found some interesting facts, but I am running out of time right now to conduct an extensive data analysis. So I am releasing the data set into the public domain:
http://rene-pickhardt.de/mooc.tar.bz2 (33MB)
If you download the archive and extract it you’ll find a folder for every hour after May 9th. In every folder you will find 26 HTML files representing the ranking of the courses at that time and a transaction log of the HTTP requests that were made to download them. There are 26 HTML files since 10 courses were displayed per page and we had 255 courses participating.
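The file count is a ceiling division: 255 courses at 10 per page need 26 pages, the last one only half full. A minimal sketch of that arithmetic, plus a helper for spotting hourly folders with missing pages, could look as follows. The folder layout (one sub-folder per hour, pages stored as *.html directly inside it) is an assumption, and the function names are made up for illustration.

```python
import math
from pathlib import Path

COURSES = 255   # number of participating courses (from the post)
PER_PAGE = 10   # courses shown per ranking page (from the post)


def pages_needed(courses: int, per_page: int) -> int:
    """Ceiling division: the last page may be only partly filled."""
    return math.ceil(courses / per_page)


def incomplete_snapshots(root: Path, expected: int) -> list[Path]:
    """Return hourly folders holding fewer HTML files than expected.

    Assumes (hypothetically) one sub-folder per hour directly under
    `root`, each containing the ranking pages as *.html files.
    """
    return sorted(
        folder
        for folder in root.iterdir()
        if folder.is_dir() and len(list(folder.glob("*.html"))) < expected
    )


print(pages_needed(COURSES, PER_PAGE))  # 26
```

Calling `incomplete_snapshots(Path("mooc"), 26)` on the extracted archive (the directory name is again an assumption) would list the hours affected by crawler downtime.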
During the data collection my web server had 2 or 3 short downtimes, so some data points may be missing.
I already wrote a “dirty hack” and pushed it to GitHub; it extracts the interesting information from the downloaded HTML files.

  1. There is a file rank.tsv (334 KB) that lists the ranking of every course on an hourly basis.
  2. There is a file vote.tsv (113 KB) that contains, for every course on an hourly basis (between May 20th and today), the number of votes the course had acquired. The period covered by vote.tsv is so short because the votes were only available in the HTML files during this time.

Skimming the data by eye, there are already some facts that make me very curious about a deeper data analysis:

  1. Some courses gained several hundred votes within a short period of time (usually only 2 or 3 hours), whereas most courses (especially those gaining such a large number of votes) often stayed far below 1000 votes in total.
  2. It is also interesting to see how much variation there has been in the last couple of days.
  3. I have not crawled the view counts of the courses’ YouTube videos, and even now, after observing the following, I did not take a snapshot of them. Still, it is interesting that there is such a large difference in conversion rate. The top courses in particular seem to have many more votes than views of their application video, whereas some really high-class and outstanding applications, like the ones from Christian Spannagel (Math) or Oliver Vornberger (Algorithms and data structures), have two or three times as many views on YouTube as votes. In particular, they have about the same number of views on YouTube as the top-voted courses.
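The first observation (several hundred votes within 2 or 3 hours) could be checked mechanically. The following is only a sketch: it assumes the hourly vote counts of one course have already been read into a list of cumulative totals, and the function name, window size and threshold are illustrative choices, not part of the published script.

```python
def vote_spikes(hourly_totals, window=3, threshold=200):
    """Find spans in which the cumulative vote count jumps sharply.

    hourly_totals: cumulative vote counts, one entry per hourly snapshot.
    Returns (start_index, gain) for every span of `window` hours whose
    gain is at least `threshold` votes.
    """
    spikes = []
    for start in range(len(hourly_totals) - window):
        gain = hourly_totals[start + window] - hourly_totals[start]
        if gain >= threshold:
            spikes.append((start, gain))
    return spikes


# Made-up series for illustration only, not taken from vote.tsv.
example = [100, 105, 110, 400, 420, 425, 430]
print(vote_spikes(example))
```

Overlapping windows report the same jump more than once; for a first look at the data that is acceptable, and merging adjacent hits would be a natural refinement.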

I am pretty sure there are some more interesting facts, and maybe someone else has collected a better data set covering the complete period of time and including YouTube snapshots as well as Facebook and Twitter mentions.
Since I have been asked several times already: here are the final rankings to download and also as a table in the blog post:

Rank Course name Number of votes
1 sectio chirurgica anatomie interaktiv 8013
2 internationales agrarmanagement 2 7557
3 ingenieurmathematik fur jedermann 2669
4 harry potter and issues in international politics 2510
5 online surgery 2365
6 l3t s mooc der offene online kurs uber das lernen und lehren mit technologien 2270
7 design 101 or design basics 2 2216
8 einfuhrung in das sozial und gesundheitswesen sozialraume entdecken und entwickeln 2124
9 changeprojekte planen nachhaltige entwicklung durch social entrepreneurship 2083
10 social work open online course swooc14 2059
11 understanding sustainability environmental problems collective action and institutions 1912
12 the dance of functional programming languaging with haskell and python 1730
13 zyklenbasierte grundung systematische entwicklung von geschaftskonzepten 1698
14 a virtual living lab course for sustainable housing and lifestyle 1682
15 family politics domestic life revolution and dictatorships between 1900 1950 1476
16 h2o extrem 1307
17 dark matter in galaxies the last mystery 1261
18 algorithmen und datenstrukturen 1207
19 psychology of judgment and decision making 1168
20 the future of storytelling 1164
21 web engineering 1152
22 die autoritat der wissenschaften eine einfuhrung in das wissenschaftstheoretische denken 2 1143
23 magic and logic of music a comprehensive course on the foundations of music and its place in life 1138
24 nmooc nachhaltigkeit fur alle 1130
25 sovereign bond pricing 1115
26 soziale arbeit eine einfuhrung 1034
27 mathematische denk und arbeitsweisen in geometrie und arithmetik 1016
28 social entrepreneurship wir machen gesellschaftlichen wandel moglich 1010
29 molecular gastronomy an experimental lecture about food food processing and a bit of physiology 984
30 fundamentals of remote sensing for earth observation 920
31 kompetenzkurs ernahrungswissenschaft 891
32 erfolgreich studieren 879
33 deciphering ancient texts in the digital age 868
34 qualitative methods 861
35 karl der grosse pater europae 855
36 who am i mind consciousness and body between science and philosophy 837
37 programmieren mit java 835
38 systemisches projektmanagement 811
39 lernen ist sexy 764
40 modelling and simulation using matlab one mooc more brains an interdisciplinary course not just for experts 760
41 suchmaschinen verstehen 712
42 hands on course on embedded computing systems with raspberry pi 679
43 introduction to mixed methods and doing research online 676
44 game ai 649
45 game theory and experimental economic research 633
46 cooperative innovation 613
47 blue engineering ingenieurinnen und ingenieure mit sozialer und okologischer verantwortung 612
48 my car the unkown technical being 612
49 gesundheit ein besonderes gut eine multidisziplinare erkundung des deutschen gesundheitssystems 608
50 teaching english as a foreign language tefl part i pronunciation 597
51 wie kann lesen gelernt gelehrt und gefordert werden lesesozialisation lesedidaktik und leseforderung vom grundschulunterricht bis zur erwachsenenbildung 593
52 the european dream 576
53 education of the present what is the future of education 570
54 faszination kristalle und symmetrie 561
55 italy today a girlfriend in a coma a walk through today s italy 557
56 dna from structure to therapy 556
57 grundlagen der mensch computer interaktion 549
58 malnutrition in developing countries 548
59 marketing als strategischer erfolgsfaktor von der produktinnovation bis zur kundenbindung 540
60 environmental ethics for scientists 540
61 stem cells in biology and medicine 528
62 praxiswissen fur den kunstlerischen alltagsdschungel 509
63 physikvision 506
64 high five evidence based practice 505
65 future climate water 484
66 diversity and communication challenges for integration and mobility 477
67 social entrepreneurship 469
68 die kunst des argumentierens 466
69 der hont feat mit dem farat wek wie kinder schreiben und lesen lernen 455
70 antikrastination moocen gegen chronisches aufschieben 454
71 exercise for a healthier life 454
72 the startup source code 438
73 web science 435
74 medizinische immunologie 433
75 governance in and through human rights 431
76 europe in the world law and policy aspects of the eu in global governance 419
77 komplexe welt strukturen selbstorganisation und chaos 419
78 mooc basics of surgery want to become a real surgeon 416
79 statistical data analysis for the humanities 414
80 business math r edux 406
81 analyzing behavioral dynamics non linear approaches to social and cognitive sciences 402
82 space technology 397
83 der erzahler materialitat und virtualitat vom mittelalter bis zur gegenwart 396
84 kriminologie 395
85 von e mail skype und xing kommunikation fuhrung und berufliche zusammenarbeit im netz 394
86 wissenschaft erzahlen das phanomen der grenze 392
87 nachhaltige entwicklung 389
88 die nachste gesellschaft gesellschaft unter bedingungen der elektrizitat des computers und des internets 388
89 die grundrechte 376
90 medienbildung und mediendidaktik grundbegriffe und praxis 368
91 bubbles everywhere speculative bubbles in financial markets and in everyday life 364
92 the heart of creativity 363
93 physik und weltraum 358
94 sim suchmaschinenimplementierung als mooc 354
95 order of magnitude physics from atomic nuclei to the universe 350
96 entwurfsmethodik eingebetteter systeme 343
97 monte carlo methods in finance 335
98 texte professionell mit latex erstellen 331
99 wissenschaftlich arbeiten wissenschaftlich schreiben 330
100 e x cite join the game of social research 330
101 forschungsmethoden 323
102 complex problem solving 321
103 programmieren lernen mit effekt 317
104 molecular devices and machines 317
105 wie man erfolgreich ein startup aufbaut 315
106 grundlagen der prozeduralen und objektorientierten programmierung 314
107 introduction to disability studies 314
108 eu2c the european union explained by two partners cologne and cife 313
109 the english language a linguistic introduction 2 311
110 allgemeine betriebswirtschaftslehre 293
111 interaction design open design 293
112 how we learn nowadays possibilities and difficulties 288
113 foundations of educational technology 288
114 projektmanagement und designbasiertes lernen 281
115 human rights 278
116 kompetenz des horens technische gehorbildung 278
117 it infrastructure management 276
118 a media history in 10 artefacts 274
119 introduction to the practice of statistics and regression 271
120 what is a good society introduction to social philosophy 268
121 modellierungsmethoden in der wirtschaftsinformatik 265
122 objektorientierte programmierung von web anwendungen von anfang an 262
123 intercultural diversity networking vielfalt interkulturell vernetzen 260
124 foundations of entrepreneurship 259
125 business communication for impact and results 257
126 gamification 257
127 creativity and design in innovation management 256
128 mechanik i 252
129 global virtual project management 252
130 digital signal processing for everyone 249
131 kompetenzen fur klimaschutz anpassung 248
132 digital economy and social innovation 246
133 synthetic biology 245
134 english phonetics and phonology 245
135 leibspeisen nahrung im wandel der zeiten molekule brot kase fleisch schokolade und andere lebensmittel 243
136 critical decision making in the contemporary globalized world 238
137 einfuhrung in die allgemeine betriebswirtschaftslehre schwerpunkt organisation personalmanagement und unternehmensfuhrung 236
138 didaktisches design 235
139 an invitation to complex analysis 235
140 grundlagen der programmierung teil 1 234
141 allgemein und viszeralchirurgie 233
142 mathematik 1 fur ingenieure 231
143 consumption and identity you are what you buy 231
144 vampire fictions 230
145 grundlagen der anasthesiologie 228
146 marketing strategy and brand management 227
147 political economy an introduction 225
148 gesundheit 221
149 object oriented databases 219
150 lebenswelten perspektiven fur menschen mit demenz 217
151 applications of graphs to real life problems 210
152 introduction to epidemiology epimooc 207
153 network security 207
154 global civics 207
155 wissenschaftliches arbeiten 204
156 annaherungen an zukunfte wie lassen sich mogliche wahrscheinliche und wunschbare zukunfte bestimmen 202
157 einstieg wissenschaft 200
158 engineering english 199
159 das erklaren erklaren wie infografik klart erklart und wissen vermittelt 198
160 betriebswirtschaftliche und rechtliche grundlagen fur das nonprofit management 192
161 art and mathematics 191
162 vom phanomen zum modell mathematische modellierung von natur und alltag an ausgewahlten beispielen 190
163 design interaktiver medien technische grundlagen 189
164 business englisch 187
165 erziehung sehen analysieren gestalten 184
166 basic clinical research methods 184
167 ordinary differential equations and laplace transforms 180
168 mathematische logik 179
169 die geburt der materie in der evolution des universums 179
170 innovationsmanagement von kleinen und mittelstandischen unternehmen kmu 176
171 introduction to qualitative methods in the social sciences 175
172 advert retard wirkung industrieller interessen auf rationale arzneimitteltherapie 175
173 animation beyond the bouncing ball 174
174 entropie einfuhrung in die physikalische chemie 172
175 edufutur education for a sustainable future 165
176 social network effects on everyday life 164
177 pharmaskills for africa 163
178 nachhaltige energiewirtschaft 162
179 qualitat in der fruhpadagogik auf den anfang kommt es an 158
180 dementias 157
181 beyond armed confrontation multidisciplinary approaches and challenges from colombia s conflict 154
182 investition und finanzierung 150
183 praxis des wissensmanagements 149
184 gutenberg to google the social construction of the communciations revolution 145
185 value innovation and blue oceans 145
186 kontrapunkt 144
187 shakespeare s politics 142
188 jetzt erst recht wissen schaffen uber recht 141
189 rechtliche probleme von sozialen netzwerken 138
190 augmented tuesday suppers 137
191 positive padagogik 137
192 digital storytelling mit bewegenden bildern erzahlen 136
193 wirtschaftsethik 134
194 energieeffizientes bauen 134
195 advising startups 133
196 urban design and communication 133
197 bildungsreform 2 0 132
198 mooc management basics 130
199 healthy teeth a life long course of preventive dentistry 129
200 digitales tourismus marketing 127
201 the arctic game the struggle for control over the melting ice 127
202 disease mechanisms 127
203 special operations from raids to drones 125
204 introduction to geospatial technology 120
205 social media marketing strategy smms 119
206 korpusbasierte analyse sprechsprachlichen problemlosungsverhaltens 116
207 introduction to marketing 115
208 creative coding 114
209 mooc meets 3d 110
210 unternehmenswert die einzig sinnvolle spitzenkennzahl fur unternehmen 110
211 forming behaviour gestaltung und konzeption von web applications 109
212 technology demonstration 108
213 lebensmittelmikrobiologie und hygiene 105
214 estudi erfolgreich studieren mit dem internet 105
215 moderne geldtheorie eine paische perspektive 103
216 kollektive intelligenz 103
217 geschichte der optischen medien 100
218 alter und soziale arbeit 99
219 semantik eine theorie visueller kommunikation 97
220 erziehung und beratung in familie und schule 96
221 foreign language learning in indian context 95
222 bildgebende verfahren 92
223 applied biology 92
224 bildung in der wissensgesellschaft gerechtigkeit 92
225 standortmanagement 92
226 europe a solution from history 90
227 methodology of research in international law 90
228 when african americans came to paris 90
229 contemporary architecture 89
230 past recent encounters turkey and germany 88
231 wars to end all wars 83
232 online learning management systems 82
233 software applications 81
234 business in germany 78
235 requirements engineering 77
236 anything relationship management xrm 77
237 global standards and local practices 76
238 prodima professionalisation of disaster medicine and management 75
239 cytology with a virtual correlative light and electron microscope 75
240 the organisation of innovation 75
241 sensors for all 75
242 diagnostik in der beruflichen bildung 73
243 scientific working 71
244 escience saxony lectures 71
245 internet marketing strategy how to gain influence and spread your message online 69
246 grundlagen des e business 69
247 principles of public health 64
248 methods for shear wave velocity measurements in urban areas 64
249 democracy in america 64
250 building typology studies gebaudelehre 63
251 multi media based learning environments at the interface of science and practice hamburg university of applied sciences prof dr andrea berger klein 61
252 math mooc challenge 60
253 the value of the social 58
254 dienstleistungsmanagement und informationssysteme 57
255 ict integration in education systems e readiness e integration e transformation 56
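Anyone who wants to work with the table above programmatically needs to split each row into rank, course name and votes. A small sketch, assuming the rows are available as plain text lines exactly as printed here (the function name is made up for illustration):

```python
def parse_ranking_line(line):
    """Split 'rank name ... votes' into (rank, name, votes).

    The name itself may end in a number (e.g. '... agrarmanagement 2'),
    so only the first and last tokens are treated as numeric fields.
    """
    tokens = line.split()
    rank, votes = int(tokens[0]), int(tokens[-1])
    name = " ".join(tokens[1:-1])
    return rank, name, votes


rows = [
    parse_ranking_line("2 internationales agrarmanagement 2 7557"),
    parse_ranking_line("73 web science 435"),
]
print(rows)
```

Taking the last token as the vote count is what keeps course names with a trailing number intact.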
Slides of Related work application presented in the Graphdevroom at FOSDEM https://www.rene-pickhardt.de/slides-of-related-work-application-presented-in-the-graphdevroom-at-fosdem/ https://www.rene-pickhardt.de/slides-of-related-work-application-presented-in-the-graphdevroom-at-fosdem/#comments Sat, 02 Feb 2013 15:13:02 +0000 http://www.rene-pickhardt.de/?p=1530 Download the slide deck of our talk at FOSDEM 2013, including all the resources that we pointed to.
The most important other links are:

It was great talking here and, again, we are open source, open data and so on. So if you have suggestions or want to contribute, feel free: just do it or contact us. We are really looking forward to meeting some other hackers who just want to go geek and change the world.
The video of the talk can be found here:

Big step towards open access by Great Britain and a comment from Neelie Kroes https://www.rene-pickhardt.de/big-step-towards-open-access-by-great-britain-and-a-comment-from-neelie-kroes/ https://www.rene-pickhardt.de/big-step-towards-open-access-by-great-britain-and-a-comment-from-neelie-kroes/#respond Sun, 19 Aug 2012 18:33:39 +0000 http://www.rene-pickhardt.de/?p=1407 During my vacation a lot has happened, and it was only today that I came across the following article and discussion: http://royalsociety.org/policy/projects/science-public-enterprise/report/. Yes, you read correctly: the Royal Society wants to create open access to all publications financed by the British government. What a big step! Congratulations to all British people for being such a role model.
It fits perfectly with my project related work and other discussions I have joined, e.g.

Even though this development is very good to see, I am not happy about how the ensuing discussion is going on models for fulfilling the goals of the Royal Society.

Neelie Kroes from the European Commission posted a really nice answer!

I am glad to see this step forward. After my successful submission of Graphity, and after reading the IEEE copyright form which I had to sign, I really did have concerns about publishing my work with them.
I am still considering not submitting to big journals and conferences anymore but just publishing on my university’s website, my blog and/or on open preprint archives.

Keynote on www2012 by Sir Tim Berners Lee https://www.rene-pickhardt.de/keynote-on-www2012-by-sir-tim-berners-lee/ https://www.rene-pickhardt.de/keynote-on-www2012-by-sir-tim-berners-lee/#comments Wed, 18 Apr 2012 10:30:12 +0000 http://www.rene-pickhardt.de/?p=1244 Disclaimer: this is a very sloppy summary of the keynote speech of Sir Tim Berners Lee. It is not spellchecked and was taken as notes while sitting in the audience. I hope I find the time after www2012 to go over it and improve it.

take aways

Decentralized design is very important. And if you talk about a centralized system like DNS vs. the internet, there are really social issues that you are talking about.
Look at the way the W3C builds standards: it is hard. It is a huge organization and they have to work together but also be split into modular groups.

IDEA 1 Mobile web apps

He gave a lot of interesting insights about the freedom of programmers and the limits of programming models. (Closely related to my thoughts on query languages for graph databases.) He says the easier the programming model the more you restrict people the more they can achieve.
But he also points out there is an ongoing battle between being universal and just being a tool like a refrigerator. He didn’t say it directly, but if you listened carefully he says: don’t support closed, blocked systems like Apple or even Android, but make it an open HTML5 JavaScript web app. You decide whether the web stays open. I think he really sees how the mobile web is becoming more and more closed.
He warns: Make open mobile web apps. Join the working groups for setting the standards…
Takeaway: VERY VERY important and interesting part of his speech! Especially for typology.
My Question: “As a company building something: if it is open you don’t get as much benefit (like access to the friendship graph) as in the case of going for Android / iPhone systems…”

IDEA2 Standards centralized vs decentralized – power

He talked about the importance of having the same standards at the low level, like HTML, HTTP and so on. He compares this to people not speaking the same language. He says it is cool to build all the cool stuff on top of it.
He says it is hard to build the standards but also important, because it made it possible for the web to scale and win against Gopher and other things from the early web.
He says the standards are great because anyone can build something on them. This contributes to decentralization: you don’t have to ask anyone to build something.
My Question: “SPDY seems better / faster and more mature than HTTP, but of course it is hard to change a running system.”
The value of being completely independent (decentralized) vs. the value of working together in a common way (centralized): this is a huge fight.

IDEA 3 Trust

Anonymous are not quite sure what they are fighting for. Is it complete anarchy or a fight against corruption? Is it the fight against certain governments or against governments in general? Again: central vs. decentral.
A very technical part of his talk that I unfortunately could not really follow. The main idea was clear: it was about trust, the role of social networks and the methods that could build and propagate trust.
He hopes decentralized trust systems can be found, but the battle is not over. (Remark from me: Of course Google+ is better than Facebook, but both systems are centralized. I guess there need to be low-level standards like HTTP to solve the problem, but they are hard to implement.) He already said that the semantic web people tried to do it but haven’t succeeded yet.

IDEA4 openness especially in UK

He talks a lot about open government data.
“I spend a lot of time talking to governments to get them to publish data on the web. But a lot of people are pushing back, trying to hold back data and maintain power.”
He says it is your responsibility to ask parties before the election for a commitment to making government data open.
Open licenses: companies complain about open licenses since they want to use the data but not open up their own.
As a member of the pirate party I can only emphasize his statements!

Idea 5 Privacy

He says there are three different forms of privacy:
1.) Of course having the shop remember me and my shoe size is common practice and nice for me. That is one form of privacy. Of course those companies, if they want to have a good relationship with me, will not give away my data.
2.) You never know what data will be out there in the future. It might be possible that data that is anonymous now will become transparent through other data sets coming up in the future…
3.) Invasion / tracking / selling to the highest bidder / or to the government or whoever asks. In the first days of the web this was hard, routers couldn’t do this, but now it is possible. But in my opinion TBL was talking about the Facebooks and Googles of this world.
He says this is like dynamite. If someone wants to use this data against you, you will be toast.
And you cannot stop this, because there are institutions that collect this information, and once it is collected it is not safe by definition.
He basically says that you cannot collect data about users by default unless you have a similar separation of powers of “executive, legislative and judiciary”. If this is not happening we should not allow anyone to do such data tracking.
We have to spend 90% of our time doing cool stuff on the web and being innovative, but 10% of the time we have to deal with these issues, otherwise the web will be locked down at some point.
It is about blocking, spamming, twitter bombing, data stealing and so on. This is a huge challenge and a big risk for our open web! It is not only the open market that depends on it but also democracy and human discourse. “I call that net neutrality”
“CISPA”: Americans do this in trying to control the web and close communication… Go out and look for this. These things happen quickly; go out and defend the internet and fight for it. This is a duty we have to do in the 10% of our time.
“I want to see you discussing these things. Think about what you are leaving for the next generation. I am happy to do this at the next web conferences but I really want to see ”

questions

On distributed decision making:
How do we move from a hierarchical system, like our governments are, to a decentralized system, which is possible due to all the connections that we have? We should be able to
People naturally don’t go out and break these boundaries of locality. (There is also a YouTube paper on this.) Social networking sites should rather suggest spreading friendships instead of building those communities of “you have 81 friends in common”. Go out and meet people that are far away from you. Use the connections that are being made. (What is the macroscopic effect of this little change in microscopic behaviour?)
I think we should do research on new democratic systems like Wikipedia (or in my opinion: pirate party). There was the story about people who didn’t vote for Barack Obama. It turned out those who wouldn’t vote for him didn’t because they couldn’t imagine having a black president. There was a high correlation between those people and those who had never worked together with people from a different ethnic background. So go out and spread friendship.
Idea: liquid feedback for W3C as a working group!
Question on openness and Facebook and the request for TBL’s thoughts:
TBL: Why would you build an app on Facebook or a closed world if there is still the open jungle out there? People already asked me that question about Netscape and Internet Explorer. They always ask me this when monopolies rise up.

Funny/interesting Quotes by Tim Berners Lee:

“There is only one person that has been to all web conferences and that is me”
“I received a mail recently saying: We had to do a project on an inventor and we decided to do it on you because you are not dead!”
“Values that made the web possible: Openness, consensus about openness, transparency, privacy,”
“If you have questions it is much more interesting for me. Because I have heard myself talking before.”
“Please develop HTML5 mobile web apps rather than native mobile apps!”

Google Video on Search Quality Meeting: Spelling for Long Queries by Lars Hellsten https://www.rene-pickhardt.de/google-video-on-search-quality-meeting-spelling-for-long-queries-by-lars-hellsten/ https://www.rene-pickhardt.de/google-video-on-search-quality-meeting-spelling-for-long-queries-by-lars-hellsten/#respond Mon, 12 Mar 2012 19:11:04 +0000 http://www.rene-pickhardt.de/?p=1196 Amazing! Today I had a discussion with a coworker about transparency and how companies should be more open about what they are doing! And what happens on the same day? One of my favourite web companies has decided to publish a short video taken from the weekly search quality meeting!
The change proposed by Lars Hellsten is that, instead of only checking the first 10 words for possible spelling corrections, one could predict which two words are most likely misspelled and add an additional window of ±5 words around them. They discuss how this change achieves much better scores than the old approach.
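To make the windowing idea concrete, here is a minimal sketch of how the positions to be checked could be selected. How Google actually predicts which words are misspelled is not described in the video summary, so the per-word scores are simply assumed to be given; the function name and parameters are hypothetical.

```python
def correction_windows(scores, top_k=2, radius=5):
    """Return sorted word positions to check for spelling corrections.

    scores: one hypothetical 'likely misspelled' score per query word.
    The top_k highest-scoring positions are selected and each one is
    extended by `radius` words to both sides, clipped to the query.
    """
    suspects = sorted(
        range(len(scores)), key=lambda i: scores[i], reverse=True
    )[:top_k]
    positions = set()
    for center in suspects:
        lo = max(0, center - radius)
        hi = min(len(scores), center + radius + 1)
        positions.update(range(lo, hi))
    return sorted(positions)


# A 30-word query in which words 3 and 25 look suspicious.
scores = [0.0] * 30
scores[3], scores[25] = 0.9, 0.8
print(correction_windows(scores))
```

For this example the checked positions are words 0 to 8 and 20 to 29, so a suspicious word near the end of a long query is covered even though it lies far beyond the first 10 words.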
The entire video is interesting because they say that semantic context is usually given by using 3-grams. My students used up to 5-grams for their sentence prediction, and the machine learning already told them that 4-grams would be sufficient to make syntactically and semantically correct predictions.
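For readers unfamiliar with n-grams, the following sketch shows the basic idea behind next-word prediction: count which word follows each (n-1)-word context and suggest the most frequent one. It is a toy illustration with a made-up corpus, not the students' implementation and not Google's.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """All contiguous n-token tuples of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_model(tokens, n):
    """Map each (n-1)-word context to a Counter of following words."""
    model = defaultdict(Counter)
    for gram in ngrams(tokens, n):
        model[gram[:-1]][gram[-1]] += 1
    return model


def predict(model, context):
    """Most frequent next word for a context, or None if unseen."""
    followers = model.get(tuple(context))
    return followers.most_common(1)[0][0] if followers else None


corpus = "the web is open and the web is free and the web is open".split()
model = build_model(corpus, 3)
print(predict(model, ["web", "is"]))  # open
```

Moving from 3-grams to 4-grams or 5-grams only changes `n`; the trade-off is that longer contexts are seen less often, so more data is needed before the counts become reliable.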
Anyway enjoy this great video by Google and thanks to Google for sharing this:

Related-work.net – Product Requirement Document released! https://www.rene-pickhardt.de/related-work-net-product-requirement-document-released/ https://www.rene-pickhardt.de/related-work-net-product-requirement-document-released/#comments Mon, 12 Mar 2012 10:26:50 +0000 http://www.rene-pickhardt.de/?p=1176 Recently I visited my friend Heinrich Hartmann in Oxford. We talked about various issues with how research is done these days and how the web could theoretically help to spread information faster and more efficiently connect people interested in the same papers / topics.
The idea of http://www.related-work.net was born: a scientific platform which is open source and open data and tries to solve those problems.
But we did not want to reinvent the wheel. So we did some research on existing online solutions and also asked people from various disciplines to name their problems. Find below our product requirement document! If you like our approach you can contact us, contribute to the source code, or find some starting documentation!
So the plan is to fork an open source question-and-answer system and enrich it with features fulfilling the needs of scientists and some social aspects (hopefully using neo4j as a supporting database technology), which will eventually help to rank related work of a paper.
Feel free to provide us with feedback and wishes and join our effort!

Beginning of our Product Requirement Document

We propose to create a new website for the scientific community which brings together people who are reading the same paper. The basic idea is to mix the functionality of a Q&A platform (like MathOverflow) with a paper database (like arXiv). We follow a strict openness principle by making available the source code and the data we collect.
We start with an analysis of how the internet is currently used in different fields and explain the shortcomings. The actual product description can be found under the section “Basic idea”. At the end we present an overview of the websites which follow a similar approach.
This document – as well as the whole project – is work in progress. We are happy about any kind of comments or other contributions.

The distribution of scientific knowledge

Every scientist has to stay up to date with the developments in his area of research. The basic sources for finding new information are:

  • Conferences
  • Research Seminars
  • Journals
  • Preprint-servers (arXiv)
  • Review Databases (MathSciNet, Zentralblatt, …)
  • Q&A Sites (MathOverflow, StackOverflow, …)
  • Blogs
  • Social Networks (Twitter, Google+)
  • Bibliographic Databases (Mendeley, nNode, Medline, etc.)

Every community has found its very own way of using these tools.

Mathematics by Heinrich Hartmann – Oxford:

To stay up to date with recent developments I check arxiv.org on a daily basis (RSS feed), participate in mathoverflow.net and search for papers on Google Scholar or MathSciNet. Occasionally interesting work is shared by people in my Google+ circles. In general the speed of pure mathematics is very slow. New research often builds upon work which has been out for a few years. To stay reasonably up to date it is enough to go to conferences every 3-5 months.
I read many papers on my own because I am the only one at the department who does research on that particular topic. We have a reading class where we read papers/lecture notes which are relevant for more people. Usually they are concerned with introductions to certain kinds of theory. We have weekly seminars where people talk about their recently published work. There are some very active blogs by famous mathematicians, but in my area blogs play virtually no role.

Computer Science by René Pickhardt – Uni Koblenz

In Computer Science topics are evolving but also changing very quickly. It is always important to have both an overview of upcoming technologies (which you get from tech blogs) as well as access to current research trends.
Since computer science moves so fast and the review process of journals often takes a long time, our main sources of information and papers are conferences and Twitter.

  • Usually conference papers are distributed digitally to participants. If one is interested in those papers google queries like “conference name year papers” are frequently used. Sites like http://www.sciweavers.org/ host and aggregate preprints of papers and organize them by conference.
  • The general method to follow a conference that one is not attending is to follow the hashtag of the conference on Twitter. In general Twitter is the most used tool to share, distribute and find information, not only for papers but also for the above-mentioned news about upcoming technologies.

Another rich source for computer scientists is, of course, the related work of papers and Google Scholar. Especially useful is the method of finding a very influential paper with more than 1000 citations and then finding newer papers that cite it and contain a certain keyword, which is one of the features of Google Scholar.
The main problem in computer science is not to find a rare paper or idea but rather to filter the huge amount of publications, including bad ones, and to keep track of trends. A system that ranks and summarizes papers (not only by abstract and citation counts) would therefore help me a lot to select which related work of a paper I should read!

Psychology by Elisa Scheller – Uni Freiburg

As a psychologist/neuroscientist, I receive recommendations for scientific papers via google scholar alerts or science direct alerts (http://www.sciencedirect.com/); I receive alerts regarding keywords or regarding entire journal issues. When I search for a certain publication, I use pubmed.org or scholar.google.com. This can sometimes be kind of annoying, as I receive multiple alerts from different sources; but I guess it is the best way to stay up to date regarding recent developments. This is especially important in my field, as we feel a big amount of “publication pressure”; I work on a method which is considered as “quite fancy” at the moment, so I also use the alerts to make sure nobody has published “my” experiment yet.
Sometimes a facebook friend recommends a certain publication or a colleague points me to it. Most of the time, I read articles on my own, as I am the only person working on this specific topic at my institution. Additionally, we have a weekly journal club where everyone in turn presents work which is related to our focus of research, e.g. a certain part of the human brain. There is also a weekly seminar dedicated to presentations about ongoing projects.
Blogs (e.g. mindhacks.com, http://neuroskeptic.blogspot.com/) can be a source to get an overview about recent developments, but I have to admit I use them mainly for work-related entertainment.
All in all, it is easy to stay up to date using alerts from different platforms;  the annoying part of it is the flood of emails you receive and that you are quite often alerted to articles that don’t fit your interests (no matter how exact you try to specify your keywords).

Biomedical Research by Johanna Goldmann – MIT

In the biological sciences, in research at the bench – communication is one of the most fundamental tools a scientist can have. Communication with other scientists may open up the possibilities of new collaborations, can lead to a completely new view point of a known question, the integration and expansion of methods as well as allowing a scientist to have a good understanding of what is known, what is not known and what other people have – both successfully and unsuccessfully – tried to investigate.
Yet communication is something that is currently very much lacking in academic science – lacking to an extent that most scientists will agree hinders the progress of research. Nonetheless the lack of communication and the issues it brings with it is something that most scientists have accepted as a necessary evil – not knowing how to possibly change it.
Progress is only reported in peer-reviewed journals – many of which are greatly affected not only by what is currently “sexy” in research but also by politics and connections and the “publish or perish” pressure. Due to this pressure to publish in journals and the weight the list of publications has on any young scientist’s chances of success, scientists also tend to be very reluctant to share any information pre-publication.
Furthermore one of the major issues is that currently there really is no way of publishing or communicating either negative results or minor findings, which causes many questions or methods to be repeatedly investigated as well as a loss of information.
Given how much social networks and the internet have changed communication as well as access to information over the past years, there is a need for this change to affect research and communication in the life sciences and transform the way we think not only about solving and approaching research questions but also about the information we gather and the insights we gain as a whole.

Philosophy by Sascha Benjamin Fink – Uni Osnabrück

The most important source of information for philosophers is http://philpapers.org/. You can follow trends going on in your field of interest. Philpapers has a list of almost all papers together with their abstracts, keywords and categories as well as a link to the publisher. Additional information about similar papers is displayed.
Every category of papers is managed by some editor. For each category it is possible to subscribe to a newsletter. In this way once per month I will be informed about current publications in journals related to my topic of interest. Every user is able to create an account and manage his literature and the papers he is interested in.
Other research and information exchange methods among philosophers consist of mailing lists, reading clubs and blogs. Have a look at David Chalmers’ blog list. Blogs are also becoming more and more important. Unfortunately they are usually on general topics and discuss developments of the community (e.g. Leiter’s Blog, Chalmers’ Blog and Schwitzgebel’s Blog).
But all together I still think that for me a centralized service like Philpapers is my favourite tool because it aggregates most information. If I don’t hear about it on Philpapers usually it is not that important. I think among Philosophers this platform – though incomplete – seems to be the standard for the next couple of years.

Problems

As a scientist it is crucial to be informed about the current developments in the research area. Abstracting from the reports above we divide the tasks roughly into the following stages.

1. Finding and filtering new publications:

  • What is happening right now? What are the current hot topics in my area? What are current trends? (→ Check arXiv/Twitter)
  • Did a friend of mine write something? Did a “big shot” write something?
    (→ Check meta information: title, authors)
  • Are my colleagues excited about a new development? (→ Talk to them.)

2. Getting more information about a given paper:

  • What is actually done in a given paper? Is it relevant for me? Is it really new? Is it a breakthrough? (→ Read abstracts. Find a good readable summary/review.)
  • Judge the quality of a paper: Is it correct? Is it well written?
    ( → Where is it published, if at all? Skim through content.)

Finally there is a fundamental decision: Shall I read the whole paper, or not? which leads us to the next task.

3. Understanding a paper: Understanding a paper in depth can be a very time consuming and tedious process. The presentation is often very short and much knowledge is assumed from the reader. The notation choices can be bad, so that even the statements are hard to understand. In effect the paper is easily readable only for a very small circle of specialists in the area. If one is not in the lucky situation to belong to that circle, one usually applies the following strategies:

  1. Lookup references. This forces you to process a whole tree of older papers which might be hard to read, and hard to get hold of. Sometimes it is worthwhile to consult a textbook to polish up fundamentals.
  2. Finding additional resources. Is there a review? Is there a related video lecture or slides explaining the material in more detail? Is the author going to a conference in the near future, or even giving a seminar in the area?
  3. Join forces. Find people thinking about the same paper: Has somebody at my department already read the paper, so that I can ask some questions? Is there enough interest to form a reading group, or more formally, run a seminar about that paper?
  4. Contact the author. This is a last resort. If you have struggled with understanding the paper for a very long time and really need/want to get it, you might eventually write an email to the author – who might respond, or not. Sometimes even errors are found – and not published! And indeed, there is no easy way to publish “errata” anywhere on the net.

In mathematics most papers are not read through to the end. One uses strategies 1 & 2 till one gets stuck and moves on to something more exciting. The chances of survival are much better with strategy 3, where one is committed to putting a lot of effort into it over weeks.

4. Finding related work. Where to go from there? Is the paper superseded by a more recent development? Which are the relevant papers which the author builds upon? What are the historic influences? What are the founding ideas of the subject? Finding related work is very time consuming. It is easy to overlook things given that the references are often vast, and sometimes hard to get hold of. Getting citation information often requires access to commercial databases.

Basic idea:

All researchers around the world are faced with the same problems and come up with their individual solutions. There are great synergies in bringing these people together with an online platform! Most of the addressed problems are solved with a paper centric service which allows you to…

  • …get to know other readers of the paper.
  • …exchange with the other readers: ask questions, write comments, reviews.
  • …share the gained insights with the community.
  • …ask questions about the paper.
  • …discuss the paper.
  • …review the paper.

We want to do that with a new mixture of a traditional Q&A system like StackExchange or MathOverflow with a paper database and social features. The key features of this system are as follows:

Openness: We follow a strict openness principle. The software will be developed in open source. All data generated on this site will be under a creative commons license (like Wikipedia) and will be made available to the community in form of database dumps or an API (open data).

We use two different types of content sites in our system: Papers and Discussions.

Paper sites. A paper site is dedicated to a single publication and has the following features:

  1. Paper meta information
    – show title, author, abstract, journal, tags
    – leave a comment
    – write a review (with wiki option)
    – vote up/down
  2. Paper resources
    – show pdfs, slides, notes, video lectures, etc.
    – add a resource
  3. Related Work
    – show the reference-tree and citations in an intelligent way.
  4. Discussions:
    – show related discussions
    – start a new discussion
  5. Social features
    – bookmark
    – share on G+, twitter
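
The feature list above can be read as a rough data model. The following is only a minimal sketch of what one paper site might store. The class and field names (`PaperPage`, `Resource`, `bookmarked_by` and so on) are assumptions made up for illustration and are not taken from the actual related-work.net code.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    kind: str  # e.g. "pdf", "slides", "notes", "video"
    url: str


@dataclass
class PaperPage:
    # 1. Paper meta information
    title: str
    authors: list[str]
    abstract: str = ""
    journal: str = ""
    tags: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)
    reviews: list[str] = field(default_factory=list)
    votes: dict[str, int] = field(default_factory=dict)  # user -> +1 or -1
    # 2. Paper resources
    resources: list[Resource] = field(default_factory=list)
    # 3. Related work (hypothetical ids of other paper pages)
    references: list[str] = field(default_factory=list)
    cited_by: list[str] = field(default_factory=list)
    # 4. Discussions (hypothetical ids of discussion pages)
    discussions: list[str] = field(default_factory=list)
    # 5. Social features
    bookmarked_by: set[str] = field(default_factory=set)

    def vote(self, user: str, value: int) -> None:
        """Record an up or down vote; a repeated vote replaces the old one."""
        if value not in (1, -1):
            raise ValueError("a vote is either +1 or -1")
        self.votes[user] = value

    def score(self) -> int:
        """Sum of all up and down votes."""
        return sum(self.votes.values())
```

Storing votes per user instead of as a plain counter is one possible design choice: it keeps a user from voting twice and leaves the data available for later ranking.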

The point “Related Work” deserves some further explanation. The citation graph offers a great deal more information than just a list of references. Together with user generated content like votes, the individual paper bookmarks and the social graph, one has a very interesting data set which can be harvested. We want to offer at least views on this graph with respect to: Popularity/Topics/Read by Friends. Later on one could add more sophisticated, even graphical views on this graph.
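
To make the three views a bit more concrete, here is a small sketch that orders the references of one paper by popularity, by topic and by what friends have bookmarked. Everything in it (the `Reference` class, the function names, the use of a plain vote count as popularity) is an assumption for illustration. A real implementation would query the citation graph, for example in neo4j, instead of in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    paper_id: str
    votes: int = 0
    tags: set[str] = field(default_factory=set)
    bookmarked_by: set[str] = field(default_factory=set)


def by_popularity(refs):
    """View 1: most voted references first."""
    return sorted(refs, key=lambda r: r.votes, reverse=True)


def by_topic(refs, topic):
    """View 2: only references carrying the given tag, most voted first."""
    return by_popularity([r for r in refs if topic in r.tags])


def read_by_friends(refs, friends):
    """View 3: references bookmarked by at least one friend.

    Ordered by the number of friends who bookmarked them, then by votes.
    """
    friends = set(friends)
    hits = [r for r in refs if r.bookmarked_by & friends]
    return sorted(
        hits,
        key=lambda r: (len(r.bookmarked_by & friends), r.votes),
        reverse=True,
    )
```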


Discussion sites.
A discussion looks more like a traditional Q&A question, with the difference that each discussion may have many related papers. A discussion site contains:

  1. Discussion meta information (title, author, body)
  2. Discussion content
  3. Related papers
  4. Voting
  5. Follow/Bookmark

Besides the content sites we want to provide the following features:

News Stream. This is the start page of our website. It will be generated from the network consisting of friends, papers and authors. There should be several modes like:

  • hot: heavily discussed papers/discussions
  • new papers: list new publications (filtered by tag, like arXiv feed)
  • social: What did your friends do lately
  • default: intelligent mix of recent activity that is relevant to the logged in user


Moreover, filter by tag should be always available.
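
The stream modes and the tag filter could be sketched roughly as follows. This is not a specification: the `Activity` record, the use of a comment count as a stand-in for “heavily discussed” and the rule for the default mode (friends' activity first, then recency) are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    actor: str
    item_id: str
    kind: str  # "paper" or "discussion"
    timestamp: int  # larger means more recent
    tags: set[str] = field(default_factory=set)
    comment_count: int = 0  # assumed proxy for "heavily discussed"


def news_stream(activities, mode="default", friends=(), tag=None):
    """Return activities for one stream mode, optionally filtered by tag."""
    friends = set(friends)
    items = [a for a in activities if tag is None or tag in a.tags]
    if mode == "hot":
        return sorted(items, key=lambda a: a.comment_count, reverse=True)
    if mode == "new":
        papers = [a for a in items if a.kind == "paper"]
        return sorted(papers, key=lambda a: a.timestamp, reverse=True)
    if mode == "social":
        social = [a for a in items if a.actor in friends]
        return sorted(social, key=lambda a: a.timestamp, reverse=True)
    if mode == "default":
        # Assumed mix: activity of friends first, then everything by recency.
        return sorted(
            items, key=lambda a: (a.actor in friends, a.timestamp), reverse=True
        )
    raise ValueError(f"unknown stream mode: {mode}")
```

Applying the tag filter before the mode logic means that filtering by tag is available in every mode, as required above.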

Search bar:

  • Searches contents of the site, but should also find papers on freely available databases (e.g. arXiv). Adding a paper from there should be a seamless process.
  • Search result ranking uses vote and view information.
  • Personalized search information. (Physicists usually do not want sociology results.)
  • Auto completion on paper titles, author, discussions.
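
As a rough illustration of how vote and view information could enter the result ranking, and of prefix-based auto completion, consider the sketch below. The scoring formula and its weights are invented for this example and are not a tuned or agreed ranking function. Papers are represented as plain dictionaries with the assumed keys `title`, `votes` and `views`.

```python
import math


def rank_results(query, papers):
    """Rank papers whose title matches the query, boosted by votes and views."""
    terms = query.lower().split()
    scored = []
    for paper in papers:
        title = paper["title"].lower()
        matches = sum(term in title for term in terms)
        if matches == 0:
            continue
        # Assumed formula: logarithms dampen very large vote and view counts.
        boost = 1.0 + math.log1p(max(paper["votes"], 0)) + 0.5 * math.log1p(paper["views"])
        scored.append((matches * boost, paper["title"]))
    return [title for _, title in sorted(scored, reverse=True)]


def autocomplete(prefix, titles, limit=5):
    """Return up to `limit` titles starting with the prefix (case-insensitive)."""
    prefix = prefix.lower()
    return sorted(t for t in titles if t.lower().startswith(prefix))[:limit]
```

Personalized search could be added in the same spirit, for example by multiplying the score with a factor that depends on the overlap between the tags of a paper and the interests of the user.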

Social: (hard to implement, maybe for second version!)

  • Easily refer to users by @-syntax familiar from Twitter/Google+
  • Maintain a friendship / trust graph
  • Friendship recommendations
  • Find friends from Google+ on the site
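
Two of these points can be sketched in a few lines: extracting @-mentions from a comment, and recommending friends by counting mutual friends. Both are simplified assumptions. The friendship graph is a plain dictionary here, whereas the real system would more likely keep it in a graph database, and the function names are hypothetical.

```python
import re
from collections import Counter

# An "@" that does not follow a word character, so that e-mail
# addresses such as a@b.org are not treated as mentions.
MENTION = re.compile(r"(?<!\w)@(\w+)")


def mentions(text):
    """Return the user names referred to with @-syntax in a text."""
    return MENTION.findall(text)


def recommend_friends(graph, user, limit=3):
    """Suggest friends of friends, ranked by the number of mutual friends."""
    friends = graph.get(user, set())
    counts = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in friends:
                counts[candidate] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked][:limit]
```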

Benefits

Our proposed website addresses the above-mentioned problems in the following ways.
1. Finding and filtering new publications: This step can be improved with even very little community effort:

  • Tell other people, that you are interested in the paper. Vote it up or leave a comment if you are very excited about it.
  • Point out a paper to a colleague.

2. Getting more information about a given paper:

  • Write a summary or review about a paper you have read or skimmed through. Maybe the introduction is hard to read or some results are not clearly stated.
  • Can you recommend reading this paper? Vote it up!
  • Ask a colleague for his opinion on the paper. Maybe he can write a summary?

Many reviews of new papers are already written. E.g. MathSciNet and Zentralblatt maintain a large database of reviews which are provided by the community and are not freely available. Many authors would be much happier to contribute them to an open system!
3. Understanding a paper: Here are the major synergies which we want to address with our project.

  • Ask a question: Why is the author using this experimental method? How does Lemma 3.4 work? Why do I need this assumption? What is the intuition behind the “virtual truncation”? What implications does this work have?
  • Start a discussion: (might involve more than one paper.) What is the difference of these two papers? Is there a reference explaining this more clearly? What should I read in advance to understand the theory?
  • Add resources. Tell the community about related videos, notes, books etc. which are available on other sites.
  • Share your notes. If you have discussed a paper in a reading class or seminar, collect your notes or opinions and make them available for the community.
  • Restate interesting statements. Tell the community when you have found a helpful result which is buried inside the paper. In that way Google may find it!

4. Finding related work. Having a well structured and easily navigable view on related papers simplifies the search a lot. The filtering benefits from the content generated by the users (votes) and individual information, like friends who have written/bookmarked a paper.

Similar Sites on the Web

Several discussions in Q&A forums address precisely this problem.

We found three sites on the internet with a similar approach, which we examined more carefully.
1. There is a social network which has most of our features implemented:

researchgate.net
“Connect with researchers, make your work visible, and stay current.”

The Economist has dedicated an article to them. It is essentially a Facebook clone with special features for scientists.

  • Large, fast-growing community: 1.4 million users, growing by about 50,000 per month. Mainly biology and medicine.
    (As Daniel Mietchen points out, the size might be misleading due to institutional accounts)
  • Very professional Look and Feel. Company from Berlin, Germany, funded by VC. (48 People involved, 10 Jobs advertised)
  • Huge Feature set:
    • Profile site, Connect to friends
    • News Feed
    • Publication Database, Conference Finder, Jobmarket
    • Every Paper its own page: with
      • Voting up/down
      • Comments
      • Metadata (Title, Author, Abstract, Preview)
      • Social Media (Share, Bookmark, Follow author)
    • Organize Workgroups/Reading Classes.

Differences to our approach:

  • Closed Data / Closed Source
  • Very complex site which serves a lot of purposes
  • Only very basic features on paper site: vote/comment.
  • QA system is not linked well to paper database
  • No MathML
  • Mainly populated by undergraduates

2. Another website which comes reasonably close is:

http://www.sciweavers.org/

“an academic network that aggregates links to research paper preprints
then categorizes them into proceedings.”

  • Includes a large collection of online tools for various purposes
  • Have a big library of papers/software/datasets/conferences for computer science.
    Paper sites have:
    • Meta information and preview
    • Vote functionality and view statistics, tags
    • Comments
    • Related work
    • Bookmarking
    • Author information
  • User profiles (no friendships)


Differences to our approach:

  • Focus on computer science community
  • Comment and Discussions are well hidden on paper sites
  • No News stream
  • Very spacious design

 
3. Another very similar site is:

journalfire.com – beta
“Share what your read – connect to colleagues – create journal clubs.”

It has the following features:

  • Comment on Papers. Activity feed (?). Follow articles.
  • Host Journal Clubs. Create Events related to papers.
  • Powerful search box fetching papers from Arxiv and Pubmed (slow)
  • Social features on site: User profiles, friend finder (no fb/g+ integration yet)
  • News feed – from subscribed papers and friends
  • Easy paper import via Bookmarklet
  • Good usability!! (but slow loading times)
  • Private reading clubs cost money!

They are very skilled: Maintained by 3 PhD students/postdocs from Caltech and MIT.

Differences to our approach:

  • Closed Data, Closed Source
  • This site also currently lacks ranking features
  • Very Closed model – Signup required
  • Weak Crowd sourcing: Cannot add Meta information

The site is still at its very beginning with few users. The project started in 2010 and has not gained much momentum since.

The other sites are roughly classified in the following categories:
1. Single people who are following a very similar idea:

  • annotatr.appspot.com. Combines a metadata-base with the disqus plugin. You can comment but not rate. Good usability. Nice CSS. Good search function. No MathML. No related article suggestion. Maintained by two academics in private time. Hosted on Google Apps. Closed Source – Closed Data.
  • r-Forum – a resource where mathematicians can collect record reviews, corrections of a resource (e.g. paper, talk, …). A simple Vanilla-Forum/Wiki with almost no content used by maybe 12 people in US. No automated Data import. No rating system.
  • http://math-arch.org/ – Post comments to math papers. Very bad usability – it even produces errors. Maintained by a group of Russian programmers, LogicSun. Closed Source – Closed Data.

Analysis: Although the basic idea of connecting people who read the same papers is there, the implementation is very bad in terms of usability and even basic programming. Also, voting features are missing.

2. (Semi) Professional sites.

  • Public Library of Science – very professional, huge paper database for mainly biology and medicine. Features full text papers, lots of interesting meta information including references. Has comment features (not very visible) and a news stream on the start page.
    No QA features (+1, Ask question) on the site. Only published articles are on the site.
  • Mendeley.com – Huge Bibliographic database with bookmarking and social features. You can organize reading groups in there, with comments and notes shared among the participants. Features a news stream with papers by friends. Nice import. Impressive fulltext data and Reference features.
    No QA features for paper. No comments for paper. Requires Signup to do anything useful.
  • papercritic.com – Open review database. Connected to the Mendeley bibliographic library. You can post reviews. No rating. No comments. Not open: Mendeley is commercial.
  • webofknowledge.com. Commercial academic citation index.
  • zotero.org – features a program that runs inside a browser. “easy-to-use tool to help you collect, organize, cite, and share your research sources”

Analysis: The goal of all these tools is to simplify the reference management, by providing metadata like references, citations, abstracts, author profiles. Commenting features on the paper site are not there or not promoted.
3. Vaguely related sites which solve different problems:

  • citeulike.org – Social bookmarking for papers. Closed Source – Open Data.
  • http://www.scholarpedia.org. A peer reviewed open access encyclopedia.
  • Philica.com Online Journal which publishes articles from any field along with its reviews.
  • MathSciNet/Zentralblatt – Review database for math community. Closed Source – Commercial.
  • http://f1000research.com/ – Online Journal with a public, post publish review process. “Open Science – Open Data – Open Review”
  • http://altmetrics.org/manifesto/ as an emerging trend from the web-science trust community. Their goal is to revolutionize the review process and create better filters for scientific publications making use of link structures and public discussions. (Might be interesting for us).
  • http://meta.wikimedia.org/wiki/WikiScholar – one of several ideas under discussion at Wikimedia as to a central repository for references (that are cited on Wikipedias and other Wikimedia projects)

Upshot of all this:

There is not a single site featuring good Q&A features for papers.

If you like our approach you can contact us, contribute to the source code or read some starting documentation!
So the plan is to fork an open source question and answer system and enrich it with features that fulfill the needs of scientists and with some social aspects, which will eventually help to rank the related work of a paper.
Feel free to provide us with feedback and wishes and join our effort!

]]>
https://www.rene-pickhardt.de/related-work-net-product-requirement-document-released/feed/ 17