tl;dr version: Source code at GitHub! A couple of days ago a data set was released on WikiLeaks consisting of about 23 thousand emails sent within the Democratic National Committee that would demonstrate how the DNC was actively trying to prevent Bernie Sanders from becoming the Democratic candidate for the general election. I am …
Category: Data sets
Analyzing the final and intermediate results of the iversity MOOC Fellowship online voting
As written before, Steffen and I participated in the online voting for the MOOC fellowship. Today the competition finished, and I would like to say thank you to everyone who participated in the voting, in particular to the 435 people supporting our course. I never imagined getting that many people to …
Download Google n-gram data set and Neo4j source code for storing it
At the end of September I discovered an amazing data set provided by Google! It is called the Google n-gram data set. Even though the English Wikipedia article about n-grams needs some cleanup, it explains nicely what an n-gram is: http://en.wikipedia.org/wiki/N-gram The data set is available in several languages and I …
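For readers new to the idea, here is a minimal sketch of what an n-gram is: every run of n consecutive tokens in a text. The function names `ngrams` and `count_ngrams` are hypothetical, the whitespace tokenization is a simplifying assumption, and the code does not read Google's own file format; it only illustrates the concept behind the data set.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A list shorter than n yields an empty range, hence no n-grams.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences: Iterable[str], n: int) -> Counter:
    """Count n-gram frequencies over several sentences (hypothetical helper)."""
    counts: Counter = Counter()
    for sentence in sentences:
        # Naive lowercase + whitespace tokenization; real corpora need a proper tokenizer.
        counts.update(ngrams(sentence.lower().split(), n))
    return counts


if __name__ == "__main__":
    print(ngrams("the quick brown fox".split(), 2))
    print(count_ngrams(["the cat sat", "the cat ran"], 2).most_common(1))
```

Counting n-grams over a large corpus, as Google did with its book scans, is essentially `count_ngrams` applied at scale; the published data set stores such counts so you do not have to compute them yourself.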
Download network graph data sets from KONECT – the Koblenz Network Collection
UPDATE: now with a link to the PhD thesis. At the time of blogging, the thesis had not yet been published. Thanks to Patrick Durusau for pointing out the missing link. One of the first things I did at my institute when starting my PhD program was reading the PhD thesis of Jérôme Kunegis. For a mathematician a …
Download TREC (Text Retrieval Conference) Data Set
Being back at university, I get to see more and more data sets. Originally I wanted to use the data sets category of my blog to provide an unordered list of these publicly available data sets, sort of as a personal reminder. For some reason I never really did that, but I am now about …
How to download Wikipedia
Wikipedia is an amazing data set for all kinds of research that go far beyond text mining. The best thing about Wikipedia is that it is licensed under a Creative Commons license, so you are allowed to download Wikipedia and reuse it as long as you follow the license terms (attribution and share-alike). The articles have almost no spelling …
Download Data Sets on this blog
Why is there a data set section on this blog, and what kind of articles can be found in it? What data sets will be available for download on this blog?