Linked Open Data, or the Web of Data, is the central idea of the Semantic Web proposed by Tim Berners-Lee. The Semantic Web is often also referred to as Web 3.0. But what is the difference between the web as we know it today and the Web of Data?
On the web today there are hypertext documents which are interlinked with each other. As we all know, search engines like Google use these hyperlinks to calculate which websites are most relevant. A hypertext document is created for humans to read and understand. Even though Google can search those documents very efficiently and does amazing things with them, it is not able to read and interpret these documents or understand their semantics. The search result quality is already very high, but it could be improved by a lot if search engines were able to understand the semantics of the documents people put on the web.
The idea of Linked Open Data is that data should also be published in a way that machines can easily read and “understand” it. Once the data is published, web documents could even be annotated with it, invisible to humans but helping computers to understand the semantics and thereby distribute information better. A lot of new web services would also become possible, improving the quality of our daily lives just as Google and Wikipedia have done.
An example of linked data:
You might wonder how data can be linked. This is pretty easy. Let us take me as an example and the following statement about me: “Rene lives in Koblenz, which is a city in Germany.”
So I could create the following data triples:
- (Rene Pickhardt, lives in, Koblenz)
- (Koblenz, type, city)
- (Koblenz, is in, Germany)
- (Germany, type, country)
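To give a feeling for how such triples look in practice, here is a minimal sketch in Python using the rdflib library. The http://example.org/ namespace and the property names (livesIn, isIn) are made up for illustration; real Linked Open Data would use established, shared vocabularies and dereferenceable URIs.

```python
# A minimal sketch of the four triples above as RDF, using the rdflib library.
# The example.org namespace and property names are illustrative assumptions.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")

g = Graph()
g.add((EX.Rene_Pickhardt, EX.livesIn, EX.Koblenz))
g.add((EX.Koblenz, RDF.type, EX.City))
g.add((EX.Koblenz, EX.isIn, EX.Germany))
g.add((EX.Germany, RDF.type, EX.Country))

# Serialize in Turtle, a human-friendly text format for RDF triples.
print(g.serialize(format="turtle"))
```

Each triple is a simple (subject, predicate, object) statement, and the URIs are what later allow different datasets to refer to the very same things.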
As you can see from the picture, these triples form a graph, just like the web itself. Now comes the cool part: the graph is easy for a computer to process, and we can do some reasoning. Purely by reasoning, a computer can conclude that I also live in Germany and that cities lie within countries.
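To illustrate the kind of reasoning meant here, a tiny sketch in plain Python: it applies the rule “if X lives in Y and Y is in Z, then X lives in Z” to the triples until nothing new can be derived. The rule itself is just an illustrative assumption; real systems would express such rules in RDFS/OWL and run a proper reasoner.

```python
# Plain-Python sketch of rule-based inference over the triples above.
# The "lives in" + "is in" rule is a hypothetical example rule.
triples = {
    ("Rene Pickhardt", "lives in", "Koblenz"),
    ("Koblenz", "type", "city"),
    ("Koblenz", "is in", "Germany"),
    ("Germany", "type", "country"),
}

def infer(facts):
    """Apply the rule repeatedly until no new triples can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, p1, y) in list(derived):
            for (y2, p2, z) in list(derived):
                if p1 == "lives in" and p2 == "is in" and y == y2:
                    new_fact = (x, "lives in", z)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

for fact in sorted(infer(triples) - triples):
    print(fact)  # -> ('Rene Pickhardt', 'lives in', 'Germany')
```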
Now assume everyone published their little data graphs using an open standard and there was a way to connect those graphs. Pretty quickly we would have a lot of knowledge about René Pickhardt, Koblenz, cities, Germany, countries, … In particular, when we automatically process a web document and detect some data in it, we can use this background knowledge to fight ambiguities in language or simply to better interconnect parts of the text using the semantics from the Web of Data.
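To sketch what connecting those graphs could look like, here is a hypothetical example with two publishers: my graph under example.org and someone else’s under example.net. The owl:sameAs statement records that both Koblenz URIs denote the same city; rdflib only stores that link, so an actual reasoner or a dedicated merging step would still be needed to treat the two nodes as one.

```python
# A hypothetical sketch of connecting two independently published graphs.
# The example.org / example.net URIs are made up; owl:sameAs is the standard
# way to state that two URIs identify the same thing.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL

MINE = Namespace("http://example.org/")
THEIRS = Namespace("http://example.net/")

mine = Graph()
mine.add((MINE.Rene_Pickhardt, MINE.livesIn, MINE.Koblenz))

theirs = Graph()
theirs.add((THEIRS.Koblenz, THEIRS.isIn, THEIRS.Rheinland_Pfalz))
theirs.add((MINE.Koblenz, OWL.sameAs, THEIRS.Koblenz))

# Merging is just taking the union of the two sets of triples.
combined = Graph()
for triple in mine:
    combined.add(triple)
for triple in theirs:
    combined.add(triple)

# The combined graph now connects my statement about where I live with
# the other publisher's statement about where Koblenz is.
for s, p, o in combined.triples((MINE.Koblenz, OWL.sameAs, None)):
    print(s, "denotes the same city as", o)
```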
Is Linked Open Data a good or a bad thing?
I think it is clear what great things can be achieved once we have Linked Open Data. But there are still a lot of challenges to tackle:
- How to fight inconsistencies and spam?
- How to trust a certain source of data?
- How can we easily connect data from different sources and identify nodes that refer to the same thing?
- How to fight ambiguities?
Fortunately we already have more or less satisfying answers to these questions. But as with any science, we have to watch out carefully: Linked Open Data is accessible to everyone and probably enables as many bad things as it enables good ones. So of course we can all be euphoric about this great idea, but bear in mind that nuclear science was not only a good thing. At the end of the day it led to really bad things like nuclear bombs!
I am happy to get your feedback and opinion about Linked Open Data! I will very soon publish some articles with links to sources of Linked Open Data. If you know some, why don’t you tell me in the comments?