The Technologies For Open Data

It could be a database of cases being dealt with, it could be a calendar of meetings, it might be a collection of PDF documents of the minutes of those meetings, or perhaps it’s even a filing cabinet containing manilla folders full of paper.

Even if we assume that we can get the data in a digital form, there will still be a wide range of different types of data. We can place the data on a Web server so that people can download it, but it is useful to categorise it in a way that helps people understand what type of data it is and how easy it will be to use once they’ve downloaded it.

Tim Berners-Lee came up with a simple five-star rating system that helps describe the nature of published open data. The rating system can be summarised as follows:

One star data:

The data is in a proprietary format that might be easily readable by a person, but is harder for a computer to process. This might be a PDF document, for example. A PDF describing the expenditure of a local council would allow people to read what has been spent, but would not allow them to easily write a computer script to check whether any expenditure was over a certain amount.

Two star data:

Here, the data is in a more machine-readable form but still in a proprietary format. An example might be a Microsoft Excel spreadsheet. It is easy to read, and a script could be written to examine it automatically, but the format is tied to a particular application or operating system that may not be free to use.

Three star data:

Now, the data is in a non-proprietary format such as CSV (which stands for comma-separated values). This means that it can be opened by a range of applications and across a number of different computer platforms and operating systems. It is also relatively easy to process automatically using scripts, but the script will need to understand the format of the file, for example what each of the columns means.
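As a rough illustration of that last point, here is a minimal Python sketch of a script that checks a CSV file for expenditure over a certain amount. The sample data, the column names (supplier, description, amount_gbp) and the rows_over function are all made up for this example; a real council file would have its own columns, and the script only works because we have told it which column holds the amount.

```python
import csv
import io

# Hypothetical sample data standing in for a downloaded expenditure file.
# The column names are assumptions made for this sketch.
SAMPLE_CSV = """supplier,description,amount_gbp
Acme Paving,Road repairs,12500.00
City Stationers,Office supplies,340.50
Green Parks Ltd,Tree maintenance,5020.75
"""


def rows_over(csv_text, column, threshold):
    """Return the rows whose numeric `column` value exceeds `threshold`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row[column]) > threshold]


# The script has to be told which column holds the amount: nothing in
# the CSV file itself says what "amount_gbp" means.
large = rows_over(SAMPLE_CSV, "amount_gbp", 5000)
for row in large:
    print(row["supplier"], row["amount_gbp"])
```

To read a real downloaded file, the same csv.DictReader could be given a file opened with open(path, newline="") in place of the in-memory sample text.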

Four star data:

Data in this form uses specific Web technologies that allow us to describe the semantics of the data. For this MOOC, we don’t have scope to discuss Semantic Web technologies in great detail, although we’d encourage you to explore the area if you find it interesting. In simple terms, the data is written in a Web format such as RDF (Resource Description Framework), which describes the data in a way that allows machines to understand its semantics more easily.
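To give a flavour of what this looks like, the sketch below writes two facts about a payment as RDF statements in N-Triples, one of the standard text formats for RDF. Each statement is a triple of subject, predicate and object, and things are named with Web addresses (URIs) rather than bare column headings. Every URI under example.org, the to_ntriples function and the data itself are assumptions made for illustration; real projects would normally use an RDF library and an established vocabulary, and this simplified version does not escape special characters in literal values.

```python
# Hypothetical URIs: a real dataset would use its publisher's own addresses.
BASE = "http://example.org/council/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Each fact is a (subject, predicate, object) triple. The object is either
# another resource ("uri", address) or a typed value ("literal", text, type).
triples = [
    (BASE + "payment/1", BASE + "terms/supplier",
     ("uri", BASE + "supplier/acme-paving")),
    (BASE + "payment/1", BASE + "terms/amount",
     ("literal", "12500.00", XSD_DECIMAL)),
]


def to_ntriples(triples):
    """Serialise triples as N-Triples lines (simplified: no escaping)."""
    lines = []
    for subject, predicate, obj in triples:
        if obj[0] == "uri":
            obj_text = f"<{obj[1]}>"
        else:
            obj_text = f'"{obj[1]}"^^<{obj[2]}>'
        lines.append(f"<{subject}> <{predicate}> {obj_text} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Unlike the CSV case, the meaning of "amount" is no longer just a column heading: it is identified by a URI that a machine (or a person) could look up.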