Tables of data are a common feature of reports and blogs, so they represent an important use case for getting RDF online. The simplest and most commonly used approach is to assign a URI to each row in the table. That URI forms the subject of a set of triples, with the column name as predicate and the cell contents as the value. This works fine in some simple cases, but many tables are not so simple.
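As a rough sketch of that row-per-subject mapping, the Python below turns one table row into a list of triples. The `http://example.org/` namespace, the `row_to_triples` helper and the sample column names are all made up for illustration; none of them come from the Guardian data.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def row_to_triples(row_id, row):
    """Map one table row to (subject, predicate, object) triples.

    The row gets its own URI as the subject; each column name becomes a
    predicate and each cell's content becomes a plain literal value.
    """
    subject = BASE + "row/" + quote(str(row_id))
    return [
        (subject, BASE + "column/" + quote(column), str(value))
        for column, value in row.items()
    ]


# A made-up row, not taken from the Guardian data.
triples = row_to_triples(1, {"name": "Example", "population": 1000})
for triple in triples:
    print(triple)
```

Note that every cell ends up as a bare literal, with nowhere to hang a year or a unit.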

My main concern was to investigate the options for the RDF design of the data, and I could have published a standalone RDF file. But I like the self-contained nature of the RDFa approach, where the RDF and HTML representations are both in the same web page. So my experiment was to publish a table of data as HTML and RDFa.

I’m a fan of the always-interesting Guardian datablog, so that was a natural place to look for some sample data. When I started looking at this a couple of weeks ago, the article that day was by Simon Rogers on carbon dioxide emissions, and the accompanying data had some interesting features.

The dataset consists of carbon dioxide emissions for each country in the world, year by year from 1980 to 2006. Each data point is a physical quantity, with an associated unit. This pattern of time-varying data is extremely common, but not entirely straightforward to handle in RDF – so I think it makes an interesting case.

I selected a small part of the data (the first six countries and the first three years), to keep things simple and to respect the licence for the data. (Hopefully this little snippet counts as “fair use”!)

Time-varying data

So how can we best represent this data in RDF?

Taking the simple approach of one subject per row and one predicate per column doesn’t really work with this kind of data structure. DBpedia typically represents this type of information with a pair of properties. For example, the GDP of France is specified by the properties “gdpNominal”, to tell you it is a measure of GDP, and “gdpNominalYear”, to tell you the year. However, if you have data for more than one year, this approach no longer works, because nothing ties each value to its own year.

Another approach could be to include the year information in the property name, such as “CO2emissions1980”, “CO2emissions1981” etc, but that leads to a large number of properties that are not very reusable. And in this case we want to specify a unit too.
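To make the scale of that concrete, here is a small Python sketch that generates the property names this pattern would need for the 1980 to 2006 range of the dataset. The naming scheme is just the hypothetical one from the paragraph above.

```python
# Hypothetical property names following the "year in the property name" pattern.
FIRST_YEAR, LAST_YEAR = 1980, 2006  # range covered by the dataset

property_names = [
    f"CO2emissions{year}" for year in range(FIRST_YEAR, LAST_YEAR + 1)
]

print(len(property_names))  # 27 distinct properties for a single measure
print(property_names[:3])   # ['CO2emissions1980', 'CO2emissions1981', 'CO2emissions1982']
```

Each extra year of data means minting yet another property, and none of them says anything about the unit.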

So some kind of N-ary relation is required. Ian Davis recently published a series of articles reviewing and comparing the different options for representing time in RDF. One of those options is using N-ary relations and that is the approach I decided to take.

I defined a class (in my own namespace) called SpaceAndTimeDependentObservation. (At some point I’ll do the extra work required to create a small ontology around this class – I haven’t done that yet.) Each cell in the table becomes an instance of that class, with a location (the country) and time (the year) associated with it, as well as a property, CO2emissions, whose value is a quantity with an amount and a unit. So each table cell becomes a graph fragment that looks like this:

[Figure: CO2 emissions RDF node]

(I’ve used an ellipse to represent a resource and a rectangle to represent a literal value. I’ve left out the namespace prefixes to keep it simple. The ellipses with no text in them are blank nodes.)
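For readers who prefer text to diagrams, the same fragment can be sketched as a list of triples. This is only an illustration in Python using the standard library: the `http://example.org/w#` namespace is a stand-in for my own namespace, the helper functions are made up, and the values are those of the sample cell shown later in this post.

```python
from itertools import count

W = "http://example.org/w#"  # hypothetical stand-in for the author's own namespace
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"

_ids = count(1)


def bnode():
    """Return a fresh blank node label such as _:b1."""
    return f"_:b{next(_ids)}"


def uri(value):
    return f"<{value}>"


def literal(value, datatype=None):
    """Format a plain or datatyped literal in N-Triples style."""
    text = f'"{value}"'
    return f"{text}^^<{datatype}>" if datatype else text


def observation_triples(location_uri, year, amount, unit):
    """Build the six triples describing one table cell."""
    obs = bnode()       # the observation itself (blank node)
    quantity = bnode()  # the quantity: an amount plus a unit (blank node)
    return [
        (obs, uri(RDF + "type"), uri(W + "SpaceAndTimeDependentObservation")),
        (obs, uri(W + "CO2emissions"), quantity),
        (quantity, uri(RDF + "value"), literal(amount, XSD + "float")),
        (quantity, uri(W + "unit"), literal(unit)),
        (obs, uri(W + "location"), uri(location_uri)),
        (obs, uri(W + "time"), literal(year, XSD + "gYear")),
    ]


triples = observation_triples(
    "http://sws.geonames.org/3573345/", 1980, "0.53", "million metric tonnes"
)
for s, p, o in triples:
    print(s, p, o, ".")
```

The two blank nodes correspond to the empty ellipses in the diagram: one for the observation and one for the quantity.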

Countries

To follow good Linked Data practice, it makes sense to use existing URIs for countries, allowing the data here to be connected to other information about those countries. The two obvious choices here are DBpedia and Geonames. The DBpedia URIs are more ‘readable’ but the Geonames URIs are more closely linked to all kinds of other useful geographical data through the Geonames database, so I decided to use Geonames.

Quantities and units

For the CO2 emissions data themselves, it was important to specify a unit alongside the numbers. I used a ‘quantity’ blank node with an rdf:value and a unit. For now I just named the unit in my own namespace, but a better approach would be to use an established units ontology such as QUDT or SWEET.

Times

There are a few ontologies to choose from for describing time and time intervals. In this case I decided to mint my own property (imaginatively called ‘time’), because I wanted its meaning to be tied to my SpaceAndTimeDependentObservation class. I annotated the year values (1980 etc) with the xsd:gYear datatype.

Putting it all together in RDFa

The first thing to do was to change the DOCTYPE for this blog to “-//W3C//DTD XHTML+RDFa 1.0//EN”, so that the RDFa markup will be interpreted correctly. The data itself is a straightforward HTML table. Within each <td> element is a selection of divs to hold all the RDFa markup, with the only “visible” content being the data from the original Guardian spreadsheet.

A sample cell of the table looks like this:

<td>
    <div typeof="w:SpaceAndTimeDependentObservation">
        <div rel="w:CO2emissions">
            <div property="rdf:value" datatype="xsd:float">0.53</div>
            <div property="w:unit" content="million metric tonnes"></div>
        </div>
        <div rel="w:location" resource="http://sws.geonames.org/3573345/"></div>
        <div property="w:time" content="1980" datatype="xsd:gYear"></div>
    </div>
</td>

Do a “view source” on this page to see the full story. One thing to note is that this approach is quite verbose, adding about 300 characters of markup to each cell of actual data. I ended up with 6 triples per cell in order to represent reasonably precisely the data from the original table.
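Writing that markup by hand for every cell gets tedious, so here is a minimal Python sketch of how a cell could be generated instead. The `cell_markup` function is hypothetical (it is not how this page was actually produced), and the overhead it reports depends on the indentation used, so treat the number as indicative only.

```python
from html import escape


def cell_markup(amount, unit, location_uri, year):
    """Return the RDFa-annotated <td> for one observation.

    Mirrors the structure of the sample cell; only `amount` is visible text.
    """
    return "\n".join([
        "<td>",
        '  <div typeof="w:SpaceAndTimeDependentObservation">',
        '    <div rel="w:CO2emissions">',
        f'      <div property="rdf:value" datatype="xsd:float">{escape(amount)}</div>',
        f'      <div property="w:unit" content="{escape(unit)}"></div>',
        "    </div>",
        f'    <div rel="w:location" resource="{escape(location_uri)}"></div>',
        f'    <div property="w:time" content="{escape(str(year))}" datatype="xsd:gYear"></div>',
        "  </div>",
        "</td>",
    ])


cell = cell_markup(
    "0.53", "million metric tonnes", "http://sws.geonames.org/3573345/", 1980
)
overhead = len(cell) - len("0.53")  # characters of markup around the visible value
print(cell)
print("markup overhead:", overhead, "characters")
```

Generating the cells this way also makes it easier to keep the markup consistent across the whole table.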

One tool I found very handy while writing all this stuff was Mark Birbeck’s Ubiquity RDFa parser bookmarklet. I recommend you give it a try.

The final result

So here it is: my RDFa marked-up HTML table representing the original data. Useful further work would be to add some additional metadata on the authorship and provenance of the dataset, but I’ll save that for another day.

One thing that stands out through this whole process is that there are many design choices to be made when deciding how to represent data as RDF. I’d be keen to hear what you think of the approach I’ve described here and whether you would do it differently.

World Carbon Dioxide Emissions from the Consumption and Flaring of Fossil Fuels, 1980-2006 (million metric tonnes of Carbon Dioxide)

2006 World Ranking | Country                   | 1980    | 1981    | 1982
176                | Bermuda                   | 0.53    | 0.46    | 0.49
7                  | Canada                    | 458.35  | 443.01  | 424.26
179                | Greenland                 | 0.01    | 0.01    | 0.00
13                 | Mexico                    | 240.43  | 266.61  | 282.25
207                | Saint Pierre and Miquelon | 0.15    | 0.15    | 0.13
2                  | United States             | 4788.65 | 4666.19 | 4421.14
                   | North America             | 5488.11 | 5376.43 | 5128.28