LinkedScotland: datasets from the hackday
We had an interesting and productive session at the Scottish Linked Data hackday in Glasgow last week. Around 15 people showed up to work on and chat about all things linked.
I worked with Keith Alexander of Talis and Peter Winstanley and Euan Smith of the Scottish Government to create a Linked Data version of the Scottish Neighbourhood Statistics datasets, a wide-ranging collection of statistical data from across the Scottish Government. That’s a big task and we didn’t finish it! But we made significant progress, produced some useful concrete results and established the feasibility of doing it on a larger scale.
My task was to put together linked data for the various geographic regions used to organise the statistics. The main ones are Census Output Areas, Data Zones, Intermediate Geographies, Local Authority Districts and Health Board areas. The districts are already described in linked data from data.gov.uk, so we used that. But for the other types of region, no suitable URIs exist (as far as we could tell) so I created some new ones.
Prior to the event, my colleague Ric had set up a simple Linked Scotland website using our PublishMyData platform, where we could host any linked data datasets created during the day. (We also plan for an enhanced version of this site to become a long-term home for linked data about Scotland.) The new URIs I minted for the geographical zones are in the linkedscotland.org namespace.
My geography dataset can be found at http://linkedscotland.org/id/dataset/geography/sns, with each region available as a dereferenceable URI and all the info available via SPARQL at http://sparql.linkedscotland.org.
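As a rough illustration of querying that endpoint, here is a minimal Python sketch that builds a request URL for it. It assumes the endpoint accepts the standard SPARQL protocol `query` parameter over GET, which I haven't confirmed here, and the function names are made up for this example. The query is deliberately generic: it just lists the properties and values attached to the dataset URI, without assuming any particular vocabulary.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Both URLs are taken from the post above.
ENDPOINT = "http://sparql.linkedscotland.org"
DATASET_URI = "http://linkedscotland.org/id/dataset/geography/sns"


def describe_query(uri: str, limit: int = 10) -> str:
    """Return a generic SPARQL query listing properties and values of a URI."""
    return f"SELECT ?p ?o WHERE {{ <{uri}> ?p ?o }} LIMIT {limit}"


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET parameter (assumed: standard 'query' param)."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_request_url(ENDPOINT, describe_query(DATASET_URI))
print(url)
```

The resulting URL could then be fetched with any HTTP client; the sketch stops short of making the request so that it runs without network access.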
Alongside that, Peter was working on parsing date information and matching that up with definitions of calendar years, financial years, government years, and all kinds of other time periods at reference.data.gov.uk. Keith was working on processing a few example datasets into RDF, using the excellent Data Cube ontology as the basis of the data model. The key challenge in this was to programmatically transform the SNS indicator metadata files into Data Cube format, with a view to enabling all 2000 or so indicators to be processed in future. Check out Keith’s school pupil numbers dataset here.
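To give a flavour of what a single statistical value looks like in the Data Cube model, here is a small Python sketch that writes one observation as N-Triples. This is not Keith’s actual conversion code. It assumes the commonly used SDMX dimension and measure properties (refArea, refPeriod, obsValue) alongside the Data Cube vocabulary, and every URI and the value passed in at the bottom are placeholders invented for this example rather than real SNS data.

```python
# Vocabulary namespaces: RDF, Data Cube (qb), and the SDMX dimension/measure terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def observation_triples(obs: str, dataset: str, area: str, period: str, value: int) -> list[str]:
    """Return N-Triples lines describing one qb:Observation."""
    subject = iri(obs)
    pairs = [
        (RDF_TYPE, iri(QB + "Observation")),
        (QB + "dataSet", iri(dataset)),
        (SDMX_DIM + "refArea", iri(area)),
        (SDMX_DIM + "refPeriod", iri(period)),
        (SDMX_MEASURE + "obsValue", f'"{value}"^^{iri(XSD_INTEGER)}'),
    ]
    return [f"{subject} {iri(p)} {o} ." for p, o in pairs]


# Placeholder URIs and a placeholder value, for illustration only.
triples = observation_triples(
    obs="http://example.org/observation/1",
    dataset="http://example.org/dataset/pupil-numbers",
    area="http://example.org/area/zone-1",
    period="http://example.org/period/year-1",
    value=123,
)
print("\n".join(triples))
```

In a real conversion the area and period would point at the geography URIs and the reference.data.gov.uk time periods described above, and each indicator would become its own dataset with many such observations.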
Paola di Maio took notes at the ‘show and tell’ session at the end of the day and put those up (currently in a raw form) on the SLD website.
It was a useful day and we plan to use a similar format for the next meetup of the Scottish LD group in a few months. Thanks very much to Keith for putting forward the idea of a hackday and for organising the event (and thanks to Talis for paying for lunch).
