Manchester buses Linked Data diary, Days 4-6

So much for the plan of blogging about this every day. It’s all been a bit busy the last few days.

But, I’m happy to say we have finished making our Manchester buses Linked Data and got it all loaded up in time for the LovelyData hackday.

Revising the ontology

Following a useful suggestion from Ian Davis, I revised the data structure, and the names and meanings of the properties and classes, to align more closely with Ian’s Google GTFS-compatible approach.

The main change was getting rid of the ‘Service’ resources, which pulled together multiple trips on a route, all sharing the same calendar information, and moving the scheduling info to a ‘ServiceCalendar’ resource.

Processing the files

I adapted the ‘cif2nt’ Ruby script to the new ontology and ran it on all 524 CIF files (using a simple shell script to loop through them all). That took less than an hour and produced around 30 million triples. It came to around 5.7GB of N-Triples in total, but that compressed down to about 200MB.
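For anyone wanting to try something similar, the batch step can be sketched roughly as below. We used a small shell loop around the Ruby script; this Python version is only an illustration. The script file name `cif2nt.rb`, its single-argument command line, the `.cif` file extension and the output layout are all assumptions, not the real interface.

```python
import subprocess
from pathlib import Path


def find_cif_files(cif_dir):
    """Return the CIF files in a directory, sorted by name (assumes a .cif extension)."""
    return sorted(Path(cif_dir).glob("*.cif"))


def build_jobs(cif_paths, out_dir, script="cif2nt.rb"):
    """Pair each CIF file with a conversion command and an output .nt path.

    The script name and its command-line arguments are hypothetical.
    """
    jobs = []
    for cif_path in cif_paths:
        cif_path = Path(cif_path)
        nt_path = Path(out_dir) / (cif_path.stem + ".nt")
        jobs.append((["ruby", script, str(cif_path)], nt_path))
    return jobs


def run_jobs(jobs):
    """Run each command, redirecting its standard output to the .nt file."""
    for command, nt_path in jobs:
        with open(nt_path, "w") as out:
            subprocess.run(command, stdout=out, check=True)
```

Calling `run_jobs(build_jobs(find_cif_files("cif"), "nt"))` would then convert one file at a time, stopping at the first failure because of `check=True`.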

It then took another hour and a bit to load them all into our triple store. The final step was adding some metadata for the dataset (mainly voiD, plus a few items we use for configuring the website presentation of the data) and the data is now live on the Linked Manchester site.

On the Linked Manchester site, you can browse the data and run SPARQL queries against it, with all views available in multiple formats (HTML, JSON, XML, etc.). A full data dump is also available for download (196MB compressed tarfile).

Lessons learned at the hackday

We arrived at the Lovely Data hackday on Saturday (I’m the shiny headed chap in the middle of the picture, with Ric in the blue sweatshirt to my right). Though it was tough to ignore the lovely sunny weather outside, we’re glad we did and had a very interesting day.

As well as lots of interesting discussions around open and linked data, we were pleased to work with Ben Gibbs, who spent the day creating a very nice timetable app. Ben’s app gets its data from SPARQL queries to our Linked Manchester buses dataset.

Just type a bus number into the box (111 for example) and you get a list of all the stops on the route. Click on a stop and you see all the times that the 111 calls at that bus stop, with any buses that you have already missed greyed out.
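The greying-out behaviour comes down to splitting a stop’s departure times around the current time. This is not Ben’s code, just a minimal sketch of the idea; the function name is made up, and it ignores services that run past midnight.

```python
from datetime import time


def split_departures(departures, now):
    """Split departure times into (missed, upcoming) relative to now.

    Both lists come back in chronological order. A bus leaving exactly
    at `now` is counted as upcoming.
    """
    ordered = sorted(departures)
    missed = [t for t in ordered if t < now]
    upcoming = [t for t in ordered if t >= now]
    return missed, upcoming
```

An app could then render the `missed` list greyed out and the `upcoming` list as normal.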

Choosing the right set of schedule data from the dataset required some moderately tricky SPARQL queries, but with a small amount of trial and error we got it all working.
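The tricky part is working out which calendars apply on a given day. The logic looks roughly like the sketch below, which borrows the GTFS idea of a calendar with a date range and a set of operating days. The field names here are assumptions for illustration and are not the property names in our ontology, and the real selection happened in SPARQL rather than Python.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ServiceCalendar:
    uri: str              # hypothetical identifier for the calendar resource
    start_date: date      # first day the calendar is valid
    end_date: date        # last day the calendar is valid (inclusive)
    weekdays: frozenset   # 0 = Monday ... 6 = Sunday, as in date.weekday()


def active_calendars(calendars, on_date):
    """Return the calendars that are in force on the given date."""
    return [
        c for c in calendars
        if c.start_date <= on_date <= c.end_date
        and on_date.weekday() in c.weekdays
    ]
```

Only trips attached to one of the returned calendars should be shown for that day.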

There’s nothing like someone using your data to highlight what’s wrong with it, and we turned up a few minor mistakes that we will fix for the next release. We’d forgotten to add labels for some of the resources, which messed up the display a little. We also realised we were missing some of the required StopPoints from NaPTAN, because we had only loaded the BusStopPoints and not the CoachStationPoints.
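A check for missing labels is easy to run before loading. The sketch below assumes the triples are already parsed into plain (subject, predicate, object) tuples of strings, which is a simplification; the function name is made up for illustration.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def subjects_without_label(triples):
    """Return, sorted, every subject that has no rdfs:label triple."""
    subjects = set()
    labelled = set()
    for subject, predicate, _obj in triples:
        subjects.add(subject)
        if predicate == RDFS_LABEL:
            labelled.add(subject)
    return sorted(subjects - labelled)
```

Anything this returns would show up in the browser without a readable name.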

Another interesting point that came up is that the GMPTE data identifies which buses have a low floor and are therefore accessible to wheelchairs and prams. However, this only appears as a comment in a description field, which makes it a little difficult to query for. In the next version, we’ll extract that information into an ‘accessibility’ property to make it easier to find.
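The extraction could be as simple as a pattern match on the description text. This is a sketch only: the exact wording used in the GMPTE descriptions is an assumption, so the pattern would need checking against the real data before we rely on it.

```python
import re

# Matches "low floor", "low-floor" or "lowfloor" in any letter case.
# The wording is assumed, not taken from the source files.
LOW_FLOOR = re.compile(r"\blow[\s-]?floor\b", re.IGNORECASE)


def is_low_floor(description):
    """Return True if a free-text description mentions a low floor."""
    return bool(LOW_FLOOR.search(description or ""))
```

The result could then be written out as the value of the new ‘accessibility’ property for each trip.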

What’s next

The bus schedule data is updated once a week and we plan to automate our conversion process, so that we can update the Linked Data version of the data weekly as well. We hope to get that up and running within the next few weeks.

Also, we’ll add our dataset as a new package on the DataGM website.

With Ben’s timetable app as proof, we think we’ve succeeded in making the data accessible in more flexible and more powerful ways than currently offered by the Transport for Greater Manchester website. We hope that other developers can also find useful stuff to do with it.

The ATCO-CIF format is a standard across UK transit companies, so the work we’ve done on the Manchester data could easily be adapted to work with any other open bus schedule data that might be available.
