Delivering Linked Data quickly
I’ve been involved in a couple of discussions in the last few days about speed of access to Linked Data and whether that will limit what can be done with it.
This was kicked off by some questions on Twitter from Greg Boutin. Greg was particularly interested in queries over distributed data, especially data that changes so frequently that caching it locally may be difficult. He’s worried that such queries will be too slow to be practical.
Michael Hausenblas posted his investigations into the use of HTTP caching by Linked Data publishers and found that only a small proportion use HTTP headers like ‘Last-Modified’ and ‘ETag’. (For an explanation of how that stuff works, read “Things caches do”.)
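To make the publisher side concrete, here is a minimal sketch in Python of what emitting those validators could look like. It is not taken from any of the publishers surveyed: the function names `make_validators` and `respond` are hypothetical, the ETag is simply a truncated hash of the response body, and the `If-None-Match` check is a plain string comparison that ignores the comma-separated lists and weak validators a real server would have to handle.

```python
import hashlib
from email.utils import formatdate


def make_validators(body: bytes, mtime: float) -> dict:
    """Build ETag and Last-Modified headers for one representation.

    Assumption: hashing the body is an acceptable way to derive the ETag.
    """
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {
        "ETag": etag,
        # HTTP dates are always expressed in GMT.
        "Last-Modified": formatdate(mtime, usegmt=True),
    }


def respond(request_headers: dict, body: bytes, mtime: float):
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    headers = make_validators(body, mtime)
    # Simplified: exact match only, no list or weak-validator handling.
    if request_headers.get("If-None-Match") == headers["ETag"]:
        return 304, headers, b""
    return 200, headers, body
```

A client that sends back the ETag it received earlier gets an empty 304 response instead of the full document, which is where the saving comes from for data that has not changed.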
Another angle on caching is to keep your own copy of external datasets, but in that case you also need to update your local copy if the external data changes. There’s been an interesting thread on this on the LOD mailing list, started by George Kobilarov.
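One way to keep such a local copy fresh, assuming the publisher does send validators, is a conditional GET: ask for the data again, but only accept a new body if it has changed. The sketch below uses Python's standard `urllib`; the `cached` dictionary layout and the function names `conditional_headers` and `refresh` are my own illustrative choices, not part of any Linked Data toolkit.

```python
import urllib.error
import urllib.request


def conditional_headers(cached: dict) -> dict:
    """Turn the validators stored with a local copy into request headers."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def refresh(url: str, cached: dict) -> dict:
    """Re-fetch url, returning the old copy unchanged on 304 Not Modified."""
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request) as response:
            return {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
                "body": response.read(),
            }
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; it means our copy is still good.
        if err.code == 304:
            return cached
        raise
```

Calling `refresh(url, {})` for the first fetch and then passing the returned dictionary back in on later calls means unchanged datasets cost only a small round trip. It does nothing for publishers that send neither header, which is the gap the survey above points to.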
My feeling is that Linked Data’s approach of piggy-backing on the proven-to-be-scalable architecture of the web is definitely the right way to go, though there is clearly work to do to make the best use of the tools available to us. Some kinds of applications are more suited to Linked Data than others. But the web already offers a sound caching mechanism, and as a first step we should be using it to better advantage.