Provenance in the Wild: Provenance at The Gazette

Today, following my post on Provenance in the 2014 National Climate Assessment, I continue to blog about applications making use of PROV. The Gazette has been the UK’s official public record since 1665. It has a long and established history and has been at the heart of British public life for almost 350 years (see an overview of its history). Today’s Gazette continues to publish business-critical information and, thanks to its digital transformation, this information is now more accessible than ever before.

A quick reminder of what I mean by provenance:

Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing.

The W3C PROV recommendations offer a conceptual model for provenance and its mapping to various technologies such as RDF, XML, or a simple textual form, as well as the means to expose, find and share provenance over the Web.
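To make the "simple textual form" concrete, here is a minimal sketch that writes a handful of provenance statements in PROV-N, the textual notation of PROV. The identifiers (`ex:draft`, `ex:notice`, `ex:publish`, `ex:gazette`) and the `http://example.org/` namespace are made up for illustration and are not taken from The Gazette's data; the helper `to_prov_n` is likewise hypothetical.

```python
def to_prov_n(entities, activities, agents, generated, used, associated):
    """Render a tiny provenance description as a PROV-N document.

    generated:  (entity, activity) pairs  -> wasGeneratedBy
    used:       (activity, entity) pairs  -> used
    associated: (activity, agent) pairs   -> wasAssociatedWith
    The trailing "-" marks an optional argument (time or plan) left unspecified.
    """
    lines = ["document", "  prefix ex <http://example.org/>"]
    lines += [f"  entity({e})" for e in entities]
    lines += [f"  activity({a})" for a in activities]
    lines += [f"  agent({ag})" for ag in agents]
    lines += [f"  wasGeneratedBy({e}, {a}, -)" for e, a in generated]
    lines += [f"  used({a}, {e}, -)" for a, e in used]
    lines += [f"  wasAssociatedWith({a}, {ag}, -)" for a, ag in associated]
    lines.append("endDocument")
    return "\n".join(lines)


# Hypothetical example: a draft is turned into a published notice.
PROV_N = to_prov_n(
    entities=["ex:draft", "ex:notice"],
    activities=["ex:publish"],
    agents=["ex:gazette"],
    generated=[("ex:notice", "ex:publish")],
    used=[("ex:publish", "ex:draft")],
    associated=[("ex:publish", "ex:gazette")],
)
print(PROV_N)
```

The three relation kinds shown here correspond to the entities, activities, and people or institutions named in the definition above.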

In a true open government approach, the tender for the Gazette Web site is available online, and it requested the use of PROV (which was then a Candidate Recommendation). The purpose of provenance on The Gazette is to describe the capture, transformation, enrichment and publishing process applied to all Notices published by The Gazette. Let us examine how this was actually deployed.

For instance, the notice available at https://www.thegazette.co.uk/notice/2152652 records a change in a partnership. In the right-hand column, we see a link to the Provenance Trail.

notice2152652-annotated

Following this link, we obtain the provenance information for this notice:

notice2152652-provenance

On this page, we find a graphical representation of the publication pipeline for this notice, and various links to machine-processable representations of provenance (in RDF/XML and JSON). When uploading this provenance into the Southampton PROV Translator service, we obtain the following graphical representation, which shows a much more complex and detailed pipeline.

 

notice2152652-provenance-tool

Provenance information is not just exposed in a browsable format, but also in a machine-processable format. Going back to the original https://www.thegazette.co.uk/notice/2152652 page, and looking at the HTML source, we can find the following link element, stating the existence of a relation between the current document and the provenance page. The relation http://www.w3.org/ns/prov#has_provenance is defined in the W3C Provenance Working Group's Provenance Access and Query specification.

<link rel="http://www.w3.org/ns/prov#has_provenance"
      href="https://www.thegazette.co.uk/id/notice/2152652/provenance"
      title="PROVENANCE" />
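As an illustration of how a client might discover this link, here is a minimal sketch using only Python's standard library. It is not how ProvExtract or any Southampton tool is implemented; the class and function names are hypothetical, and the sample page is a cut-down stand-in for the real notice page.

```python
from html.parser import HTMLParser

HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance"


class ProvenanceLinkParser(HTMLParser):
    """Collects href values of <link> elements whose rel is prov:has_provenance."""

    def __init__(self):
        super().__init__()
        self.provenance_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated relation types
        rels = (attributes.get("rel") or "").split()
        if HAS_PROVENANCE in rels and attributes.get("href"):
            self.provenance_urls.append(attributes["href"])


def find_provenance_links(html_text):
    """Return every provenance URL advertised in the given HTML text."""
    parser = ProvenanceLinkParser()
    parser.feed(html_text)
    return parser.provenance_urls


# Cut-down stand-in for the notice page, containing the link element above.
SAMPLE = """<html><head>
<link rel="http://www.w3.org/ns/prov#has_provenance"
      href="https://www.thegazette.co.uk/id/notice/2152652/provenance"
      title="PROVENANCE" />
</head><body></body></html>"""

print(find_provenance_links(SAMPLE))
```

A client would then dereference each returned URL to retrieve the provenance itself.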

Tools like Danius Michaelides’ ProvExtract can pick up this link and feed it into the Southampton Provenance Tool suite. ProvExtract also extracts some metadata, expressed as RDFa embedded in the document.

 

provextract

Unfortunately, a slight interoperability issue showed up here. The resource https://www.thegazette.co.uk/id/notice/2152652/provenance has only an HTML representation. In our ProvBook, Paul and I explain how content negotiation can be used to serve multiple representations, including standardized ones such as Turtle, PROV-XML, and PROV-N. It is an improvement that The Gazette may consider in future releases. Overall, I think it is remarkable how The Gazette exposes provenance in both visual and machine-processable formats. Congratulations to The Gazette’s team for this achievement.
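To illustrate what content negotiation involves on the server side, here is a simplified sketch that picks a representation from an HTTP Accept header. It is not The Gazette's implementation, nor the one described in the ProvBook. The set of representations offered is an assumption, the media types are those I understand the PROV documents to use for Turtle, PROV-XML and PROV-N, and the sketch deliberately ignores partial wildcards such as `text/*`.

```python
# Representations a provenance resource could offer. Which ones a given
# server actually supports is an assumption made for this sketch.
AVAILABLE = {
    "text/html": "html",
    "text/turtle": "turtle",
    "application/provenance+xml": "prov-xml",
    "text/provenance-notation": "prov-n",
}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(accept_header, default="text/html"):
    """Pick the media type to serve, or None if nothing is acceptable (HTTP 406)."""
    if not accept_header.strip():
        return default
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in AVAILABLE:
            return media_type
        if media_type == "*/*":
            return default
    return None


print(negotiate("text/turtle"))
print(negotiate("application/json, text/provenance-notation;q=0.9, */*;q=0.1"))
```

With such a scheme, a browser would keep receiving the HTML page, while a tool asking for Turtle or PROV-N at the same URL would receive that serialization instead.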

I will finish this post with a few concluding remarks.

  1. While the National Climate Assessment 2014 report exposes provenance to end-users as text, The Gazette opted for a high-level visualization of the pipeline. It is interesting to observe how simplified The Gazette’s graphical representation is, compared to the graphical rendering of the raw data displayed in this post. It shows that abstraction is an important processing step to apply to provenance to make it understandable. It was a recurrent topic of discussion at Provenance Week 2014.
  2. The Gazette also provides a signed version of its provenance (and other metadata). It is a powerful way of asserting the authorship of such provenance: in other words, it is a cryptographic form of provenance of provenance, which is non-repudiable (The Gazette cannot deny publishing such information) and unforgeable (nobody else can claim to have published this information). Here, practitioners are ahead of standardisation and theory: there is no standard way of signing provenance and there is no formal definition of a suitable normal form of provenance ready for signature.
  3. As we supply such PROV data to the tools we have developed in Southampton, we can obtain interesting visualizations. It really shows the benefits of standardisation, since PROV data produced by The Gazette team can be consumed by independently developed applications.
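The remark on normal forms in the second point can be illustrated with a small sketch. A signature is computed over bytes, so two serializations of the same provenance that differ only in key order or whitespace produce different digests unless both are first brought to an agreed form. The documents below are made-up, PROV-JSON-like fragments, and sorting JSON keys is only a syntactic normalisation chosen for illustration. It is not a normal form of provenance, and it is not the scheme The Gazette uses to sign its data.

```python
import hashlib
import json


def digest_raw(text):
    """SHA-256 of the serialization exactly as received."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def digest_normalised(text):
    """SHA-256 after re-serializing with sorted keys and no extra whitespace."""
    document = json.loads(text)
    canonical = json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two hypothetical serializations of the same content, differing only in
# key order and whitespace.
A = '{"entity": {"ex:notice": {}}, "activity": {"ex:publish": {}}}'
B = '{"activity": {"ex:publish": {}},   "entity": {"ex:notice": {}}}'

print(digest_raw(A) == digest_raw(B))                # False
print(digest_normalised(A) == digest_normalised(B))  # True
```

A real signature would additionally require an asymmetric key pair to obtain the non-repudiation property discussed above. The digest step shown here is only the part where the choice of normal form matters.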