ProvToolbox Tutorial 4: Templates for Provenance (part 1)

1. Introduction

In several of our applications, we felt the need of separating the logging of information from the constructing and storing of provenance. For this, we introduced PROV-Template a templating system for provenance, describing the shape of provenance graphs to be generated, and we specified an algorithm capable of instantiating templates, with specific values.

The purpose of this tutorial is to introduce PROV-Template and how templates can be instantiated using ProvToolbox. This functionality is directly available from the command line using provconvert.

The tutorial is standalone and a zip archive can be downloaded from the following URL: http://search.maven.org/remotecontent?filepath=org/openprovenance/prov/ProvToolbox-Tutorial4/0.7.0/ProvToolbox-Tutorial4-0.7.0-src.zip. The tutorial can also be found on the ProvToolbox project on GitHub.

The tutorial assumes that provconvert has been installed and is available in the execution path. (See http://lucmoreau.github.io/ProvToolbox/ for installation instructions.) The tutorial relies on a Makefile and can simply be run by calling:

make do.all

2. Example of Templates

2.1 A Template for Attribution of a Quote

Building on blog post “A little provenance goes a long way”, imagine that we need to systematically provide attribution to quotes. As this is a repetitive tasks, we should consider the PROV-Templates approach to generate provenance.

A provenance template is itself a PROV document in which some variables act as placeholders for values to be filled at expansion time. More precisely, a template is a bundle of PROV assertions: a bundle is the PROV mechanism by which provenance of provenance can be expressed.

The figure below contains a graphical illustration of a template for Quote Attribution. It contains the following variables:

  • var:author the identifier of the author (stated to be a prov:Person)
  • var:name the author’s name
  • var:quote the identifier of the quote
  • var:value the quote itself
  • vargen:bundleId the identifier of the bundle to be generated

The quote is attributed to the author agent. The variables var:author, var:namer, var:quote, var:value are qualified names in a namespace reserved for PROV-Template variables, and are conventionally prefixed with the prefix var. There is an expectation that values need to be provided for these variables when instantiating a template. On the other hand, the variable vargen:bundleId, with prefix vargen, can have a value generated automatically at instantiation time.

Quote Attribution Template

Quote Attribution Template

Concretely, in the PROV-N notation, the template is expressed as follows.

document

  prefix var <http://openprovenance.org/var#>
  prefix vargen <http://openprovenance.org/vargen#>
  prefix tmpl <http://openprovenance.org/tmpl#>
  prefix foaf <http://xmlns.com/foaf/0.1/>
  
  bundle vargen:bundleId
    entity(var:quote, [prov:value='var:value'])
    entity(var:author, [prov:type='prov:Person', foaf:name='var:name'])
    wasAttributedTo(var:quote,var:author)
  endBundle

endDocument

2.2 Template Instantiation: A Little Provenance Goes a Long Way

Let’s now look into how we can instantiate the templates. Let us consider the following bindings for the 4 variables author, name, quote and value. An association between a variable and a value is referred to as a binding.

var:author http://orcid.org/0000-0002-3494-120X
var:name “Luc Moreau”
var:quote ex:quote1
var:value “A Little Provenance Goes a Long Way”

If we instantiate the template with these bindings, we obtain the following instantiated document. We note that vargen:bundleId was instantiated with UUID value.

Template Instantiation for "A Little Provenance Goes a Long Way"

Template Instantiation for “A Little Provenance Goes a Long Way”

Expansion of a template with provconvert is straightforward. The parameter -infile must be used to provide the template. The binding file is specified with the -binding parameter. The resulting instantiated template is specified with -outfile.

	
provconvert -infile template1.provn -bindings binding1.ttl -outfile doc1.provn

The input template and its instantiation can be expressed in any of the formats supported by ProvToolbox. We still have to express the set of bindings. We did not want to introduce a new specific format (though we may do it in the future), so, we just decided to use PROV. In particular, the Turtle notation is fairly elegant in this case. Two family of properties are introduced in the tmpl namespace, namely value_i and 2dvalue_i_j, for binding variables in identifier and value positions, respectively.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tmpl: <http://openprovenance.org/tmpl#> .
@prefix var: <http://openprovenance.org/var#> .
@prefix ex: <http://example.com/#> .

var:author a prov:Entity;
           tmpl:value_0 <http://orcid.org/0000-0002-3494-120X>.
var:name   a prov:Entity;
           tmpl:2dvalue_0_0 "Luc Moreau".
var:quote  a prov:Entity;
           tmpl:value_0 ex:quote1.
var:value  a prov:Entity;
           tmpl:2dvalue_0_0 "A Little Provenance Goes a Long Way".

Details about the syntax of bindings can be found in https://provenance.ecs.soton.ac.uk/prov-template/.

2.3 Template Instantiation: A Second Author

In some cases, we would like to express that there is a second author to a document. The attribution template does not need to be redefined. We simply need to provide relevant bindings for the second author.

For instance, Paul and Luc are the two authors of that quote. Conceptually, we want to provide the following bindings.

var:author http://orcid.org/0000-0002-3494-120X
http://orcid.org/0000-0003-0183-6910
var:name “Luc Moreau”
“Paul Groth”

We see that each of var:author and var:name is given two values. This results in the following expanded provenance graph.

Instantiation with Two Authors

Template Instantiation with Two Authors

The contents of the bindings file is explicit below. Lines 7-9, var:author is given two values, using the properties tmpl:value_0 and tmpl:value_1. Lines 10-12, var:name is given two values to occur in attribute position, with properties tmpl:2dvalue_0_0 and tmpl:2dvalue_1_0.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tmpl: <http://openprovenance.org/tmpl#> .
@prefix var: <http://openprovenance.org/var#> .
@prefix ex: <http://example.com/#> .

var:author a prov:Entity;
           tmpl:value_0 <http://orcid.org/0000-0002-3494-120X>;
           tmpl:value_1 <http://orcid.org/0000-0003-0183-6910>.
var:name   a prov:Entity;
           tmpl:2dvalue_0_0 "Luc Moreau";
           tmpl:2dvalue_1_0 "Paul Groth".
var:quote  a prov:Entity; 
           tmpl:value_0 ex:quote1.
var:value  a prov:Entity; 
           tmpl:2dvalue_0_0 "A Little Provenance Goes a Long Way".

Again, we refer the reader to the PROV-Template specification for details of the bindings syntax.

2.4 Template Instantiation: More Attributes

In general, PROV also allows for variable number of attribute values to be provided for a given attribute. For instance, we may want the name and nick name to be provided as two possible values for the var:name variable. This would result in the following expanded graph.

Template Instantiation: Variable Number of Attributes

Template Instantiation with Variable Number of Attributes

Again, the template remains unchanged, but the bindings are as follows. In lines 12-13, we see two possible names for Paul, respectively expressed with tmpl:2dvalue_1_0 and tmpl:2dvalue_1_1. This shows that template expansion can support a variable number of attributes for different statements instantiated from the same template statement.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tmpl: <http://openprovenance.org/tmpl#> .
@prefix var: <http://openprovenance.org/var#> .
@prefix ex: <http://example.com/#> .

var:author a prov:Entity;
           tmpl:value_0 <http://orcid.org/0000-0002-3494-120X>;
           tmpl:value_1 <http://orcid.org/0000-0003-0183-6910>.
var:name   a prov:Entity;
           tmpl:2dvalue_0_0 "Luc Moreau";
           tmpl:2dvalue_1_0 "Paul Groth";
           tmpl:2dvalue_1_1 "pgroth".
var:quote  a prov:Entity;
           tmpl:value_0 ex:quote1.
var:value  a prov:Entity;
           tmpl:2dvalue_0_0 "A Little Provenance Goes a Long Way".

3. Conclusions

PROV-Template is easy to work with, it just requires provconvert to be installed. By decoupling the generation of provenance from the logging of values, we observed a number of benefits:

  • It allowed us to fine tune the provenance, independently of the application.
  • It permitted us to keep the code to generate the provenance separate from the application itself.
  • It allowed us to adopt a more conceptual approach to provenance, thinking of “provenance schemas” rather than instances.

This is the first part of the tutorial on PROV-Template. In the second part of the tutorial, we will see how PROV-Template can support more sophisticated use cases.

Thanks to co-authors Dong and Danius. Heather has been using it in Smart Society’s SmartShare application.

Advertisements

2 thoughts on “ProvToolbox Tutorial 4: Templates for Provenance (part 1)

  1. Pingback: What is in ProvToolbox 0.7.2? | Luc's Blog

  2. Pingback: What is in ProvToolbox 0.7.3? | Luc's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s