Principles for Algorithmic Transparency and Accountability: A Provenance Perspective

A few days ago, the ACM U.S. Public Policy Council (USACM) released a statement and a list of seven principles aimed at addressing potential harmful bias of algorithmic solutions. This effort was initiated by the USACM’s Algorithmic Accountability Working Group.  Algorithmic solutions are now widely deployed to make decisions that affect our lives, e.g., recommendations for movies, targeted ads on the web, autonomous vehicles, suggested contacts or reading in social networks, etc.  We have all come across systems making decisions that are targeted to us individually, and I am sure that many of us have wondered  how a given recommendation was made to us, on the basis of which information and what kind of profile. Typically, no explanation is made available to us!  Nor there is any means to track the origin of such decisions!

Interestingly, emerging regulatory frameworks, such as the EU General Data Protection Regulation, are introducing the “right to explanations” (see https://arxiv.org/abs/1606.08813) in particular related to Article 22 on Automated individual decision-making, including profiling. So, the regulatory framework is evolving, even though there is still no consensus on how to actually achieve this in practice.

Furthermore, algorithmic bias is a phenomenon that has been observed in various contexts (see for instance two recent articles of the  New-York Times and the Guardian). Given their pervasive nature, ACM U.S. Public Policy Council acknowledges that it is imperative to address “challenges associated with the design and technical aspects of algorithms and preventing bias from the onset”.   On this basis, they propose 7 principles, compatible with their code of ethics.

As a provenance researcher, I have always regarded the need to log flows of information and activities, and ascribe responsibility for these as crucial steps to making systems accountable. This view was echoed by Danny Weitzner and team in their seminal paper on Information Accountability.  I was therefore delighted to see that “Data provenance” was listed as an explicit principle of the USACM list of seven principles. So, instead of paraphrasing them, I take the liberty of copying them below.

 

Figure 1: ACM US Public Policy Council list of seven principles for Algorithmic Transparency and Accountability

 

 

However, I feel that provenance, as understand it, encompasses several of these principles, something that I propose to investigate in the rest of this post.  To illustrate this, I propose Figure 2, a block diagram outlining the high-level architecture of a transparent and accountable system.  At the heart of such a system, we find its Business Logic which provides its primary functionality (e.g. Recommendations, Analytics, etc).  In provenance-aware systems, applications log their activities and data flows, out of which a semantic representation is constructed, which I refer to as provenance. PROV is a standardised representation for provenance, which was recently published by the World Wide Web consortium and seeing strong adoption in various walks of life.  In this context, provenance is defined as “a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing”.

There is no point constructing such a semantic representation, if it is not being exploited. Various capabilities can be built on top of such a provenance repository, including query interfaces, audit functionality, explanation service, redress mechanism and validation, which we discuss now in light of the seven principles.

 

The Role of Provenance in the Architecture of an Accountable System

Figure 2: The Role of Provenance in the Architecture of an Accountable System

 

The first principle (Awareness) identifies a variety of stakeholders: Owners, Designers and Builders, Users, but the second principle also mentions the role of Regulators, and we believe that potential third-party Auditors are also relevant in that context.  While technology makes progress with algorithmic solutions, society is much slower to react, and there is indeed work required to increase awareness, and establish what the user rights are, and what the obligations on owners should be, whether by means of regulations or self-regulations. The SmartSociety project recently published a Social Charter for Smart Platforms, which is an illustration of what rights and obligations can be in “smart” platforms. 

The second principle (Access and Redress) recommends mechanisms by which systems can be questioned and redress enabled for individuals.This principle points to the ability to query the system and its past actions, which is a typical provenance-based functionality. For those seeking redress, there is a need to be able to refer to an event that resulted in an unsatisfactory outcome; PROV-based provenance mandates that all outcome, data and activity instances are uniquely identified.  Furthermore, we are of the view that such a redress mechanism, including reached resolutions, should be inspectable in a similar fashion; thus, provenance of redress requests and resolutions should also be inspectable.

The third principle (Accountability) is concerned with holding institutions responsible for the decisions made by their algorithmic systems.   For this, one needs a non-repudiable account of what has happened, and suitable attribution of decisions to system components, their owners, and those legally responsible for the system’s actions. Again, such an account is exactly what PROV offers: therefore we see the third principle being implemented technically with queries over provenance representation, and socially with suitable regulatory and enforcement mechanisms.

The fourth principle (Explanation) requires explanations to be produced about the unfolding of activities and decisions.  There is emerging evidence that provenance can serve as a form of computer-based narrative, out of which textual explanations can be composed and presented to users.  We recently conducted some user studies about the perceived legibility of natural language explanations by casual users.  We also used a similar technique in order to provide explanations about user ratings in a Ride Share application.

The fifth principle (Data Provenance) is explicitly focusing on training data used to train so-called “machine-learning” algorithms. We believe that it is not just training data that is relevant, but any external data, the business logic and designers may rely upon. It is expected that public scrutiny of such data offers opportunity to correct potential bias, and in general, any concern that may affect decisions. To operationalize this principle, one needs to have access to a description of the data (potentially, the data itself), but also how it is used in training algorithms, and how this potentially affects decisions. PROV-based Provenance, queries and explanations are required here to allow such scrutiny. Some of our recent work focused on analytics techniques to assess the quality of data, using provenance information; such a mechanism becomes useful to ensure some form of quality control in systems.

The sixth principle (Auditability) demands models, algorithms, data, and decision to be recorded, so that they can be audited. All these can easily be described in PROV, by means of “PROV entities“, which can be used or generated by “PROV activities“, under the supervision of responsible agents. Specific auditing functions (aimed at various stakeholders) can query the provenance to expose individual entities, but also their aggregate characteristics, over longer periods of time. Techniques that we have developed, such as provenance summarisation, become really critical in this context, since they enable us to investigate aggregate behaviour of applications, instead of individual circumstances.

The seventh principle (Validation and Testing)  recommends regular validation of models and testing for harmful outcomes.  This suggests that processing over provenance, checking whether  some expected criteria has been met or not, can be implemented by policy-based approaches over provenance, detecting whether  past executions comply with expectations, described as policies. We have applied this technique to decide whether processing was performed in compliance with usage policies. If this is good practice to undertake validation and testing, therefore, it also becomes a necessity to document such a practice, to be able demonstrate that such validation and testing takes place.

So overall, the provenance research community has been investigating issues around capturing, storing, representing, querying and exploiting provenance information, all of them having a critical role in the principles of Algorithmic Transparency and Accountability.  There is still much to research however, including critical issues around (1) agreed domain-specific extensions of PROV to support transparency and accountability; (2) better integration of the software engineering methodologies with provenance; (3) enforceable compliance with architecture; (4) non repudiation of provenance; (5) querying and auditing facility; (6) compliance checks over provenance; (7) user-friendly explanation of complex algorithmic decisions; (8) scalability of all the above issues.

In the spirit of Principle 1,  I hope this blog post contributes to raising awareness of these issues. Feedback and comments welcome!

What is in ProvToolbox 0.7.3?

Today, I released ProvToolbox 0.7.3. The  principal changes in this new version of ProvToolbox are concerned with prov-template, the templating system for provenance. The new release also contains few minor bug fixes and changes.

1. Template System

A reminder: a PROV-template is a PROV document, in which some variables are placeholders for values. A PROV-template is a declarative specification of the provenance intended to be generated by an application.   A set of bindings contains associations between variables and values. The PROV-template  expansion algorithm, when provided with a template and a set of bindings, generates a provenance document, in which all variables have been replaced by values.

PROV-template is a new approach to creating a provenance-enabled application. Templates are designed and embedded in the application’s code, the application logs values (in the form of bindings), and provenance is automatically generated by template expansion.

A tutorial for templates is available on this blog:

In ProvToolbox 0.7.3, we have adopted a more compact and user-friendly representation for sets of bindings. Instead of representing them as PROV, we can now represent them as JSON.  At the same time, we also handle variables in a more uniform manner, allowing variables occurring in mandatory position, to be also used in attribution position. I won’t go into the technical details, but these two changes make the design of templates and the construction of bindings  much simpler!

A further change is that we have implemented a simple “bindings bean” compiler: it takes a template definition and creates a java class, which allows sets of bindings to be created directly from Java, and serialized easily.  The aim of this compiler is to simplify the implementation of applications generating provenance.

The GitHub source code repository contains code for two further tutorials (Tutorial5 and Tutorial6). I will write up the text for these tutorials in the New Year.

2. Qualified Pattern for All PROV Relations

At the recent PROV: Three Years Later Workshop, I made the case for the Qualified Pattern  to be used for all PROV relations. My key motivation for this extension to PROV is my provenance summarisation algorithm, which generates a “summary provenance graph“, in which nodes and edges are annotated with weights indicating how frequently these kinds of nodes and edges  can be found in the original graph. To allow for such annotations to be added to specialization, alternate, and membership relations, they need to support the Qualified Pattern.

At this stage, it is the data model that is modified. Serialization to xml and provn is work in progress, and not supported in prov-json and prov-sql yet. Furthermore, there is no parsing yet. Three new interfaces have been defined in the package org.openprovenance.prov.model.extension.

3. Release Log

For full details of the changes, see the release log at https://github.com/lucmoreau/ProvToolbox/wiki/Releases#073.

4. Conclusion

We keep on using ProvToolbox in various applications to generate provenance with templates and to undertake some analytics using the summarisation algorithm. This new release was critical to support these two use cases of ProvToolbox. Shortly, I will release two further blogs with new tutorials for prov-template.

As always, all relevant links can be found at http://lucmoreau.github.io/ProvToolbox/, including binary installers for linux (rpm and debian) and macosx.

Seasonal greetings!

 

 

PhD studentships available

Are you interested in undertaking PhD studies in the area of provenance? Are you intrigued by some of the following topics?

  • Provenance architecture for the Internet of Things
  • Big data analytics of provenance
  • Streamed provenance
  • Tradeoff between provenance and privacy
  • Unforgeable provenance and blockchain

Contact me and we can discuss a PhD topic. We have fully-funded studentships for DTA-eligible students (see  details http://www.ecs.soton.ac.uk/phd/studentships).

Building 32 hosts the Web and Internet Science Group

Building 32 hosts the Web and Internet Science Group

What is in ProvToolbox 0.7.2?

1. Introduction

Yesterday, I released ProvToolbox 0.7.2, which includes the following novel features.

2. Novel Features

2.1. MacOS X Installer

Continuing our efforts of providing binary installers to facilitate installation of ProvToolbox, this release includes an installer for MacOS X.

Simply follow the link http://openprovenance.org/java/installer/provconvert-0.7.2.dmg, you will then be given access to the installation image.

Installation Disk

Installation Disk

Click on the Installer. Note that you need to allow installation of programs from any sources in your security preferences. Then simply follow the instructions. The installer will install all libraries and executable in /Applications/provconvert (default location, which can be overriden), as well as a symbolic link making the provconvert executable available in your execution path. An Uninstaller is also available as an executable jar file /Applications/provconvert/Uninstaller/uninstaller.jar.

provconvert Installer

provconvert Installer

Et voila! The executable can be invoked directly from the command line.

provconvert -version

which should return provconvert version 0.7.2 (2015-09-15 20:16).

2.2. Templates

As we continue to use templates in our applications, two further requirements have been implemented. It is now possible to expand a template, and strip the result from any variable that has not been instantiated. For this, simply pass the option -allexpand to provconvert, to be used in conjunction with the -bindings option (see Tutorial 4 (part 1) and Tutorial 4 (part 2) on template processing in ProvToolbox). Furthermore, an error code is returned when not all variables have been expanded.

2.3. Interoperability

As we are integrating Provtoolbox, ProvStore and ProvStore in the inter-operability harness developed by the Software Sustainability Institute, we have fixed some minor issues to ensure interoperability between our software stacks.

2.4. provconvert artifact

The artifact toolbox has been renamed into provconvert, since we have plans for other artifacts out of ProvToolbox.

3. Conclusion

For all details about ProvToolbox, see the github.io page http://lucmoreau.github.io/ProvToolbox/.

What is in ProvToolbox 0.7.1?

1. Introduction

Yesterday, I released ProvToolbox 0.7.1. It is a minor release, fixing minor bugs of 0.7.0, and including a useful new feature.

2. Novel Features

2.1. Debian Package

To facilitate installation, a new binary release format is now supported: Debian packaging to support binary release on Ubuntu and other Debian-based Linux distributions. You just need to run the following commands.

wget https://repo1.maven.org/maven2/org/openprovenance/prov/toolbox/0.7.1/toolbox-0.7.1.deb
dpkg --install toolbox-0.7.1.deb

This is in addition to RPM support introduced in 0.6.2:

rpm -U https://repo1.maven.org/maven2/org/openprovenance/prov/toolbox/0.7.1/toolbox-0.7.1-rpm.rpm

2.3 Visualization

Modification of the visualisation component prov-dot allow dge thickness, node size, and tooltips (on SVG) to be controlled. For this, the provenance graph nodes and edges need to be annotated with reserved attributes dot:size and dot:tooltip. The following figure illustrates the kind of graphs that can now be generated.

A summarisation of the provenance challenge workflow. Nodes are to be understood as provenance types. Thickness of edges and size of nodes reflect their frequency in the summarised document.

A summarisation of the provenance challenge workflow. Nodes are to be understood as provenance types. Thickness of edges and size of nodes reflect their frequency in the summarised document.

2.3 Bug fixes

I also fixed some minor bugs in qualified namespaces in the prov-sql package, and updated reserved namespace for provtoolbox.

3. Conclusion

Tell me how you use ProvToolbox and/or provconvert and for for which purpose. Share details of your projects with me, I will add them to https://github.com/lucmoreau/ProvToolbox/wiki/Projects-and-Applications-Using-ProvToolbox.

For all details about ProvToolbox, see the github.io page http://lucmoreau.github.io/ProvToolbox/.

ProvToolbox Tutorial 4: Templates for Provenance (part 2)

1. Introduction

This blog post is the second part of the introduction to provenance templates.

The tutorial is standalone and a zip archive can be downloaded from the following URL: http://search.maven.org/remotecontent?filepath=org/openprovenance/prov/ProvToolbox-Tutorial4/0.7.0/ProvToolbox-Tutorial4-0.7.0-src.zip. The tutorial can also be found on the ProvToolbox project on GitHub.

The tutorial assumes that provconvert has been installed and is available in the execution path. (See http://lucmoreau.github.io/ProvToolbox/ for installation instructions.) The tutorial relies on a Makefile and can simply be run by calling:

make do.all

2. Further Examples of Templates

We continue out introduction to PROV-Template by some examples.

2.1 Another Template: Quotes, Authors and Their Organisations

Our initial template could deal with attribution of quotes to authors. It is useful to be able to talk about author’s organizations. PROV offers the notion of delegation, by which we can relate an author agent to the organization agent, on behalf of which they acted. This is represented with the following template, where the link marked “del” represent the delegation association.

Template for Quote Attribution and Representation of  an Organization

Template for Quote Attribution and Representation of an Organization

The only differences in the template occur in line 11, where a new variable var:institution is introduced for the organization, and in line 12, where a delegation association is expressed with the term actedOnBehalfOf.

document
  prefix var <http://openprovenance.org/var#>
  prefix vargen <http://openprovenance.org/vargen#>
  prefix tmpl <http://openprovenance.org/tmpl#>
  prefix foaf <http://xmlns.com/foaf/0.1/>
  
  bundle vargen:bundleId
    entity(var:quote, [prov:value='var:value'])
    entity(var:author, [prov:type='prov:Person', foaf:name='var:name'])
    wasAttributedTo(var:quote,var:author)
    entity(var:institution)
    actedOnBehalfOf(var:author,var:institution,-)
  endBundle

endDocument

2.2 Template Instantiation: Cartesian Products

We are now ready to instantiate the template. To the previous bindings, we want to add two possible values for the variable var:institution. For the purpose of this example, we also consider two quotes ex:quote1 and ex:quote2 authored by Paul and Luc.

var:quote ex:quote1
ex:quote2
var:author http://orcid.org/0000-0002-3494-120X
http://orcid.org/0000-0003-0183-6910
var:name “Luc Moreau”
“Paul Groth”
var:institution http://www.soton.ac.uk/
http://labs.elsevier.com/

The resulting expansion is displayed in the following figure. We see that for the expression actedOnBehalfOf(var:author,var:institution,-), the template expansion algorithm considers all possible values of var:author and all possible values of var:institution, and created an instantiated association for each possible pair author/institution. In other words, by default, the template expansion considers the cartesian product of sets of values for the different variables in a relation.

Template Instantiation with Organisations (Cartesian Product)

Template Instantiation with Organisations (Cartesian Product)

2.3 Template With Variable Synchronization

The expansion is not quite right. While we are fine with each quote being attributed to both Paul and Luc, the former is affiliated with Elsevier, whereas the later is affiliated with Southampton. Thus, we wish to constrain the instantiation of actedOnBehalfOf(var:author,var:institution,-), so that don’t form the cartesian product of all possibilities. Instead, we want the values of var:author and var:institution to be enumerated in lockstep; in other words, a change of value for one should be synchronized with a change of value for the other.

This requires a simple modification to the template. We see here that an attribute is added to the var:institution entity.

Template With Variable Synchronization

Template With Variable Synchronization

It is made explicit in line 12, where the attribute [tmpl:linked='var:author'] explicitly synchronizes the value of var:institution with that of var:author. Note that this synchronization is for the whole template, not just for the relation that occurs in line 13.

document

  prefix var <http://openprovenance.org/var#>
  prefix vargen <http://openprovenance.org/vargen#>
  prefix tmpl <http://openprovenance.org/tmpl#>
  prefix foaf <http://xmlns.com/foaf/0.1/>
  
  bundle vargen:bundleId
    entity(var:quote, [prov:value='var:value'])
    entity(var:author, [prov:type='prov:Person', foaf:name='var:name'])
    wasAttributedTo(var:quote,var:author)
    entity(var:institution, [tmpl:linked='var:author'])
    actedOnBehalfOf(var:author,var:institution,-)
  endBundle

endDocument

2.3 Template Instantiation: Single Institution per Author

We are now again ready to instantiate the template. The resulting expanded document occurs in the figure below. We see that affiliation is now properly expressed.

Template Instantiation: Single institution per Author

Template Instantiation: Single institution per Author

3. Conclusions

As we said in Part 1, PROV-Template is easy to work with, it just requires provconvert to be installed. This notion of template supports the idea of a provenance template management system, in which all the provenance is generated by means of templates, which possibly can also evolve over time. By decoupling the generation of provenance from the logging of values, we observed a number of benefits:

  • It allows us to fine tune the provenance, independently of the application.
  • It permits us to keep the code to generate the provenance separate from the application itself: this is very convenient when the application programmer is not familiar with provenance, or there is no toolkit available to generate provenance from the application.
  • It allows us to adopt a more conceptual approach to provenance, thinking of “provenance schemas” rather than instances.

For feedback or comments on this tutorial, please raise an issue on the ProvToolbox issue tracket at https://github.com/lucmoreau/ProvToolbox/issues/.

Thanks to co-authors Dong and Danius. Heather has been using it in Smart Society’s SmartShare application.

ProvToolbox Tutorial 4: Templates for Provenance (part 1)

1. Introduction

In several of our applications, we felt the need of separating the logging of information from the constructing and storing of provenance. For this, we introduced PROV-Template a templating system for provenance, describing the shape of provenance graphs to be generated, and we specified an algorithm capable of instantiating templates, with specific values.

The purpose of this tutorial is to introduce PROV-Template and how templates can be instantiated using ProvToolbox. This functionality is directly available from the command line using provconvert.

The tutorial is standalone and a zip archive can be downloaded from the following URL: http://search.maven.org/remotecontent?filepath=org/openprovenance/prov/ProvToolbox-Tutorial4/0.7.0/ProvToolbox-Tutorial4-0.7.0-src.zip. The tutorial can also be found on the ProvToolbox project on GitHub.

The tutorial assumes that provconvert has been installed and is available in the execution path. (See http://lucmoreau.github.io/ProvToolbox/ for installation instructions.) The tutorial relies on a Makefile and can simply be run by calling:

make do.all

2. Example of Templates

2.1 A Template for Attribution of a Quote

Building on blog post “A little provenance goes a long way”, imagine that we need to systematically provide attribution to quotes. As this is a repetitive tasks, we should consider the PROV-Templates approach to generate provenance.

A provenance template is itself a PROV document in which some variables act as placeholders for values to be filled at expansion time. More precisely, a template is a bundle of PROV assertions: a bundle is the PROV mechanism by which provenance of provenance can be expressed.

The figure below contains a graphical illustration of a template for Quote Attribution. It contains the following variables:

  • var:author the identifier of the author (stated to be a prov:Person)
  • var:name the author’s name
  • var:quote the identifier of the quote
  • var:value the quote itself
  • vargen:bundleId the identifier of the bundle to be generated

The quote is attributed to the author agent. The variables var:author, var:namer, var:quote, var:value are qualified names in a namespace reserved for PROV-Template variables, and are conventionally prefixed with the prefix var. There is an expectation that values need to be provided for these variables when instantiating a template. On the other hand, the variable vargen:bundleId, with prefix vargen, can have a value generated automatically at instantiation time.

Quote Attribution Template

Quote Attribution Template

Concretely, in the PROV-N notation, the template is expressed as follows.

document

  prefix var <http://openprovenance.org/var#>
  prefix vargen <http://openprovenance.org/vargen#>
  prefix tmpl <http://openprovenance.org/tmpl#>
  prefix foaf <http://xmlns.com/foaf/0.1/>
  
  bundle vargen:bundleId
    entity(var:quote, [prov:value='var:value'])
    entity(var:author, [prov:type='prov:Person', foaf:name='var:name'])
    wasAttributedTo(var:quote,var:author)
  endBundle

endDocument

2.2 Template Instantiation: A Little Provenance Goes a Long Way

Let’s now look into how we can instantiate the templates. Let us consider the following bindings for the 4 variables author, name, quote and value. An association between a variable and a value is referred to as a binding.

var:author http://orcid.org/0000-0002-3494-120X
var:name “Luc Moreau”
var:quote ex:quote1
var:value “A Little Provenance Goes a Long Way”

If we instantiate the template with these bindings, we obtain the following instantiated document. We note that vargen:bundleId was instantiated with UUID value.

Template Instantiation for "A Little Provenance Goes a Long Way"

Template Instantiation for “A Little Provenance Goes a Long Way”

Expansion of a template with provconvert is straightforward. The parameter -infile must be used to provide the template. The binding file is specified with the -binding parameter. The resulting instantiated template is specified with -outfile.

	
provconvert -infile template1.provn -bindings binding1.ttl -outfile doc1.provn

The input template and its instantiation can be expressed in any of the formats supported by ProvToolbox. We still have to express the set of bindings. We did not want to introduce a new specific format (though we may do it in the future), so, we just decided to use PROV. In particular, the Turtle notation is fairly elegant in this case. Two family of properties are introduced in the tmpl namespace, namely value_i and 2dvalue_i_j, for binding variables in identifier and value positions, respectively.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tmpl: <http://openprovenance.org/tmpl#> .
@prefix var: <http://openprovenance.org/var#> .
@prefix ex: <http://example.com/#> .

var:author a prov:Entity;
           tmpl:value_0 <http://orcid.org/0000-0002-3494-120X>.
var:name   a prov:Entity;
           tmpl:2dvalue_0_0 "Luc Moreau".
var:quote  a prov:Entity;
           tmpl:value_0 ex:quote1.
var:value  a prov:Entity;
           tmpl:2dvalue_0_0 "A Little Provenance Goes a Long Way".

Details about the syntax of bindings can be found in https://provenance.ecs.soton.ac.uk/prov-template/.

2.3 Template Instantiation: A Second Author

In some cases, we would like to express that there is a second author to a document. The attribution template does not need to be redefined. We simply need to provide relevant bindings for the second author.

For instance, Paul and Luc are the two authors of that quote. Conceptually, we want to provide the following bindings.

var:author http://orcid.org/0000-0002-3494-120X
http://orcid.org/0000-0003-0183-6910
var:name “Luc Moreau”
“Paul Groth”

We see that each of var:author and var:name is given two values. This results in the following expanded provenance graph.

Instantiation with Two Authors

Template Instantiation with Two Authors

The contents of the bindings file is explicit below. Lines 7-9, var:author is given two values, using the properties tmpl:value_0 and tmpl:value_1. Lines 10-12, var:name is given two values to occur in attribute position, with properties tmpl:2dvalue_0_0 and tmpl:2dvalue_1_0.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tmpl: <http://openprovenance.org/tmpl#> .
@prefix var: <http://openprovenance.org/var#> .
@prefix ex: <http://example.com/#> .

var:author a prov:Entity;
           tmpl:value_0 <http://orcid.org/0000-0002-3494-120X>;
           tmpl:value_1 <http://orcid.org/0000-0003-0183-6910>.
var:name   a prov:Entity;
           tmpl:2dvalue_0_0 "Luc Moreau";
           tmpl:2dvalue_1_0 "Paul Groth".
var:quote  a prov:Entity; 
           tmpl:value_0 ex:quote1.
var:value  a prov:Entity; 
           tmpl:2dvalue_0_0 "A Little Provenance Goes a Long Way".

Again, we refer the reader to the PROV-Template specification for details of the bindings syntax.

2.4 Template Instantiation: More Attributes

In general, PROV also allows for variable number of attribute values to be provided for a given attribute. For instance, we may want the name and nick name to be provided as two possible values for the var:name variable. This would result in the following expanded graph.

Template Instantiation: Variable Number of Attributes

Template Instantiation with Variable Number of Attributes

Again, the template remains unchanged, but the bindings are as follows. In lines 12-13, we see two possible names for Paul, respectively expressed with tmpl:2dvalue_1_0 and tmpl:2dvalue_1_1. This shows that template expansion can support a variable number of attributes for different statements instantiated from the same template statement.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tmpl: <http://openprovenance.org/tmpl#> .
@prefix var: <http://openprovenance.org/var#> .
@prefix ex: <http://example.com/#> .

var:author a prov:Entity;
           tmpl:value_0 <http://orcid.org/0000-0002-3494-120X>;
           tmpl:value_1 <http://orcid.org/0000-0003-0183-6910>.
var:name   a prov:Entity;
           tmpl:2dvalue_0_0 "Luc Moreau";
           tmpl:2dvalue_1_0 "Paul Groth";
           tmpl:2dvalue_1_1 "pgroth".
var:quote  a prov:Entity;
           tmpl:value_0 ex:quote1.
var:value  a prov:Entity;
           tmpl:2dvalue_0_0 "A Little Provenance Goes a Long Way".

3. Conclusions

PROV-Template is easy to work with, it just requires provconvert to be installed. By decoupling the generation of provenance from the logging of values, we observed a number of benefits:

  • It allowed us to fine tune the provenance, independently of the application.
  • It permitted us to keep the code to generate the provenance separate from the application itself.
  • It allowed us to adopt a more conceptual approach to provenance, thinking of “provenance schemas” rather than instances.

This is the first part of the tutorial on PROV-Template. In the second part of the tutorial, we will see how PROV-Template can support more sophisticated use cases.

Thanks to co-authors Dong and Danius. Heather has been using it in Smart Society’s SmartShare application.