Service for template-based provenance generation

Posted by lucmoreau

Previously I blogged about prov-template, an approach for generating provenance. It consists of a declarative definition of the shape of provenance to be generated, referred to as template, and sets of values to instantiate templates into concrete provenance. Today, I am pleased to write about a new online service allowing such templates to be expanded into provenance.

The service is available from https://openprovenance.org/services/view/expander. I will now illustrate its use through a few examples.

First, let’s consider an example of template (see https://openprovenance.org/templates/org/openprovenance/generic/binaryop/1.provn for its provn version). Visually it look like this.

A template for a binary operation

It shows an activity, two entities used as input and an entity generated as output. There is an agent associated with the activity. The output is derived from the two inputs. This provenance description is contained in a provenance document, but what makes it a template is that the identities of nodes are variables (URIs in a reserved namespace with prefix var: var:produced, var:operation, …). These variables are meant to be replaced by concrete values. Variables are also allowed in value position of property-value pairs (cf. var:operation_type).

This template for instance could be used to describe the addition of two numbers resulting in their sum.

The template service looks as follows. Two input boxes respectively expect the URL to a template (you need to ensure that the template is accessible to the service) and the bindings between variables and values to instantiate the template. For convenience, the template pull down menu already provides the link to the template described above. Likewise, the example pull down menu contains several examples of bindings. Let’s select the first one, and click on the SVG button to generate an SVG representation of the expanded provenance.

Template expansion service

The result is as follows, variable names have been instantiated with identities of activity, entities and agent, but also with values of properties. Property-value pairs whose value was variable not assigned by the bindings are simply removed from the expanded provenance (for instance, as variable var:operation_type is unassigned, the property type was removed from the expansion).

provenance-b1

Below, we find the expanded provenance for the fourth binding. There, we see that two different outputs were provided output1 and output2, and they have been given different numbers of attributes.

provenance-b4

The language to express the bindings is a simple json structure. The first set of bindings is expressed as follows.

{
  "var": {
    "operation": [
      {
        "@id": "ex:operation1"
      }
    ],
    "agent": [
      {
        "@id": "ex:ag1"
      }
    ],
    "consumed1": [
      {
        "@id": "ex:input_1"
      }
    ],
    "consumed_value1": [
      {
        "@value": "4",
        "@type": "xsd:int"
      }
    ],
    "consumed2": [
      {
        "@id": "ex:input_2"
      }
    ],
    "consumed_value2": [
      {
        "@value": "5",
        "@type": "xsd:int"
      }
    ],
    "produced": [
      {
        "@id": "ex:output"
      }
    ],
    "produced_type": [
      {
        "@id": "ex:Result"
      }
    ],
    "produced_value": [
      {
        "@value": "20",
        "@type": "xsd:int"
      }
    ]
  },
  "context": {
    "ex": "http://example.org/#"
  },
  "template": "https://openprovenance.org/templates/org/openprovenance/generic/binaryop/1.provn"
}

Go ahead and experiment with templates and bindings using the service. For more details, please see previous posts.

Happy prov-templating …

openprovenance.org Relaunched

Posted by lucmoreau

It is my pleasure to announce the relaunch of openprovenance.org, the site for standard-based provenance solutions.

With our move to King’s College London, Dong and I have migrated the provenance services from Southampton to King’s. I am pleased to announce the launch of the following services at openprovenance.org:

ProvStore, the provenance repository that enables users to store, share, browse and manage provenance documents. ProvStore is available from https://openprovenance.org/store/.
A translator capable of converting between different representations of PROV, including visual representations of PROV in SVG. The translator service can be found at https://openprovenance.org/services/view/translator.
A validator service that checks provenance documents against the constraints defined in prov-constraints. Such a service can detect logically inconsistent provenance. An example of such inconsistency is when an activity is said to have started after it ended, or when something is being used before it was even created. The validator is hosted at https://openprovenance.org/services/view/validator.
A template expansion service facilitates a declarative approach to provenance generation, in which the shape of provenance can be defined by a provenance document containing variables, acting as placeholders for values. When provided with a set of bindings associating variables to values, the template expansion service generates a provenance document. The template expansion service lives at https://openprovenance.org/services/view/expander.

The Southampton services will be decommissioned shortly. If you have data in the old provenance store, we provide a procedure for you to download your provenance documents from the old store, and to upload them at openprovenance.org. In the age of GDPR, you will have to sign up for the new provenance store and accept its terms and conditions.

While the look and feel of the services may look quite similar, under the bonnet, there have been significant changes.

We have adopted a micro-service architecture for our services, allowing them to be composed in interesting ways. Services are deployed in Docker containers, facilitating their redeployment and enabling their configurability. We are also investigating other forms of licensing that would allow the services to be deployed elsewhere, allowing the host to have full control over access, storage and management. (Contact us if this is of interest to you.)
We have adopted Keycloak for identity management and access control for our existing and future micro-services. This offers an off-the-shelf solution for managing identities and obtaining consent. A single registration for all our services will now be possible.

As before, the above services are powered by some open source libraries, which can also be found from openprovenance.org. ProvToolbox is a Java toolkit for processing provenance; it is available from http://lucmoreau.github.io/ProvToolbox/. Likewise, Prov Python is a toolkit for processing provenance in Python and can be found at https://pypi.org/project/prov/.

PROV-Template: A Quick Start

Posted by lucmoreau

The aim of this blog post is to provide simple guidelines to generate provenance using the PROV-Template approach.

A quick reminder: a provenance template is a PROV document, describing the provenance that it is intended to be generated. A provenance template includes some variables that are placeholders for values. So, a provenance template can be seen as a declarative specification of the provenance intended to be generated by an application. A set of bindings contains associations between variables and values. The PROV-template expansion algorithm, when provided with a template and a set of bindings, generates a provenance document, in which variables have been replaced by values.

Therefore, three steps are involved in this methodology.

Design a “provenance template” describing the structure of the provenance intended to be generated.
Instrument the application, log values, and create “binding files” from these values.
Produce provenance by expanding the template using binding files.

We consider a simple computation, which we would like to describe with provenance. The computation consisted of 3 calls of binary functions: the functions were composed in such a way that the results of two calls were used by the third one. To simplify, we assume that the operations were arithmetic +, -, and *, and the values flowing and out of these operations were integers. Note my use of past tense: the aim of provenance is to describe past computation, as opposed to a future, hypothetical computation (or workflow).

(10+11)-(7*5)

As we have 3 binary functions, we design a template describing the invocation of a binary function. It consists of an activity (denoted by variable operation), two used entities (denoted by variables consumed1 and consumed2), a generated entity (denoted by variable produced), and an agent (denoted by variable agent) responsible for the activity. Graphically, the template can be represented as follows.

Template for the invocation of a binary function

Using the PROV-N notation, the template is expressed as follows. We see that variables are declared in the namespace with prefix var. Each entity and activity is associated with a type, expressed by a variable, which can also be instantiated.

document
 prefix tmpl <http://openprovenance.org/tmpl#>
 prefix var <http://openprovenance.org/var#>
 prefix vargen <http://openprovenance.org/vargen#>

 bundle vargen:b
  activity(var:operation, [ prov:type='var:operation_type' ] )
  agent(var:agent)
  wasAssociatedWith(var:operation,var:agent,-)
  entity(var:consumed1,[prov:value='var:consumed_value1'])
  entity(var:consumed2,[prov:value='var:consumed_value2'])
  used(var:operation, var:consumed1, - )
  used(var:operation, var:consumed2, - )  
  entity(var:produced,[prov:type='var:produced_type',prov:value='var:produced_value'])
  wasGeneratedBy(var:produced, var:operation, - )
  wasDerivedFrom(var:produced, var:consumed1)
  wasDerivedFrom(var:produced, var:consumed2)
 endBundle
endDocument

To be able to generate provenance, one needs to define so-called “bindings files”, associating variables with values. The structure of bindings file is fairly straightforward: with the most recent version of the ProvToolbox, a bindings file can be expressed as a simple JSON structure. Such JSON structures are very easy to generate programmatically from multiple programming languages. However, in this blog post, we do not want to actually program anything in order to generate provenance.

Therefore, we are going to assume that the application already logs values of interest. We are further going to assume that the data can be easily converted to a tabular format, and specifically, that a CSV (comma separated values) representation can be constructed from those logs. The structure that we expect is illustrated in the following figure. In the first line of the file, we find variable names (exactly those found in the template) acting as column headers. In the second line, we find the type of the values found in the table.

Application log as a CSV file. First line contains variable names whereas second line contains the type of their values. Subsequent lines are the actual values.

Concretely, the CSV file uses commas as separator. The third, fourth, and five lines contain deftail of the invocations of the plus, times, and subtraction functions.

operation, operation_type, consumed1, consumed_value1, consumed2, consumed_value2, produced, produced_value, agent
prov:QUALIFIED_NAME, prov:QUALIFIED_NAME, prov:QUALIFIED_NAME, xsd:int, prov:QUALIFIED_NAME, xsd:int, prov:QUALIFIED_NAME, xsd:int, prov:QUALIFIED_NAME
ex:op1, ex:plus, ex:e1, 10, ex:e2, 11, ex:e3, 21, ex:Luc
ex:op2, ex:times, ex:e4,  5, ex:e5,  7, ex:e6, 35, ex:Luc
ex:op3, ex:subtraction, ex:e3, 21, ex:e6, 35, ex:e7, -14, ex:Luc

Each line can automatically be converted to a JSON file. For instance, the third line containing the details of the addition operation can be converted to the following JSON structure, which is essentially a dictionary associating each variable with its corresponding value, with an explicit representation of the typing information where appropriate.

{
 "var":
   {"operation": [{"@id": "ex:op1"}],
    "operation_type": [{"@id": "ex:plus"}],
    "consumed1": [{"@id": "ex:e1"}],
    "consumed_value1": [ {"@value": "10", "@type": "xsd:int"}],
    "consumed2": [{"@id": "ex:e2"}],
    "consumed_value2": [{"@value": "11", "@type": "xsd:int"}],
    "produced": [{"@id": "ex:e3"}],
    "produced_value": [ {"@value": "21", "@type": "xsd:int"} ],
    "agent": [{"@id": "ex:Luc"}]},
 "context": {"ex": "http://example.org/"}
}

We do not need to create this JSON structure ourselves. Instead, we provide an awk script that converts a given line into a bindings file.


function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
function trim(s)  { return rtrim(ltrim(s)); }

BEGIN {
      printf("{\"var\":\n{")
      OFS=FS=","
}
NR==1 {                                # Process header
    for (i=1;i<=NF;i++)                
        head[i] = trim($i)                  
    next                               
}
NR==2 {                                # Process types
    for (i=1;i<=NF;i++)                
        type[i] = trim($i)             
    next                               
}
NR==line{
    first=1
    for (i=1;i<=NF;i++) {              # For each field
	if (first) {
	    first=0
	} else {
	    printf ","
	}
	if (type[i]=="prov:QUALIFIED_NAME") {
	    printf "\"%s\": [{\"@id\": \"%s\"}]",  trim(head[i]), trim($i)
	} else if (type[i]=="xsd:string") {
	    printf "\"%s\": [ \"%s\" ]",  trim(head[i]), trim($i)
	} else  {
	    printf "\"%s\": [ {\"@value\": \"%s\", \"@type\": \"%s\"} ]",trim(head[i]), trim($i), trim(type[i])
	}
    }
    printf "\n"                        
}
END {
    printf("},\n")
    printf("\"context\": {\"ex\": \"http://example.org/\"}\n")
    printf("}\n")    
}

To facilitate the processing, we even provide a Makefile with a target do.csv that processes a line (variable LINE) of the csv file to generate a bindings file. It is then used by the utility provconvert to expand the template file. The target workflow hard-codes the presence of three lines in the CSV, the generation of a bindings file for each line, and the expansion of the template with these bindings. All files are then merged in a single provenance file using the -merge option of provconvert.

LINE=4

do.csv:
	cat bindings.csv | awk -v line=$(LINE) -f src/main/resources/awk/tobindings.awk  > target/bindings$(LINE).json
	provconvert -bindver 3 -infile template_block.provn -bindings target/bindings$(LINE).json -outfile target/block$(LINE).provn


workflow:
	$(MAKE) LINE=3 do.csv
	$(MAKE) LINE=4 do.csv
	$(MAKE) LINE=5 do.csv
	printf "file, target/block3.provn, provn\nfile, target/block4.provn, provn\nfile, target/block5.provn, provn\n" | provconvert -merge - -flatten -outfile target/wfl.svg

The resulting provenance is displayed in the following figure.

Expanded provenance showing three activities, consumed and generated entities, and an agent.

Concluding Remarks

Given a log file in CSV format, we have shown it is becoming easy to generate PROV-compliant provenance without having to write a single line of code: an awk script converts CSV data to JSON, used to expand a template expressed in a PROV-compliant format.

For the provenance to be meaningful, the application must be instrumented to log the relevant values. For instance, each entity/agent/activity is expected to have been given a unique identifier.

The template design phase is also critical. In our design, we decided that one template would describe the invocation of a single function. The same template was reused for all function calls. Alternatives are possible: multiple activities could be described in a single template, alternatively different types of activities could be described in different templates. I will come back to this issue in another blog post in a few weeks.

What is in ProvToolbox 0.7.3?

Posted by lucmoreau

Today, I released ProvToolbox 0.7.3. The principal changes in this new version of ProvToolbox are concerned with prov-template, the templating system for provenance. The new release also contains few minor bug fixes and changes.

1. Template System

A reminder: a PROV-template is a PROV document, in which some variables are placeholders for values. A PROV-template is a declarative specification of the provenance intended to be generated by an application. A set of bindings contains associations between variables and values. The PROV-template expansion algorithm, when provided with a template and a set of bindings, generates a provenance document, in which all variables have been replaced by values.

PROV-template is a new approach to creating a provenance-enabled application. Templates are designed and embedded in the application’s code, the application logs values (in the form of bindings), and provenance is automatically generated by template expansion.

A tutorial for templates is available on this blog:

ProvToolbox Tutorial 4: Templates for Provenance (part 1).
https://lucmoreau.wordpress.com/2015/07/30/provtoolbox-tutorial-4-templates-for-provenance-part-1/
ProvToolbox Tutorial 4: Templates for Provenance (part 2).
https://lucmoreau.wordpress.com/2015/08/03/provtoolbox-tutorial-4-templates-for-provenance-part-2/

In ProvToolbox 0.7.3, we have adopted a more compact and user-friendly representation for sets of bindings. Instead of representing them as PROV, we can now represent them as JSON. At the same time, we also handle variables in a more uniform manner, allowing variables occurring in mandatory position, to be also used in attribution position. I won’t go into the technical details, but these two changes make the design of templates and the construction of bindings much simpler!

A further change is that we have implemented a simple “bindings bean” compiler: it takes a template definition and creates a java class, which allows sets of bindings to be created directly from Java, and serialized easily. The aim of this compiler is to simplify the implementation of applications generating provenance.

The GitHub source code repository contains code for two further tutorials (Tutorial5 and Tutorial6). I will write up the text for these tutorials in the New Year.

2. Qualified Pattern for All PROV Relations

At the recent PROV: Three Years Later Workshop, I made the case for the Qualified Pattern to be used for all PROV relations. My key motivation for this extension to PROV is my provenance summarisation algorithm, which generates a “summary provenance graph“, in which nodes and edges are annotated with weights indicating how frequently these kinds of nodes and edges can be found in the original graph. To allow for such annotations to be added to specialization, alternate, and membership relations, they need to support the Qualified Pattern.

At this stage, it is the data model that is modified. Serialization to xml and provn is work in progress, and not supported in prov-json and prov-sql yet. Furthermore, there is no parsing yet. Three new interfaces have been defined in the package org.openprovenance.prov.model.extension.

3. Release Log

For full details of the changes, see the release log at https://github.com/lucmoreau/ProvToolbox/wiki/Releases#073.

4. Conclusion

We keep on using ProvToolbox in various applications to generate provenance with templates and to undertake some analytics using the summarisation algorithm. This new release was critical to support these two use cases of ProvToolbox. Shortly, I will release two further blogs with new tutorials for prov-template.

As always, all relevant links can be found at http://lucmoreau.github.io/ProvToolbox/, including binary installers for linux (rpm and debian) and macosx.

Seasonal greetings!

What is in ProvToolbox 0.7.2?

Posted by lucmoreau

1. Introduction

Yesterday, I released ProvToolbox 0.7.2, which includes the following novel features.

2. Novel Features

2.1. MacOS X Installer

Continuing our efforts of providing binary installers to facilitate installation of ProvToolbox, this release includes an installer for MacOS X.

Simply follow the link http://openprovenance.org/java/installer/provconvert-0.7.2.dmg, you will then be given access to the installation image.

Installation Disk

Click on the Installer. Note that you need to allow installation of programs from any sources in your security preferences. Then simply follow the instructions. The installer will install all libraries and executable in /Applications/provconvert (default location, which can be overriden), as well as a symbolic link making the provconvert executable available in your execution path. An Uninstaller is also available as an executable jar file /Applications/provconvert/Uninstaller/uninstaller.jar.

provconvert Installer

Et voila! The executable can be invoked directly from the command line.

provconvert -version

which should return provconvert version 0.7.2 (2015-09-15 20:16).

2.2. Templates

As we continue to use templates in our applications, two further requirements have been implemented. It is now possible to expand a template, and strip the result from any variable that has not been instantiated. For this, simply pass the option -allexpand to provconvert, to be used in conjunction with the -bindings option (see Tutorial 4 (part 1) and Tutorial 4 (part 2) on template processing in ProvToolbox). Furthermore, an error code is returned when not all variables have been expanded.

2.3. Interoperability

As we are integrating Provtoolbox, ProvStore and ProvStore in the inter-operability harness developed by the Software Sustainability Institute, we have fixed some minor issues to ensure interoperability between our software stacks.

2.4. provconvert artifact

The artifact toolbox has been renamed into provconvert, since we have plans for other artifacts out of ProvToolbox.

3. Conclusion

For all details about ProvToolbox, see the github.io page http://lucmoreau.github.io/ProvToolbox/.

What is in ProvToolbox 0.7.1?

Posted by lucmoreau

1. Introduction

Yesterday, I released ProvToolbox 0.7.1. It is a minor release, fixing minor bugs of 0.7.0, and including a useful new feature.

2. Novel Features

2.1. Debian Package

To facilitate installation, a new binary release format is now supported: Debian packaging to support binary release on Ubuntu and other Debian-based Linux distributions. You just need to run the following commands.

wget https://repo1.maven.org/maven2/org/openprovenance/prov/toolbox/0.7.1/toolbox-0.7.1.deb
dpkg --install toolbox-0.7.1.deb

This is in addition to RPM support introduced in 0.6.2:

rpm -U https://repo1.maven.org/maven2/org/openprovenance/prov/toolbox/0.7.1/toolbox-0.7.1-rpm.rpm

2.3 Visualization

Modification of the visualisation component prov-dot allow dge thickness, node size, and tooltips (on SVG) to be controlled. For this, the provenance graph nodes and edges need to be annotated with reserved attributes dot:size and dot:tooltip. The following figure illustrates the kind of graphs that can now be generated.

A summarisation of the provenance challenge workflow. Nodes are to be understood as provenance types. Thickness of edges and size of nodes reflect their frequency in the summarised document.

2.3 Bug fixes

I also fixed some minor bugs in qualified namespaces in the prov-sql package, and updated reserved namespace for provtoolbox.

3. Conclusion

Tell me how you use ProvToolbox and/or provconvert and for for which purpose. Share details of your projects with me, I will add them to https://github.com/lucmoreau/ProvToolbox/wiki/Projects-and-Applications-Using-ProvToolbox.

For all details about ProvToolbox, see the github.io page http://lucmoreau.github.io/ProvToolbox/.

ProvToolbox Tutorial 4: Templates for Provenance (part 1)

Posted by lucmoreau

1. Introduction

In several of our applications, we felt the need of separating the logging of information from the constructing and storing of provenance. For this, we introduced PROV-Template a templating system for provenance, describing the shape of provenance graphs to be generated, and we specified an algorithm capable of instantiating templates, with specific values.

The purpose of this tutorial is to introduce PROV-Template and how templates can be instantiated using ProvToolbox. This functionality is directly available from the command line using provconvert.

The tutorial is standalone and a zip archive can be downloaded from the following URL: http://search.maven.org/remotecontent?filepath=org/openprovenance/prov/ProvToolbox-Tutorial4/0.7.0/ProvToolbox-Tutorial4-0.7.0-src.zip. The tutorial can also be found on the ProvToolbox project on GitHub.

The tutorial assumes that provconvert has been installed and is available in the execution path. (See http://lucmoreau.github.io/ProvToolbox/ for installation instructions.) The tutorial relies on a Makefile and can simply be run by calling:

make do.all

2. Example of Templates

2.1 A Template for Attribution of a Quote

Building on blog post “A little provenance goes a long way”, imagine that we need to systematically provide attribution to quotes. As this is a repetitive tasks, we should consider the PROV-Templates approach to generate provenance.

A provenance template is itself a PROV document in which some variables act as placeholders for values to be filled at expansion time. More precisely, a template is a bundle of PROV assertions: a bundle is the PROV mechanism by which provenance of provenance can be expressed.

The figure below contains a graphical illustration of a template for Quote Attribution. It contains the following variables:

var:author the identifier of the author (stated to be a prov:Person)
var:name the author’s name
var:quote the identifier of the quote
var:value the quote itself
vargen:bundleId the identifier of the bundle to be generated

The quote is attributed to the author agent. The variables var:author, var:namer, var:quote, var:value are qualified names in a namespace reserved for PROV-Template variables, and are conventionally prefixed with the prefix var. There is an expectation that values need to be provided for these variables when instantiating a template. On the other hand, the variable vargen:bundleId, with prefix vargen, can have a value generated automatically at instantiation time.

Quote Attribution Template

Concretely, in the PROV-N notation, the template is expressed as follows.

document

  prefix var <http://openprovenance.org/var#>
  prefix vargen <http://openprovenance.org/vargen#>
  prefix tmpl <http://openprovenance.org/tmpl#>
  prefix foaf <http://xmlns.com/foaf/0.1/>
  
  bundle vargen:bundleId
    entity(var:quote, [prov:value='var:value'])
    entity(var:author, [prov:type='prov:Person', foaf:name='var:name'])
    wasAttributedTo(var:quote,var:author)
  endBundle

endDocument

2.2 Template Instantiation: A Little Provenance Goes a Long Way

Let’s now look into how we can instantiate the templates. Let us consider the following bindings for the 4 variables author, name, quote and value. An association between a variable and a value is referred to as a binding.

var:author	http://orcid.org/0000-0002-3494-120X
var:name	“Luc Moreau”
var:quote	ex:quote1
var:value	“A Little Provenance Goes a Long Way”

If we instantiate the template with these bindings, we obtain the following instantiated document. We note that vargen:bundleId was instantiated with UUID value.

Template Instantiation for “A Little Provenance Goes a Long Way”

Expansion of a template with provconvert is straightforward. The parameter -infile must be used to provide the template. The binding file is specified with the -binding parameter. The resulting instantiated template is specified with -outfile.

	
provconvert -infile template1.provn -bindings binding1.ttl -outfile doc1.provn

The input template and its instantiation can be expressed in any of the formats supported by ProvToolbox. We still have to express the set of bindings. We did not want to introduce a new specific format (though we may do it in the future), so, we just decided to use PROV. In particular, the Turtle notation is fairly elegant in this case. Two family of properties are introduced in the tmpl namespace, namely value_i and 2dvalue_i_j, for binding variables in identifier and value positions, respectively.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tmpl: <http://openprovenance.org/tmpl#> .
@prefix var: <http://openprovenance.org/var#> .
@prefix ex: <http://example.com/#> .

var:author a prov:Entity;
           tmpl:value_0 <http://orcid.org/0000-0002-3494-120X>.
var:name   a prov:Entity;
           tmpl:2dvalue_0_0 "Luc Moreau".
var:quote  a prov:Entity;
           tmpl:value_0 ex:quote1.
var:value  a prov:Entity;
           tmpl:2dvalue_0_0 "A Little Provenance Goes a Long Way".

Details about the syntax of bindings can be found in https://provenance.ecs.soton.ac.uk/prov-template/.

2.3 Template Instantiation: A Second Author

In some cases, we would like to express that there is a second author to a document. The attribution template does not need to be redefined. We simply need to provide relevant bindings for the second author.

For instance, Paul and Luc are the two authors of that quote. Conceptually, we want to provide the following bindings.

var:author	http://orcid.org/0000-0002-3494-120X
	http://orcid.org/0000-0003-0183-6910
var:name	“Luc Moreau”
	“Paul Groth”

We see that each of var:author and var:name is given two values. This results in the following expanded provenance graph.

Template Instantiation with Two Authors

The contents of the bindings file is explicit below. Lines 7-9, var:author is given two values, using the properties tmpl:value_0 and tmpl:value_1. Lines 10-12, var:name is given two values to occur in attribute position, with properties tmpl:2dvalue_0_0 and tmpl:2dvalue_1_0.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tmpl: <http://openprovenance.org/tmpl#> .
@prefix var: <http://openprovenance.org/var#> .
@prefix ex: <http://example.com/#> .

var:author a prov:Entity;
           tmpl:value_0 <http://orcid.org/0000-0002-3494-120X>;
           tmpl:value_1 <http://orcid.org/0000-0003-0183-6910>.
var:name   a prov:Entity;
           tmpl:2dvalue_0_0 "Luc Moreau";
           tmpl:2dvalue_1_0 "Paul Groth".
var:quote  a prov:Entity; 
           tmpl:value_0 ex:quote1.
var:value  a prov:Entity; 
           tmpl:2dvalue_0_0 "A Little Provenance Goes a Long Way".

Again, we refer the reader to the PROV-Template specification for details of the bindings syntax.

2.4 Template Instantiation: More Attributes

In general, PROV also allows for variable number of attribute values to be provided for a given attribute. For instance, we may want the name and nick name to be provided as two possible values for the var:name variable. This would result in the following expanded graph.

Template Instantiation: Variable Number of Attributes

Template Instantiation with Variable Number of Attributes

Again, the template remains unchanged, but the bindings are as follows. In lines 12-13, we see two possible names for Paul, respectively expressed with tmpl:2dvalue_1_0 and tmpl:2dvalue_1_1. This shows that template expansion can support a variable number of attributes for different statements instantiated from the same template statement.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix tmpl: <http://openprovenance.org/tmpl#> .
@prefix var: <http://openprovenance.org/var#> .
@prefix ex: <http://example.com/#> .

var:author a prov:Entity;
           tmpl:value_0 <http://orcid.org/0000-0002-3494-120X>;
           tmpl:value_1 <http://orcid.org/0000-0003-0183-6910>.
var:name   a prov:Entity;
           tmpl:2dvalue_0_0 "Luc Moreau";
           tmpl:2dvalue_1_0 "Paul Groth";
           tmpl:2dvalue_1_1 "pgroth".
var:quote  a prov:Entity;
           tmpl:value_0 ex:quote1.
var:value  a prov:Entity;
           tmpl:2dvalue_0_0 "A Little Provenance Goes a Long Way".

3. Conclusions

PROV-Template is easy to work with, it just requires provconvert to be installed. By decoupling the generation of provenance from the logging of values, we observed a number of benefits:

It allowed us to fine tune the provenance, independently of the application.
It permitted us to keep the code to generate the provenance separate from the application itself.
It allowed us to adopt a more conceptual approach to provenance, thinking of “provenance schemas” rather than instances.

This is the first part of the tutorial on PROV-Template. In the second part of the tutorial, we will see how PROV-Template can support more sophisticated use cases.

Thanks to co-authors Dong and Danius. Heather has been using it in Smart Society’s SmartShare application.

ProvToolbox Tutorial 3: Merging PROV Documents

Posted by lucmoreau

1. Introduction

It has become a requirement in several of our applications to merge PROV documents. The purpose of this tutorial is to explain how ProvToolbox allows documents to be merged, ensuring that descriptions are uniquely represented with all their attributes, merging bundles they may contain, and optionally “flattening” them.

This functionality is directly available from the command line using provconvert.

The tutorial is standalone and a zip archive can be downloaded from the following URL: http://search.maven.org/remotecontent?filepath=org/openprovenance/prov/ProvToolbox-Tutorial3/0.7.0/ProvToolbox-Tutorial3-0.7.0-src.zip. The tutorial can also be found on the ProvToolbox project on GitHub.

The tutorial assumes that provconvert has been installed and is available in the execution path.
The tutorial relies on a Makefile and can simply be run by calling:

make do.all

2. Examples of Merges

2.1 Merging two documents without bundles

Our first example consists of two documents. The first document “doc1” consists of the attribution of an entity e1 to an agent ag1. The entity has an attribute attr1.

doc1

The second document “doc2” describes the derivation of the same entity e1 from another entity e0. The description of the entity e1 contains an attribute attr2.

doc2

By merging the two documents, we obtain a new document, in which the entity e1 is both attributed to ag1 and derived from e0. The description of e1 contains the attributes attr1 and attr2.

Merged documents doc1 and doc2

Merging the two documents is simply performed by calling provconvert with argument -merge as follows.

provconvert -merge doc1-2-listing.txt -outfile target/doc1-2.provn

The -merge option expects a path to a file (or – to indicate standard input) that lists the files that have to be merged. In our case, we have a file doc1-2-listing.txt with the following contents:

file, src/main/resources/doc1.provn, provn
file, src/main/resources/doc2.provn, provn

Each line consists of three elements separated by a comma:

A tag indicating if we are dealing with a file on the file system or a URL
The path to the file or a full http URL
The PROV format expected to be read

For completeness, we show the details of the documents in PROV-N notation. First, “doc1”:

document

 prefix ex <http://example.org/#>

 entity(ex:e1,[ex:attr1="val1"])
 agent(ex:ag1)
 wasAttributedTo(ex:e1, ex:ag1)

endDocument

Then, “doc2”:

document

 prefix ex <http://example.org/#>

 entity(ex:e1,[ex:attr1="val2"])
 entity(ex:e0) 
 wasDerivedFrom(ex:e1, ex:e0)

endDocument

Finally, the merged document:

document
 prefix ex <http://example.org/#>

 entity(ex:e1,[ex:attr1 = "val1" %% xsd:string, ex:attr1 = "val2" %% xsd:string])
 entity(ex:e0)
 agent(ex:ag1)
 wasDerivedFrom(ex:e1, ex:e0)
 wasAttributedTo(ex:e1, ex:ag1)
endDocument

The merge operation follows key constraints of the PROV-CONSTRAINTS specification, such as key-object (constraint 22) and key-properties (constraint 23).
The reader who is familiar with the RDF representation of PROV will note that the merge operation is simply obtained by “concatening” all the RDF files together.

The merge operation becomes interesting in the presence of bundles.

2.2 Merging two documents with distinct bundles

First, we consider two documents with distinct bundles.

We now examine a variant of “doc1”, which contains a bundle bun1. In the illustration, the bundle is represented by a rectangle, which contains a description of e2 generated by a2.

doc1 with bundle bun1

The second document is a variant of “doc2” with another bundle named bun2. It contains a description of e3 generated by a3.

doc2 with bundle bun2

After merging the two documents, we obtain a new document containing both bun1 and bun2.

doc1 with bundle bun1 merged with doc2 with bundle bun2

As we can see, as the two bundles have different names, they are kept distinct in the merged document.

Concretely, the first document with bundle bun1.

document
 prefix ex <http://example.org/#>

 entity(ex:e1,[ex:attr1="val1"])
 agent(ex:ag1)
 wasAttributedTo(ex:e1, ex:ag1)

 bundle ex:bun1
   entity(ex:e2)
   activity(ex:a2,-,-)
   wasGeneratedBy(ex:e2,ex:a2,-)
 endBundle

endDocument

The first document with bundle bun1.

document
 prefix ex <http://example.org/#>

 entity(ex:e1,[ex:attr1="val2"])
 entity(ex:e0) 
 wasDerivedFrom(ex:e1, ex:e0)

 bundle ex:bun2
   entity(ex:e3)
   activity(ex:a3,-,-)
   wasGeneratedBy(ex:e3,ex:a3,-)
 endBundle

endDocument

The merged documents with two bundles is as follows:

document
 prefix ex <http://example.org/#>

 entity(ex:e1,[ex:attr1 = "val1" %% xsd:string, ex:attr1 = "val2" %% xsd:string])
 entity(ex:e0)
 agent(ex:ag1)
 wasDerivedFrom(ex:e1, ex:e0)
 wasAttributedTo(ex:e1, ex:ag1)

 bundle ex:bun2
  entity(ex:e3)
  activity(ex:a3,-,-)
  wasGeneratedBy(ex:e3,ex:a3,-)
 endBundle

 bundle ex:bun1
  entity(ex:e2)
  activity(ex:a2,-,-)
  wasGeneratedBy(ex:e2,ex:a2,-)
 endBundle
endDocument

2.3 Merging and flattening two documents with distinct bundles

We can optionally use the -flatten option to “remove” bundles, and “pour” their content in the surrounding document.

provconvert -merge doc1b1-2b2-listing.txt -flatten -outfile target/doc1b1-2b2-flatten.provn

The resulting document no longer contains bundles.

Merge and flatten of doc1 with bundle bun1 and doc2 with bundle bun2

document
 prefix ex <http://example.org/#>

 entity(ex:e1,[ex:attr1 = "val1" %% xsd:string, ex:attr1 = "val2" %% xsd:string])
 entity(ex:e0)
 agent(ex:ag1)
 wasDerivedFrom(ex:e1, ex:e0)
 wasAttributedTo(ex:e1, ex:ag1)
 entity(ex:e3)
 activity(ex:a3,-,-)
 wasGeneratedBy(ex:e3,ex:a3,-)
 entity(ex:e2)
 activity(ex:a2,-,-)
 wasGeneratedBy(ex:e2,ex:a2,-)
endDocument

2.4 Merging two documents with the same bundle

Now, let us consider a variant of “doc2” with a bundle bun1, the same identifier as the bundle we had in the “doc1” variant. In the figure, we see that bundle bun1 contains a description of a2 generating e3.

A variant of doc2 with bundle named bun1

If we now merge doc1 with bundle bun1 and doc2 with bundle bun1, the merge procedure merges the descriptions contained in the two instances of bundle bun1. We obtain:

doc1 with bundle bun1 merged with doc2 with bundle bun2

2.5 Merging and flattening two documents with the same bundle

If in addition, we specify the -flatten option, merging and flattening operations result in the following document.

doc1 with bundle bun1 merged with doc2 with bundle bun1, after flattening

3. Conclusion

As our applications generate provenance incrementally, bundles by bundles, the ability to merge documents and collapse bundles has become critical. This functionality is implemented by ProvToolbox in the method IndexedDocument.merge(). This tutorial has shown that it is also directly available from the command line, using the provconvert utility.

What form of processing do you regularly perform on your provenance graphs? Which functionality would you like to see added to ProvToolbox? Tell us, and for any other issue related to ProvToolbox, on the Github issue tracker.

4. Appendix. Log Change

Original version submitted on 2015/07/27

ProvToolbox Tutorial 2: Reading, Converting and Saving PROV Documents

Posted by lucmoreau

1. Introduction

Building on the first ProvToolbox tutorial, the aim of this second tutorial is to show how to read a PROV document using ProvToolbox and export it to some format.

We assume that installation instructions as described in the first Tutorial have been followed. Details about the Maven configuration can also be found there.

2. Download and Execution

The tutorial is standalone and a zip archive can be downloaded from the following URL: http://search.maven.org/remotecontent?filepath=org/openprovenance/prov/ProvToolbox-Tutorial2/0.7.0/ProvToolbox-Tutorial2-0.7.0-src.zip. The tutorial can also be found on the ProvToolbox project on GitHub.

After unziping the archive, we can execute the tutorial, by calling:

mvn clean install

Beside the verbose logging by the Maven build process, the tutorial itself displays the following text, including some PROV expressed according to PROV-XML.

*************************
* Converting document  
*************************

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<prov:document xmlns:prov="http://www.w3.org/ns/prov#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:provbook="http://www.provbook.org" xmlns:jim="http://www.cs.rpi.edu/~hendler/">
    <prov:entity prov:id="provbook:a-little-provenance-goes-a-long-way">
        <prov:value xsi:type="xsd:string">A little provenance goes a long way</prov:value>
    </prov:entity>
    <prov:agent prov:id="provbook:Paul">
        <foaf:name xsi:type="xsd:string">Paul Groth</foaf:name>
    </prov:agent>
    <prov:agent prov:id="provbook:Luc">
        <foaf:name xsi:type="xsd:string">Luc Moreau</foaf:name>
    </prov:agent>
    <prov:wasAttributedTo>
        <prov:entity prov:ref="provbook:a-little-provenance-goes-a-long-way"/>
        <prov:agent prov:ref="provbook:Paul"/>
    </prov:wasAttributedTo>
    <prov:wasAttributedTo>
        <prov:entity prov:ref="provbook:a-little-provenance-goes-a-long-way"/>
        <prov:agent prov:ref="provbook:Luc"/>
    </prov:wasAttributedTo>
    <prov:entity prov:id="jim:LittleSemanticsWeb.html"/>
    <prov:wasDerivedFrom>
        <prov:generatedEntity prov:ref="provbook:a-little-provenance-goes-a-long-way"/>
        <prov:usedEntity prov:ref="jim:LittleSemanticsWeb.html"/>
    </prov:wasDerivedFrom>
</prov:document>

*************************

3. Reading and writing PROV documents in Java

The following Java snippet is extracted from the file src/main/java/org/openprovenance/prov/tutorial/tutorial2/ReadWrite.java. In line 3, it shows how a document can be read, given its path filein on the file system. In line 4, we see how a PROV Document can be saved into a file fileout. The writeDocument procedure determines the PROV format that is required by looking at the extension. If a non-standard extension is used, then the format can be specified explicitly, as in line 5, by one of the values of the enumerated type ProvFormat.

    public void doConversions(String filein, String fileout) {
        InteropFramework intF=new InteropFramework();
        Document document=intF.readDocumentFromFile(filein);
        intF.writeDocument(fileout, document);     
        intF.writeDocument(System.out, ProvFormat.XML, document);
    }

   public static void main(String [] args) {
        if (args.length!=2) throw new UnsupportedOperationException("main to be called with two filenames");
        String filein=args[0];
        String fileout=args[1];
        
        ReadWrite tutorial=new ReadWrite(InteropFramework.newXMLProvFactory());
        tutorial.openingBanner();
        tutorial.doConversions(filein, fileout);
        tutorial.closingBanner();
    }

For completion, line 13 shows how the tutorial class is initialized and line 15 takes care of invoking the conversion functionality.

The tutorial is called from the command line, passing src/main/resources/a-little.provn as the input file, and target/a-little.svg as the output file. Therefore, the a-little.provn file is converted to SVG (by line 4) and to XML on standard output (by line 5).

The following table lists the formats that are supported by ProvToolbox.

gv	text/vnd.graphviz	output
dot	text/vnd.graphviz	output
prov-asn	text/provenance-notation	input
prov-asn	text/provenance-notation	output
pn	text/provenance-notation	input
pn	text/provenance-notation	output
asn	text/provenance-notation	input
asn	text/provenance-notation	output
provn	text/provenance-notation	input
provn	text/provenance-notation	output
rdf	application/rdf+xml	input
rdf	application/rdf+xml	output
json	application/json	input
json	application/json	output
ttl	text/turtle	input
ttl	text/turtle	output
trig	application/trig	input
trig	application/trig	output
jpeg	image/jpeg	output
jpg	image/jpeg	output
provx	application/provenance+xml	input
provx	application/provenance+xml	output
xml	application/provenance+xml	input
xml	application/provenance+xml	output
png	image/png	output
pdf	application/pdf	output
svg	image/svg+xml	output

4. Conclusion

For further documentation on the classes and methods used, Javadoc for ProvToolbox can be found from http://openprovenance.org/java/site/latest/apidocs/. The Javadoc documentation also refers to PROV specifications where appropriate.

Suggestions for tutorials and also for ways of improving the programming experience offered by ProvToolbox are always welcome. Please raise issues on GitHub issue tracker.

5. Appendix. Log Change

Original version submitted on 2015/06/30
Updated to 0.7.0 on 2015/07/27

What is in ProvToolbox 0.6.2?

Posted by lucmoreau

1. Introduction

Today, I have released ProvToolbox 0.6.2 some 11 months after the previous release. This has been a consolidation phase. ProvToolbox is used in various projects and applications, which have exercised its functionality, identified bugs, and raised requirements for new functionality to make it more useful. Concretely, ProvToolbox with its templating system is used in Picaso (contributor: Dong Huynh, Danius Michaelides), SmartSociety‘s SmartShare application (contributor: Heather Packer), eBook‘s blockly-based workflow systems (contributor: Danius Michaelides). ProvToolbox is also used in ProvStore to support the conversion of PROV to various formats.

2. Novel Features

2.1 Document Merge and Flattening

It has become a critical requirement of several of our applications to merge PROV documents. If you think of the RDF representation of PROV, a kind of concatenation of all tuples. For other representations such as PROV-N, PROV-XML, and PROV-JSON, which are more statement oriented, merging documents fuses statements about the same resource — for instance, for an entity, regrouping all attributes in a single statement. When documents contain bundles, these are also merged if they have the same identifier.

Furthermore, we have the option of stripping bundles from documents, as if we were pouring their contents in the document they occur in.

2.2 Standard inputs and outputs for provconvert

With provconvert, we can now use ‘-‘ as a filename to indicate that the input/output will come on standard input or output. This allows provconvert to act much more like a unix tool. However, because provconvert needs to know the format of its input or output (it would previously derive this from the filename extensions) we’ve introduced three extra options -informat, -outformat and -bindformat. These take filename extensions or mime-types as their arguments.

Here we grab a provn document using the curl command line tool, convert it to xml, and show the output:

% curl -s http://www.provbook.org/provapi/documents/bk.provn | provconvert -infile - -informat provn -outfile - -outformat xml

The output as xml is:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<prov:document xmlns:prov="http://www.w3.org/ns/prov#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:hendler="http://www.cs.rpi.edu/~hendler/"
xmlns:bk="http://www.provbook.org/is/#"
xmlns:dct="http://purl.org/dc/terms/"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:images="http://www.provbook.org/imgs/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:provapi="http://www.provbook.org/provapi/documents/"
xmlns:provbook="http://www.provbook.org/">
    <prov:bundleContent prov:id="provbook:provenance">
        <prov:agent prov:id="provbook:Luc">
            <foaf:name xsi:type="xsd:string">Luc Moreau</foaf:name>
        </prov:agent>
        <prov:agent prov:id="provbook:Paul">
            <foaf:name xsi:type="xsd:string">Paul Groth</foaf:name>
        </prov:agent>
        <prov:entity prov:id="provbook:provenance">
...

The format options also override provconvert using filename extensions to derive formats, so we are now less restricted when we name files.

2.3 Supported formats

The option -formats of provconvert provides a list of supported formats. The output is a list of formats one per line, with each line listing filename extension, its associated mime-type and whether the entry is for input or for output.

% provconvert -formats 
gv      text/vnd.graphviz       output 
dot     text/vnd.graphviz       output 
trig    application/trig        input 
trig    application/trig        output 
provn   text/provenance-notation        input 
provn   text/provenance-notation        output 
...

2.4 RPM for provconvert

We now offer an RPM (Red-Hat Package Manager) for binary release. Using the rpm command, one can now install provconvert with:

rpm -U https://repo1.maven.org/maven2/org/openprovenance/prov/toolbox/0.6.2/toolbox-0.6.2-rpm.rpm

2.5 Implementation of prov-template

prov-template is a specification introduced by Dong, Danius and myself that specifies a templating system for PROV. It allows for templates to be defined as PROV documents containing variables. Bindings consist of associations between variables and values. Templates can be expanded by replacing variables by their values specified in binding.

As prov-template is being used in several applications, we realised that parts of the specification has not been fully implemented, and there were some bugs as well. The key changes include proper support for time in activities and in instantaneous events, and a correct implementation of “linked template variables”, allowing the template designer to control the cartesian products, when variables are bound to multiple values.

I am hoping to publish a tutorial on prov-template in the near future.

2.6 Bug Fixes

A few notable bug fixes are listed below.

prov-dot: conversion to dot (and subsequently svg, pdf, etc) escaping characters (issue 103)
prov-json: correct handling of bundle names and namespaces (issue 96)
prov-n: added newline at end of document (issue 112)
prov-template: transitive closure for linked variables (issue 113)

2.7 Tutorial

Two more tutorials have been produced. They are included in the release and I will publish blog posts about them shortly.

reading and converting PROV documents
merging documents

3. Where next?

ProvToolbox was designed to support interoperable conversion of PROV representations. In collaboration with the Software Sustainability Institute, we are developing a test harness that allows us to check inter-operability of various software packages developed in Southampton, including ProvToolbox, ProvStore, ProvPy, ProvTranslator, ProvJS. If we identify inter-operability issues, we will seek to address them in due course.

We have also identified a series of new requirements for prov-template, and possible ways of improving this templating system. We hope to produce the second iteration of this specification and deliver its reference implementation in ProvToolbox.

For all details about ProvToolbox, see the github.io page http://lucmoreau.github.io/ProvToolbox/.

Thanks to Danius, Dong, and Heather for identifying issues or suggesting improvements and implementing them.