What is in ProvToolbox 0.6.2?

1. Introduction

Today, I released ProvToolbox 0.6.2, some 11 months after the previous release. This has been a consolidation phase. ProvToolbox is used in various projects and applications, which have exercised its functionality, exposed bugs, and raised requirements for new functionality to make it more useful. Concretely, ProvToolbox and its templating system are used in Picaso (contributors: Dong Huynh, Danius Michaelides), SmartSociety’s SmartShare application (contributor: Heather Packer), and eBook’s blockly-based workflow systems (contributor: Danius Michaelides). ProvToolbox is also used in ProvStore to support the conversion of PROV to various formats.

2. Novel Features

2.1 Document Merge and Flattening

It has become a critical requirement of several of our applications to merge PROV documents. In the RDF representation of PROV, merging amounts to a concatenation of all the tuples. For more statement-oriented representations such as PROV-N, PROV-XML, and PROV-JSON, merging documents fuses statements about the same resource: for instance, all the attributes of an entity are regrouped in a single statement. When documents contain bundles, bundles with the same identifier are also merged.
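
To make the fusion of statements concrete, here is a minimal Python sketch. It is not ProvToolbox's (Java) API: the Doc class and the (kind, identifier) keys are a hypothetical, simplified model in which every statement carries an identifier and attribute values are plain strings.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical, simplified model (not the ProvToolbox API): a statement is
# keyed by (kind, identifier), e.g. ("entity", "ex:e1"), and maps each
# attribute name to the set of its values.
Statements = Dict[Tuple[str, str], Dict[str, Set[str]]]


class Doc:
    """A document: top-level statements plus bundles indexed by identifier."""

    def __init__(self, statements: Optional[Statements] = None,
                 bundles: Optional[Dict[str, "Doc"]] = None):
        self.statements: Statements = statements or {}
        self.bundles: Dict[str, "Doc"] = bundles or {}


def merge_statements(a: Statements, b: Statements) -> Statements:
    """Fuse statements about the same resource by uniting their attribute values."""
    result: Statements = {key: {name: set(vals) for name, vals in attrs.items()}
                          for key, attrs in a.items()}
    for key, attrs in b.items():
        target = result.setdefault(key, {})
        for name, vals in attrs.items():
            target.setdefault(name, set()).update(vals)
    return result


def merge(a: Doc, b: Doc) -> Doc:
    """Merge two documents; bundles sharing an identifier are merged too."""
    bundles = dict(a.bundles)
    for bid, bundle in b.bundles.items():
        bundles[bid] = merge(bundles[bid], bundle) if bid in bundles else bundle
    return Doc(merge_statements(a.statements, b.statements), bundles)
```

For example, merging a document that gives ex:e1 a prov:label with one that gives it an ex:version yields a single entity statement carrying both attributes.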

Furthermore, we offer the option of flattening documents by stripping their bundles, as if pouring the bundles’ contents into the document in which they occur.
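
Flattening can be pictured with the following Python sketch. It assumes a hypothetical model (not ProvToolbox's API) where statements are tuples and bundles map an identifier to a list of statements; dropping exact duplicates is our own assumption about how repeated statements should be treated.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified model (not the ProvToolbox API):
# a statement is a tuple such as ("wasAttributedTo", "ex:report", "ex:luc").
Statement = Tuple[str, ...]


def flatten(statements: List[Statement],
            bundles: Dict[str, List[Statement]]) -> List[Statement]:
    """Pour the contents of every bundle into the enclosing document.

    Returns a bundle-free list of statements. Exact duplicates are dropped
    (an assumption of this sketch), keeping first-occurrence order.
    """
    seen = set()
    flat: List[Statement] = []
    for stmt in statements + [s for bundle in bundles.values() for s in bundle]:
        if stmt not in seen:
            seen.add(stmt)
            flat.append(stmt)
    return flat
```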

2.2 Standard inputs and outputs for provconvert

With provconvert, we can now use '-' as a filename to indicate that input is read from standard input, or output written to standard output. This allows provconvert to behave much more like a Unix tool. However, because provconvert needs to know the format of its input and output (previously, it derived them from filename extensions), we’ve introduced three extra options: -informat, -outformat, and -bindformat. These take filename extensions or MIME types as their arguments.

Here we fetch a PROV-N document using the curl command-line tool, convert it to PROV-XML, and display the output:

% curl -s http://www.provbook.org/provapi/documents/bk.provn | provconvert -infile - -informat provn -outfile - -outformat xml

The XML output is:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<prov:document xmlns:prov="http://www.w3.org/ns/prov#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:hendler="http://www.cs.rpi.edu/~hendler/"
xmlns:bk="http://www.provbook.org/is/#"
xmlns:dct="http://purl.org/dc/terms/"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:images="http://www.provbook.org/imgs/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:provapi="http://www.provbook.org/provapi/documents/"
xmlns:provbook="http://www.provbook.org/">
    <prov:bundleContent prov:id="provbook:provenance">
        <prov:agent prov:id="provbook:Luc">
            <foaf:name xsi:type="xsd:string">Luc Moreau</foaf:name>
        </prov:agent>
        <prov:agent prov:id="provbook:Paul">
            <foaf:name xsi:type="xsd:string">Paul Groth</foaf:name>
        </prov:agent>
        <prov:entity prov:id="provbook:provenance">
... 

The format options also take precedence over the formats that provconvert derives from filename extensions, so we are now less restricted in how we name files.
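
The resolution rule just described can be sketched in Python as follows. This is an illustration, not provconvert's code: resolve_format, MIME_TO_EXT, and KNOWN_EXTS are hypothetical names, and the table covers only the extensions and MIME types mentioned in this post (the MIME types come from the -formats listing in section 2.3).

```python
import os
from typing import Optional

# Hypothetical subset of the formats provconvert knows about.
MIME_TO_EXT = {
    "text/provenance-notation": "provn",
    "application/trig": "trig",
    "text/vnd.graphviz": "dot",
}
KNOWN_EXTS = {"provn", "trig", "xml", "dot", "gv"}


def resolve_format(filename: str, explicit: Optional[str] = None) -> str:
    """Return the format (as an extension) to use for `filename`.

    An explicit -informat/-outformat value (extension or MIME type) wins;
    otherwise the format comes from the filename extension. '-' stands for
    standard input/output, so it has no extension to fall back on.
    """
    if explicit is not None:
        fmt = MIME_TO_EXT.get(explicit, explicit)
    elif filename == "-":
        raise ValueError("'-' (stdin/stdout) requires an explicit format option")
    else:
        fmt = os.path.splitext(filename)[1].lstrip(".")
    if fmt not in KNOWN_EXTS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return fmt
```

With this rule, a file named notes.txt can still be read as PROV-XML by passing xml explicitly.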

2.3 Supported formats

The -formats option of provconvert lists the supported formats, one per line. Each line gives a filename extension, its associated MIME type, and whether the entry applies to input or output.

% provconvert -formats 
gv      text/vnd.graphviz       output 
dot     text/vnd.graphviz       output 
trig    application/trig        input 
trig    application/trig        output 
provn   text/provenance-notation        input 
provn   text/provenance-notation        output 
...
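
Because the listing is whitespace-separated, it is easy to process in scripts. The Python sketch below is our own illustration (parse_formats and can_convert are hypothetical helpers): it turns the listing into a table and checks whether a conversion from one extension to another is supported.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Formats = Dict[str, Tuple[str, Set[str]]]


def parse_formats(listing: str) -> Formats:
    """Map each extension to (MIME type, set of directions: input/output)."""
    mimes: Dict[str, str] = {}
    directions: Dict[str, Set[str]] = defaultdict(set)
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 3:  # skip blank lines and elisions such as "..."
            continue
        ext, mime, direction = parts
        mimes[ext] = mime
        directions[ext].add(direction)
    return {ext: (mimes[ext], directions[ext]) for ext in mimes}


def can_convert(formats: Formats, src: str, dst: str) -> bool:
    """True if `src` is readable and `dst` is writable according to the table."""
    return (src in formats and "input" in formats[src][1]
            and dst in formats and "output" in formats[dst][1])
```

The listing itself could be captured, for instance, with Python's subprocess.run(["provconvert", "-formats"], capture_output=True, text=True).stdout, assuming provconvert is on the PATH.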

2.4 RPM for provconvert

We now offer an RPM (Red Hat Package Manager) package for the binary release. Using the rpm command, provconvert can now be installed with:

rpm -U https://repo1.maven.org/maven2/org/openprovenance/prov/toolbox/0.6.2/toolbox-0.6.2-rpm.rpm

2.5 Implementation of prov-template

prov-template is a specification, introduced by Dong, Danius and myself, of a templating system for PROV. Templates are PROV documents containing variables. Bindings associate variables with values. A template is expanded by replacing its variables with the values specified in the bindings.
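
In its simplest form, expansion is substitution. The Python sketch below illustrates that idea only; it is not ProvToolbox's implementation. We assume statements are tuples of terms, that variables are terms starting with "var:", and that each variable has a single value. Raising an error for an unbound variable is a choice of this sketch and may differ from what the specification prescribes.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a statement is a tuple of terms.
Statement = Tuple[str, ...]


def expand(template: List[Statement], bindings: Dict[str, str]) -> List[Statement]:
    """Replace each variable (a term starting with 'var:') by its bound value."""

    def subst(term: str) -> str:
        if term.startswith("var:"):
            if term not in bindings:
                raise KeyError(f"unbound variable {term}")
            return bindings[term]
        return term  # constants are copied unchanged

    return [tuple(subst(term) for term in stmt) for stmt in template]
```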

As prov-template is being used in several applications, we realised that parts of the specification had not been fully implemented, and that there were some bugs as well. The key changes include proper support for time in activities and in instantaneous events, and a correct implementation of “linked template variables”, which allows the template designer to control the Cartesian products generated when variables are bound to multiple values.
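
The effect of linking can be illustrated with a small Python sketch of our own (not ProvToolbox's code). Assumptions: bindings map each "var:" variable to a list of values, link groups are given directly as sets, and when linked variables have value lists of different lengths we stop at the shortest, which the specification may handle differently. Unlinked variables vary independently (a Cartesian product); linked variables take their i-th values together.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Statement = Tuple[str, ...]


def expand_statement(stmt: Statement,
                     bindings: Dict[str, List[str]],
                     links: Sequence[Set[str]] = ()) -> List[Statement]:
    """Expand one template statement whose variables may have several values."""
    variables = [t for t in dict.fromkeys(stmt) if t.startswith("var:")]
    # Group the statement's variables: a variable outside every link group
    # forms a group of its own.
    groups: List[List[str]] = []
    for v in variables:
        group = next((g for g in links if v in g), {v})
        members = [w for w in variables if w in group]
        if members not in groups:
            groups.append(members)
    # Within a group, values are taken in lockstep (same index for all members).
    choices = []
    for members in groups:
        n = min(len(bindings[m]) for m in members)
        choices.append([{m: bindings[m][i] for m in members} for i in range(n)])
    # Across groups, every combination is produced (Cartesian product).
    result = []
    for combo in product(*choices):
        assignment = {k: v for part in combo for k, v in part.items()}
        result.append(tuple(assignment.get(t, t) for t in stmt))
    return result
```

With two outputs and two inputs, an unlinked wasDerivedFrom template yields four statements, whereas linking var:out with var:in yields two, pairing each output with its corresponding input.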

I am hoping to publish a tutorial on prov-template in the near future.

2.6 Bug Fixes

A few notable bug fixes are listed below.

  • prov-dot: escaping of characters when converting to dot (and subsequently to svg, pdf, etc.) (issue 103)
  • prov-json: correct handling of bundle names and namespaces (issue 96)
  • prov-n: added newline at end of document (issue 112)
  • prov-template: transitive closure for linked variables (issue 113)
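
The prov-template fix concerns the transitive closure of linked variables: if var:a is linked to var:b, and var:b to var:c, then all three must vary together. A standard way to compute such groups is union-find. The Python sketch below is our illustration of that idea with hypothetical variable names, not necessarily how ProvToolbox computes the closure.

```python
from typing import Dict, Iterable, List, Set, Tuple


def link_groups(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group variables by the transitive closure of pairwise 'linked' declarations."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # union the two groups

    groups: Dict[str, Set[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())
```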

2.7 Tutorials

Two more tutorials have been produced. They are included in the release and I will publish blog posts about them shortly.

  • reading and converting PROV documents
  • merging documents

3. Where next?

ProvToolbox was designed to support interoperable conversion between PROV representations. In collaboration with the Software Sustainability Institute, we are developing a test harness to check the interoperability of various software packages developed in Southampton, including ProvToolbox, ProvStore, ProvPy, ProvTranslator, and ProvJS. If we identify interoperability issues, we will seek to address them in due course.

We have also identified a series of new requirements for prov-template, and possible ways of improving this templating system. We hope to produce a second iteration of this specification and to deliver its reference implementation in ProvToolbox.

For all details about ProvToolbox, see the github.io page http://lucmoreau.github.io/ProvToolbox/.

Thanks to Danius, Dong, and Heather for identifying issues, suggesting improvements, and implementing them.


3 thoughts on “What is in ProvToolbox 0.6.2?”

  1. Many thanks for this update. The new prov-template feature looks really interesting, and I think it could be useful for escaping reserved characters in serialized PROV documents. For example, I was thinking of escaping the dollar sign ($) in prov-json documents, in order to store these documents in MongoDB, where $ is a reserved character. I would like to know more about prov-template, in particular how to use it to customize the output of the interoperability framework.
