Today, I have released ProvToolbox 0.6.2, some 11 months after the previous release. This has been a consolidation phase. ProvToolbox is used in various projects and applications, which have exercised its functionality, identified bugs, and raised requirements for new functionality to make it more useful. Concretely, ProvToolbox with its templating system is used in Picaso (contributors: Dong Huynh, Danius Michaelides), SmartSociety’s SmartShare application (contributor: Heather Packer), and eBook’s blockly-based workflow systems (contributor: Danius Michaelides). ProvToolbox is also used in ProvStore to support the conversion of PROV to various formats.
2. Novel Features
2.1 Document Merge and Flattening
It has become a critical requirement of several of our applications to merge PROV documents. If you think of the RDF representation of PROV, merging is a kind of concatenation of all triples. For other representations such as PROV-N, PROV-XML, and PROV-JSON, which are more statement-oriented, merging documents fuses statements about the same resource: for instance, for an entity, all attributes are regrouped in a single statement. When documents contain bundles, these are also merged if they have the same identifier.
Furthermore, we have the option of stripping bundles from documents, as if we were pouring their contents into the document they occur in.
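To make the idea concrete, here is a minimal Python sketch of merging and flattening on a toy data model. It is not ProvToolbox code: the dictionary layout, the `(kind, id, attributes)` tuples, and the function names `merge_statements`, `merge_documents` and `flatten` are all assumptions made for illustration.

```python
def merge_statements(*statement_lists):
    """Fuse statements sharing (kind, id), regrouping their attributes."""
    merged = {}
    for statements in statement_lists:
        for kind, ident, attrs in statements:
            slot = merged.setdefault((kind, ident), {})
            for name, values in attrs.items():
                bucket = slot.setdefault(name, [])
                for value in values:
                    # Keep each distinct value once, in first-seen order.
                    if value not in bucket:
                        bucket.append(value)
    return [(kind, ident, attrs) for (kind, ident), attrs in merged.items()]


def merge_documents(doc_a, doc_b):
    """Merge top-level statements, and bundles that share an identifier."""
    bundles = {}
    for doc in (doc_a, doc_b):
        for bundle_id, statements in doc.get("bundles", {}).items():
            bundles[bundle_id] = merge_statements(
                bundles.get(bundle_id, []), statements
            )
    return {
        "statements": merge_statements(
            doc_a.get("statements", []), doc_b.get("statements", [])
        ),
        "bundles": bundles,
    }


def flatten(doc):
    """Strip bundles, pouring their contents into the enclosing document."""
    pooled = [doc.get("statements", [])] + list(doc.get("bundles", {}).values())
    return {"statements": merge_statements(*pooled), "bundles": {}}


# Hypothetical toy documents (identifiers are made up for the example).
doc1 = {
    "statements": [("entity", "ex:e1", {"prov:type": ["ex:Report"]})],
    "bundles": {"ex:b": [("agent", "ex:ag", {"foaf:name": ["Alice"]})]},
}
doc2 = {
    "statements": [("entity", "ex:e1", {"ex:version": ["2"]})],
    "bundles": {
        "ex:b": [
            ("agent", "ex:ag", {"foaf:name": ["Alice"]}),
            ("entity", "ex:e2", {}),
        ]
    },
}

merged = merge_documents(doc1, doc2)
flat = flatten(merged)
```

In this toy run, the two statements about `ex:e1` end up as a single entity statement carrying both attributes, the two `ex:b` bundles become one, and `flatten` leaves a document with no bundles.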
2.2 Standard inputs and outputs for provconvert
With provconvert, we can now use ‘-’ as a filename to indicate that the input or output will go through standard input or standard output. This allows provconvert to act much more like a Unix tool. However, because provconvert needs to know the format of its input or output (it would previously derive this from the filename extensions), we’ve introduced three extra options: -informat, -outformat and -bindformat. These take filename extensions or mime-types as their arguments.
Here we grab a provn document using the curl command line tool, convert it to xml, and show the output:
% curl -s http://www.provbook.org/provapi/documents/bk.provn | provconvert -infile - -informat provn -outfile - -outformat xml
The output as xml is:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<prov:document xmlns:prov="http://www.w3.org/ns/prov#"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:hendler="http://www.cs.rpi.edu/~hendler/"
               xmlns:bk="http://www.provbook.org/is/#"
               xmlns:dct="http://purl.org/dc/terms/"
               xmlns:foaf="http://xmlns.com/foaf/0.1/"
               xmlns:images="http://www.provbook.org/imgs/"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:provapi="http://www.provbook.org/provapi/documents/"
               xmlns:provbook="http://www.provbook.org/">
  <prov:bundleContent prov:id="provbook:provenance">
    <prov:agent prov:id="provbook:Luc">
      <foaf:name xsi:type="xsd:string">Luc Moreau</foaf:name>
    </prov:agent>
    <prov:agent prov:id="provbook:Paul">
      <foaf:name xsi:type="xsd:string">Paul Groth</foaf:name>
    </prov:agent>
    <prov:entity prov:id="provbook:provenance">
    ...
The format options also override provconvert using filename extensions to derive formats, so we are now less restricted when we name files.
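The precedence described above can be sketched in a few lines of Python. This is only an illustration of the rule (explicit format first, filename extension as fallback), not provconvert's actual logic; the function name `resolve_format` is hypothetical, and raising an error when ‘-’ is given without a format is an assumption about what a reasonable tool would do.

```python
import os


def resolve_format(filename, explicit_format=None):
    """Pick a format: an explicit option wins over the filename extension."""
    if explicit_format:
        # An explicit -informat/-outformat style value overrides the extension.
        return explicit_format.lower()
    if filename == "-":
        # Standard input/output has no extension to inspect (assumed behaviour).
        raise ValueError("a format must be given explicitly when using '-'")
    extension = os.path.splitext(filename)[1].lstrip(".")
    if not extension:
        raise ValueError(f"cannot derive a format from {filename!r}")
    return extension.lower()


from_extension = resolve_format("bk.provn")          # derived from ".provn"
from_stdin = resolve_format("-", "provn")            # '-' needs an explicit format
overridden = resolve_format("notes.txt", "provn")    # option beats the extension
```

The last call shows why file naming becomes less restricted: a file with an unrelated extension can still be read as PROV-N once the format is stated explicitly.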
2.3 Supported formats
The -formats option of provconvert lists the supported formats, one per line, with each line giving a filename extension, its associated mime-type, and whether the entry is for input or for output.
% provconvert -formats
gv text/vnd.graphviz output
dot text/vnd.graphviz output
trig application/trig input
trig application/trig output
provn text/provenance-notation input
provn text/provenance-notation output
...
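Because the listing is line-oriented, it is easy to consume from a script. The following Python sketch parses such a listing into a lookup table. It assumes the three-column layout described above (extension, mime-type, direction); the function name `parse_formats` and the `sample` text, which just repeats the lines quoted in this post, are introduced for illustration.

```python
def parse_formats(listing):
    """Map each extension to its mime-type and supported directions."""
    table = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip anything that is not "<extension> <mime-type> <input|output>",
        # such as blank lines or an elided "...".
        if len(fields) != 3 or fields[2] not in ("input", "output"):
            continue
        extension, mime, direction = fields
        entry = table.setdefault(extension, {"mime": mime, "directions": set()})
        entry["directions"].add(direction)
    return table


sample = """\
gv text/vnd.graphviz output
dot text/vnd.graphviz output
trig application/trig input
trig application/trig output
provn text/provenance-notation input
provn text/provenance-notation output
...
"""

formats = parse_formats(sample)
readable = sorted(ext for ext, e in formats.items() if "input" in e["directions"])
```

With the sample above, `readable` contains only the extensions that provconvert can take as input, which is a quick way to check whether a conversion is possible before running it.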
2.4 RPM for provconvert
We now offer an RPM (Red Hat Package Manager) package for the binary release. Using the rpm command, one can now install provconvert with:
rpm -U https://repo1.maven.org/maven2/org/openprovenance/prov/toolbox/0.6.2/toolbox-0.6.2-rpm.rpm
2.5 Implementation of prov-template
prov-template is a specification introduced by Dong, Danius and myself that specifies a templating system for PROV. It allows for templates to be defined as PROV documents containing variables. Bindings consist of associations between variables and values. Templates can be expanded by replacing variables by their values specified in binding.
As prov-template is being used in several applications, we realised that parts of the specification had not been fully implemented, and that there were some bugs as well. The key changes include proper support for time in activities and in instantaneous events, and a correct implementation of “linked template variables”, which allow the template designer to control the Cartesian products that arise when variables are bound to multiple values.
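The effect of linking variables can be illustrated with a toy expander in Python. This is a sketch under strong assumptions and not the prov-template algorithm: a template is reduced to a single tuple of terms, variables are plain strings such as `var:doc`, and linked groups are passed in explicitly through a hypothetical `linked` argument (how the specification itself declares links is not shown here). Linked variables are assumed to step through their values together, while unlinked ones combine as a Cartesian product.

```python
from itertools import product


def expand(template, bindings, linked=()):
    """Expand one template statement for every allowed variable assignment."""
    # Each linked group iterates in lockstep; every other variable
    # forms a group of its own.
    groups = [list(group) for group in linked]
    grouped = {var for group in groups for var in group}
    groups += [[var] for var in bindings if var not in grouped]

    # For each group, list the partial assignments it can produce.
    per_group = []
    for group in groups:
        columns = [bindings[var] for var in group]
        if len({len(column) for column in columns}) != 1:
            raise ValueError(f"linked variables {group} need equally many values")
        per_group.append([dict(zip(group, row)) for row in zip(*columns)])

    # Independent groups are combined as a Cartesian product.
    results = []
    for combination in product(*per_group):
        env = {}
        for part in combination:
            env.update(part)
        # Replace variables by their values; other terms are kept as-is.
        results.append(tuple(env.get(term, term) for term in template))
    return results


# Hypothetical template and bindings.
template = ("wasAttributedTo", "var:doc", "var:author")
bindings = {
    "var:doc": ["ex:d1", "ex:d2"],
    "var:author": ["ex:alice", "ex:bob"],
}

unlinked = expand(template, bindings)
linked = expand(template, bindings, linked=[("var:doc", "var:author")])
```

Without linking, the two variables with two values each yield four statements; with the two variables linked, only the pairs (`ex:d1`, `ex:alice`) and (`ex:d2`, `ex:bob`) are produced.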
I am hoping to publish a tutorial on prov-template in the near future.
2.6 Bug Fixes
A few notable bug fixes are listed below.
- prov-dot: character escaping in conversion to dot (and subsequently svg, pdf, etc.) (issue 103)
- prov-json: correct handling of bundle names and namespaces (issue 96)
- prov-n: added newline at end of document (issue 112)
- prov-template: transitive closure for linked variables (issue 113)
Two more tutorials have been produced. They are included in the release and I will publish blog posts about them shortly.
- reading and converting PROV documents
- merging documents
3. Where next?
ProvToolbox was designed to support interoperable conversion of PROV representations. In collaboration with the Software Sustainability Institute, we are developing a test harness that allows us to check the interoperability of various software packages developed in Southampton, including ProvToolbox, ProvStore, ProvPy, ProvTranslator, and ProvJS. If we identify interoperability issues, we will seek to address them in due course.
We have also identified a series of new requirements for prov-template, and possible ways of improving this templating system. We hope to produce the second iteration of this specification and deliver its reference implementation in ProvToolbox.
For all details about ProvToolbox, see the github.io page http://lucmoreau.github.io/ProvToolbox/.
Thanks to Danius, Dong, and Heather for identifying issues or suggesting improvements and implementing them.