What is in ProvToolbox 0.7.0?

1. Introduction

Today, I have released ProvToolbox 0.7.0. It has again been a consolidation phase, seeking to ensure better compliance, better inter-operability, better robustness, and better internal organization.

2. Novel Features

2.1 Error Codes

As provconvert is increasingly used as part of more complex workflows, it became critical to return error codes that indicate when there is a problem. Here is an illustration of how this can be used. The first invocation of provconvert does not provide an argument to the option -infile, so the error code is non-zero. The second invocation runs without any problem, so the error code is zero.

> provconvert -infile
14:38:27,301 FATAL CommandLineArguments:297 - Parsing failed.  Reason: no argument for:infile
> echo $?
1

> provconvert -help
...
> echo $?
0

Error codes are defined in the interface org.openprovenance.prov.interop.ErrorCodes.
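
When provconvert is driven from a Java workflow rather than from a shell, the same status can be read from the child process. The following is only a minimal sketch: the class ExitStatusCheck and its run method are hypothetical names, not part of ProvToolbox, and the example assumes that provconvert is installed and on the PATH. It relies only on the zero versus non-zero distinction illustrated above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ProvToolbox.
class ExitStatusCheck {

    // Runs a command, lets it write to our console, and returns its exit status.
    static int run(List<String> command) {
        try {
            Process process = new ProcessBuilder(command)
                    .inheritIO() // forward the tool's log output to our console
                    .start();
            return process.waitFor();
        } catch (IOException e) {
            // Typically means the executable could not be found or started.
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + command, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: provconvert is installed and on the PATH.
        int status = run(Arrays.asList("provconvert", "-help"));
        if (status == 0) {
            System.out.println("provconvert succeeded");
        } else {
            System.out.println("provconvert failed with error code " + status);
        }
    }
}
```

A workflow could compare the returned value against the constants of org.openprovenance.prov.interop.ErrorCodes instead of testing only for zero; the sketch avoids doing so because it does not assume any particular constant value.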

2.2 Document Comparison

To support the inter-operability harness, we needed a program capable of deciding whether two documents were serializations of the same PROV document. Such functionality already existed in ProvToolbox and was extensively used in our testing. It has now been exposed in provconvert.

An illustration of this functionality is as follows.

provconvert -infile file1.provn -compare file2.ttl -outcompare diff.txt

An error code (STATUS_COMPARE_DIFFERENT) is returned when the two files contain serializations of different PROV Documents.

2.3 Better Logging of Various Warnings

Log4j is the logging infrastructure used by ProvToolbox. We have refactored the code to ensure that some warnings, such as those generated by the PROV-N parser, are logged with Log4j. The intent is that all messages get logged through this infrastructure.

This naturally raises the question of what we should do when the PROV-N parser finds PROV-N statements that are not compatible with the grammar, recovers, and continues parsing; or when a PROV Qualified Name is constructed with an incorrect syntax.

On the one hand, a permissive approach is good because it allows ProvToolbox to deal with a wide variety of inputs; on the other hand, there may be cases when we want to be strict. Is a strict mode a desirable feature? Your input would be welcome: let me know what your use cases are, and we will try to support them.

2.4 Syntax of Qualified Names in PROV-N

PROV-N has a production typedLiteral to encode all typed literals, consisting of a STRING_LITERAL for the external representation of the literal, and a datatype, expressed as a Qualified Name, for its type.

typedLiteral ::= STRING_LITERAL "%%" datatype

For instance, "1" %% xsd:integer represents the integer value 1. (In this case, PROV-N also supports the simpler convenience notation 1.)

A Qualified Name is expressed as follows (cf. Example 38).

  "ex:value" %% prov:QUALIFIED_NAME

A convenience notation is also permitted, in the form of 'ex:value'.

Before 0.7.0, ProvToolbox supported only the convenience notation for Qualified Names. It now supports both forms, in compliance with the specification.
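
To make the two-part shape of typedLiteral concrete, here is a small sketch that splits the long form into its string literal and its datatype. It is a deliberate simplification and not the PROV-N parser of ProvToolbox: the class TypedLiteralSketch is a hypothetical name, only double-quoted strings are handled, and the datatype is taken to be any run of non-blank characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not ProvToolbox's PROV-N parser.
class TypedLiteralSketch {

    // Simplified shape: a double-quoted string, "%%", then a datatype name.
    // Group 1 is the text between the quotes, group 2 the datatype.
    private static final Pattern TYPED_LITERAL =
            Pattern.compile("^\\s*\"((?:[^\"\\\\]|\\\\.)*)\"\\s*%%\\s*(\\S+)\\s*$");

    // Returns {string literal content, datatype}, or null if the input does not match.
    static String[] split(String input) {
        Matcher m = TYPED_LITERAL.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("\"ex:value\" %% prov:QUALIFIED_NAME");
        System.out.println(parts[0] + " has datatype " + parts[1]);
    }
}
```

The convenience notation 'ex:value' is intentionally not matched by this pattern; a reader supporting both forms would need a second branch for it.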

2.5 Default xsd Namespace

PROV-DM, PROV-N, PROV-O all use http://www.w3.org/2000/10/XMLSchema# as the XML Schema Namespace URI. Since 0.7.0, this namespace URI has also become the default for XML Schema in ProvToolbox. (See NamespacePrefixMapper.html#XSD_NS.)

In previous versions, as ProvToolbox had a strong JAXB heritage, the default namespace URI for XML Schema was the “XML version” http://www.w3.org/2000/10/XMLSchema (note the lack of hash at the end). We moved away from this namespace URI since we want the programmer to manipulate the namespace URI used in the Recommendation. The “XML version” is only required when marshalling to or unmarshalling from XML.

2.6 Syntax of QualifiedName

PROV-DM defines a PROV Identifier as a Qualified Name, which is a name subject to namespace interpretation. It consists of a namespace, denoted by an optional prefix, and a local name.

PROV-N provides a concrete syntax for prov:QUALIFIED_NAME, further noting that a PROV-N qualified name QUALIFIED_NAME can be mapped to a valid IRI [RFC3987] by concatenating the namespace denoted by its prefix to the local name, whose \-escaped characters have been unescaped by dropping the character ‘\’ (backslash).
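
The mapping to an IRI described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used in ProvToolbox: the class QualifiedNameToIri and its methods are hypothetical names, the namespace http://example.org/ is made up, and the sketch drops the backslash in front of any character without checking that the character is one PROV-N allows to be escaped.

```java
// Hypothetical illustration, not ProvToolbox's implementation.
class QualifiedNameToIri {

    // Drops each backslash that escapes a following character: "a\=b" becomes "a=b".
    static String unescapeLocal(String escapedLocal) {
        return escapedLocal.replaceAll("\\\\(.)", "$1");
    }

    // Concatenates the namespace IRI with the unescaped local name.
    static String toIri(String namespace, String escapedLocal) {
        return namespace + unescapeLocal(escapedLocal);
    }

    public static void main(String[] args) {
        // http://example.org/ is a made-up namespace for the example.
        System.out.println(toIri("http://example.org/", "a\\=b"));
    }
}
```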

Before 0.7.0, ProvToolbox did not fully implement the syntax of prov:QUALIFIED_NAME: it ignored escape characters and how they should be handled when forming a URI.

A consequence of this was that some URIs read from a ttl representation were not represented properly as Qualified Names in the toolbox, and were not converted back to their original form when exporting back to ttl.

All this is now addressed in ProvToolbox 0.7.0 with the possibility of forcing syntactic checks when creating Qualified Names. Full details are available from https://github.com/lucmoreau/ProvToolbox/wiki/Syntax-of-prov:QUALIFIED_NAME. A consequence of releasing ProvToolbox 0.7.0 with support for this syntax is that PROV-N documents previously generated may not be readable if they do not already use this encoding.

2.7 Syntax of QName

PROV-XML mandates xsd:QName as the XSD datatype to be used for qualified names. However, the xsd:QName datatype is more restrictive than the QualifiedName defined in PROV-N, e.g. PROV-N allows local names to start with numbers, whereas xsd:QName does not. PROV-XML does not specify how to convert an arbitrary PROV Qualified Name into xsd:QName.

ProvToolbox now offers such a conversion function and also a method to check whether an xsd:QName is syntactically correct. Details of the encoding have been documented at https://github.com/lucmoreau/ProvToolbox/wiki/Mapping-PROV-Qualified-Names-to-xsd:QName.
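
To illustrate why a conversion is needed at all, the sketch below tests whether a local name could be used unchanged as the local part of an xsd:QName. It is neither the check nor the encoding implemented in ProvToolbox (see the wiki page for those): the class LocalNameCheck is a hypothetical name, and the test is a conservative ASCII-only approximation of an XML NCName, so it rejects non-ASCII names that XML would accept.

```java
// Hypothetical illustration, not ProvToolbox's check or encoding.
class LocalNameCheck {

    // Conservative ASCII-only approximation of an XML NCName:
    // a letter or underscore, then letters, digits, '.', '-' or '_'.
    // Real NCNames also admit many non-ASCII characters, which this rejects.
    static boolean looksLikeNcName(String local) {
        return local.matches("[A-Za-z_][A-Za-z0-9._-]*");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeNcName("e1")); // starts with a letter
        System.out.println(looksLikeNcName("1e")); // starts with a digit
    }
}
```

A local name such as 1e is acceptable in PROV-N but fails this test, which is the kind of name that has to be encoded before being written to PROV-XML.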

Before ProvToolbox 0.7.0, the module converting to PROV-XML simply ignored the required syntax of xsd:QName and generated xsd:QNames that were not syntactically valid. This was not acceptable.

A consequence of releasing ProvToolbox 0.7.0 with support for this encoding is that PROV-XML documents previously generated may not be readable if they do not already use this encoding.

2.9 Internationalization Testing

Some testing of Unicode characters was introduced to ensure that multiple languages are supported not only in string representations, but also in qualified names.

2.10 Various Bug Fixes

  • prov:InternationalizedString issue #133
  • incorrect prefix declaration in export issue #132
  • parsing relative uris with input stream in ttl issue #122
  • prov-n qualified names written as "ex:foo" %% prov:QUALIFIED_NAME issue #109
  • warning for prov:label non-string value issue #104
  • escaping of characters in Qualified Names and QNames issue #120
  • visualisation of prov:value issue #71
  • conversion to dot issue #67

3. Conclusion

I am keen to know who is using ProvToolbox and/or provconvert, and for which purpose. Share details of your projects with me, and I will add them to https://github.com/lucmoreau/ProvToolbox/wiki/Projects-and-Applications-Using-ProvToolbox.

ProvToolbox will now be integrated in the inter-operability harness developed in collaboration with the Software Sustainability Institute. This test harness will allow us to check inter-operability of various software packages developed in Southampton, including ProvToolbox, ProvStore, ProvPy, ProvTranslator, and ProvJS. If we identify inter-operability issues, we will seek to address them in due course.

For all details about ProvToolbox, see the github.io page http://lucmoreau.github.io/ProvToolbox/.

Thanks to Danius, Dong, and Heather for identifying issues or suggesting improvements and implementing them.
