Provenance Recipe: Mapping Shortcuts to PROV

1. The context

In our book “Provenance: An Introduction to PROV”, Paul and I introduced provenance recipes, which describe commonly encountered provenance patterns. In this blog post, I introduce a new recipe, which is concerned with mapping “shortcuts” to PROV.

In a linked data/semantic web context, it is convenient to introduce new relations as “shortcuts” to relate entities to their context.

For instance, a cake may have been baked by the baker, but was wrapped by the shop assistant and delivered by the delivery person. We can express this description using three triples.

ex:cake ex:bakedBy ex:baker;
        ex:wrappedBy ex:assistant;
        ex:deliveredBy ex:deliveryPerson.

In the context of the Web Annotation Working Group, an annotation may be annotated by one agent and serialized by another. Here, two triples are required.

ex:anno1 oa:annotatedBy ex:agent1;
         oa:serializedBy ex:agent2.

2. A provenance view

For those who care about provenance, the cake that was baked by the baker was not wrapped yet; it was then wrapped by the assistant, leading to a wrapped cake, and then delivered to its destination, by the delivery person. So, the unwrapped cake, the wrapped cake, and the delivered cake are different entities. Using the PROV vocabulary, we can also say that the wrapped cake was derived from the unwrapped cake, and the delivered cake was itself derived from the wrapped cake.

We could write such a detailed description as follows.

ex:unwrappedCake ex:bakedBy ex:baker.
ex:wrappedCake prov:wasDerivedFrom ex:unwrappedCake.
ex:wrappedCake ex:wrappedBy ex:assistant.
ex:deliveredCake prov:wasDerivedFrom ex:wrappedCake.
ex:deliveredCake ex:deliveredBy ex:deliveryPerson.

We could even make the activities of baking, wrapping, and delivering explicit.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example/> .

ex:unwrappedCake a prov:Entity.
ex:wrappedCake a prov:Entity.
ex:deliveredCake a prov:Entity.

ex:baker a prov:Agent.
ex:assistant a prov:Agent.
ex:deliveryPerson a prov:Agent.

ex:baking a prov:Activity.
ex:unwrappedCake ex:bakedBy ex:baker;
                 prov:wasAttributedTo ex:baker.
ex:unwrappedCake prov:wasGeneratedBy ex:baking.
ex:baking prov:wasAssociatedWith ex:baker.

ex:wrapping a prov:Activity.
ex:wrappedCake prov:wasDerivedFrom ex:unwrappedCake.
ex:wrappedCake ex:wrappedBy ex:assistant;
               prov:wasAttributedTo ex:assistant.
ex:wrappedCake prov:wasGeneratedBy ex:wrapping.
ex:wrapping prov:wasAssociatedWith ex:assistant;
            prov:used ex:unwrappedCake.

ex:delivering a prov:Activity.
ex:deliveredCake prov:wasDerivedFrom ex:wrappedCake.
ex:deliveredCake ex:deliveredBy ex:deliveryPerson;
                 prov:wasAttributedTo ex:deliveryPerson.
ex:deliveredCake prov:wasGeneratedBy ex:delivering.
ex:delivering prov:wasAssociatedWith ex:deliveryPerson;
              prov:used ex:wrappedCake.

Graphically, it is represented as follows:

Baking, wrapping and delivering of a cake (illustrated using the PROV graphical notation)

Provenance experts could even say that the entities ex:unwrappedCake, ex:wrappedCake, and ex:deliveredCake are all specializations of a cake ex:cake in general, but I don’t discuss this aspect of the modelling any further here.

So we can see that the shortcuts ex:bakedBy, ex:wrappedBy, and ex:deliveredBy hide a lot of detail relevant to provenance. But not everybody is interested in such a level of detail.
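To make concrete how much a single shortcut hides, here is a minimal sketch in Python. It assumes triples are modelled as plain (subject, predicate, object) tuples of prefixed names; the function name expand_shortcut and the explicit activity argument are my own illustrative choices, not part of PROV or of any library.

```python
def expand_shortcut(entity, shortcut, agent, activity):
    """Return the explicit PROV triples hidden behind one shortcut triple.

    Hypothetical helper: triples are plain tuples, and the caller names the
    activity (e.g. ex:baking) that the shortcut leaves implicit.
    """
    return [
        (entity, "rdf:type", "prov:Entity"),
        (agent, "rdf:type", "prov:Agent"),
        (activity, "rdf:type", "prov:Activity"),
        (entity, shortcut, agent),                    # the original shortcut
        (entity, "prov:wasAttributedTo", agent),      # attribution
        (entity, "prov:wasGeneratedBy", activity),    # generation
        (activity, "prov:wasAssociatedWith", agent),  # association
    ]


baking = expand_shortcut("ex:unwrappedCake", "ex:bakedBy", "ex:baker", "ex:baking")
for s, p, o in baking:
    print(s, p, o, ".")
```

One shortcut triple becomes seven triples here, and that is before adding the prov:used and prov:wasDerivedFrom statements that link one step to the next.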

Going back to the example of an annotation and its serialization, we would expand it in full, as follows, making explicit all entities and activities, and associated relationships.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example/> .
@prefix oa: <http://www.w3.org/ns/oa#>.

ex:anno1 a prov:Entity.
ex:annoDocument1 a prov:Entity.


ex:agent1 a prov:Agent.
ex:agent2 a prov:Agent.

ex:annotating a prov:Activity.
ex:anno1 oa:annotatedBy ex:agent1;
         prov:wasAttributedTo ex:agent1.
ex:anno1 prov:wasGeneratedBy ex:annotating.
ex:annotating prov:wasAssociatedWith ex:agent1.

ex:serializing a prov:Activity.
ex:annoDocument1 prov:wasDerivedFrom ex:anno1.
ex:annoDocument1 oa:serializedBy ex:agent2;
                 prov:wasAttributedTo ex:agent2.
ex:annoDocument1 prov:wasGeneratedBy ex:serializing.
ex:serializing prov:wasAssociatedWith ex:agent2;
               prov:used ex:anno1.

Graphically, it is represented as follows.

Creating and serializing an annotation (illustrated using the PROV graphical notation)

Ontology and system designers like the shortcuts since they capture the intuition and are concise. However, compatibility with the provenance standard allows provenance-specific tools to inter-operate. The Web Annotation Working Group decided to adopt the shortcut notation, but to ensure compatibility with PROV, it also offers a mapping for the shortcuts to PROV.

3. The Problem

The Web Annotation Working Group has just introduced a notion of annotation (oa:Annotation) without making the distinction between the annotation and its serialized version. The shortcuts oa:annotatedBy and oa:serializedBy allow annotations to be attributed to their creator and serializer. This is the kind of provenance that the Web Annotation vocabulary supports. To reconcile it with PROV, a mapping to PROV concepts is proposed. The mapping is described in Appendix D. For convenience, I include Figure 32 below:

Prov Mapping (from Open Annotation model FPWD)

While the Web Annotation vocabulary only introduces a single notion of annotation, the mapping distinguishes this entity from its serialisation, noted anno1 and annoDocument1 respectively in the figure. These two entities are related by a prov:wasDerivedFrom edge. Since oa:serializedBy and oa:annotatedBy capture attribution in Web Annotation, the mapping defines them as subproperties of prov:wasAttributedTo. Thus, we obtain the following PROV description.

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example/> .
@prefix oa: <http://www.w3.org/ns/oa#>.

ex:anno1 a prov:Entity.
ex:annoDocument1 a prov:Entity.

ex:agent1 a prov:Agent.
ex:agent2 a prov:Agent.

ex:annotating a prov:Activity.
ex:anno1 oa:annotatedBy ex:agent1;
         prov:wasAttributedTo ex:agent1.
ex:anno1 prov:wasGeneratedBy ex:annotating.
ex:annotating prov:wasAssociatedWith ex:agent1.

ex:annoDocument1 prov:wasDerivedFrom ex:anno1.

ex:anno1 oa:serializedBy ex:agent2;
         prov:wasAttributedTo ex:agent2.

ex:serializing a prov:Activity.
ex:annoDocument1 prov:wasGeneratedBy ex:serializing.
ex:serializing prov:wasAssociatedWith ex:agent2;
               prov:used ex:anno1.

Graphically, it is represented as follows.

Original Mapping of OA to PROV (using the PROV graphical notation)

In this figure, we see a key difference between the proposed mapping and the earlier figure (Creating and serializing an annotation): it is ex:anno1 that is attributed to ex:agent2, and not ex:annoDocument1.

What is the consequence of this mapping to provenance? In short, it leads to an incorrect modelling, which we describe below. This description involves some inferences supported by PROV. In the next section, I look at a solution.

By Inference 13 (attribution-inference) of PROV-Constraints, if oa:serializedBy is a subproperty of prov:wasAttributedTo and ex:anno1 oa:serializedBy ex:agent2, then, for some activity ex:act2, the following holds:

ex:anno1 prov:wasGeneratedBy ex:act2.
ex:act2 prov:wasAssociatedWith ex:agent2.

Again, by Inference 13, if oa:annotatedBy is a subproperty of prov:wasAttributedTo and ex:anno1 oa:annotatedBy ex:agent1, then, for some activity ex:act1:

ex:anno1 prov:wasGeneratedBy ex:act1.
ex:act1 prov:wasAssociatedWith ex:agent1.

By Constraint 39 (generation-generation-ordering), if ex:anno1 prov:wasGeneratedBy ex:act1 (let’s call this generation gen1) and ex:anno1 prov:wasGeneratedBy ex:act2 (let’s call this generation gen2), then gen1 and gen2 must occur simultaneously.

But it was never the intent that ex:agent2 be associated with an activity that generates ex:anno1. In fact, ex:agent2 is associated with ex:serializing, which generated ex:annoDocument1.

This is not exactly a logical inconsistency, but it is an inconsistency with what was intended to be modelled.
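The chain of inferences above can be replayed mechanically. The following is a simplified sketch, not a PROV-Constraints implementation: it assumes triples are plain Python tuples, hard-codes the two subproperty declarations of the original mapping, and stands in for the existentially quantified activities with fresh identifiers (_:act1, _:act2) that are my own naming.

```python
# Subproperty declarations taken from the original mapping.
SUBPROPERTY_OF = {
    "oa:annotatedBy": "prov:wasAttributedTo",
    "oa:serializedBy": "prov:wasAttributedTo",
}

asserted = [
    ("ex:anno1", "oa:annotatedBy", "ex:agent1"),
    ("ex:anno1", "oa:serializedBy", "ex:agent2"),
]


def infer_generations(triples):
    """Apply a simplified attribution-inference to each attribution triple.

    Every attribution (direct or via a subproperty) yields a fresh activity
    that generated the entity and was associated with the agent.
    """
    inferred = []
    counter = 0
    for s, p, o in triples:
        if SUBPROPERTY_OF.get(p, p) == "prov:wasAttributedTo":
            counter += 1
            activity = f"_:act{counter}"  # placeholder for "some activity"
            inferred.append((s, "prov:wasGeneratedBy", activity))
            inferred.append((activity, "prov:wasAssociatedWith", o))
    return inferred


inferred = infer_generations(asserted)
generators = [o for s, p, o in inferred
              if s == "ex:anno1" and p == "prov:wasGeneratedBy"]
print(generators)  # ['_:act1', '_:act2']
```

Both inferred activities generate ex:anno1, including the one associated with ex:agent2, which is exactly the unintended reading.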

4. Open Annotation Mapping to PROV revised

In my review of the first public working draft, I outlined a solution that avoids this inconsistency.

First, in the mapping, we need to express the triple ex:annoDocument1 prov:wasAttributedTo ex:agent2 to make it explicit that it is the entity ex:annoDocument1 that is attributed to ex:agent2. The original mapping contained a triple ex:annoDocument1 prov:wasDerivedFrom ex:anno1. We refine this assertion by introducing a triple ex:anno1 oa:serializedInto ex:annoDocument1, which states that ex:anno1 was serialized into ex:annoDocument1. The new property oa:serializedInto is a subproperty of prov:hadDerivation, which is the inverse of prov:wasDerivedFrom. Both triples are shown in red in the following image.

Updated mapping to Provenance

We can see that

ex:anno1 oa:serializedBy ex:agent2

is a shortcut for the chain:

ex:anno1 oa:serializedInto ex:annoDocument1
ex:annoDocument1 prov:wasAttributedTo ex:agent2

Therefore, we could see oa:serializedBy as a superproperty of the property chain oa:serializedInto followed by prov:wasAttributedTo.

For OWL aficionados, we would write this as follows:

oa:serializedInto rdf:type owl:ObjectProperty ;
                  rdfs:domain oa:Annotation ;
                  rdfs:range oa:Annotation ;
                  rdfs:subPropertyOf prov:hadDerivation .

oa:serializedBy owl:propertyChainAxiom ( oa:serializedInto
                                         prov:wasAttributedTo) .

Likewise, oa:serializedAt is a superproperty of the property chain oa:serializedInto followed by prov:wasGeneratedAt.

oa:serializedAt owl:propertyChainAxiom ( oa:serializedInto
                                         prov:wasGeneratedAt) .
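For readers less familiar with property chains, here is a small sketch of what the two axioms entail. It is only an illustration in plain Python, assuming triples are tuples of prefixed names; apply_chain is a hypothetical helper of mine and not an OWL reasoner.

```python
def apply_chain(triples, first, second, shortcut):
    """Infer (x, shortcut, z) whenever (x, first, y) and (y, second, z) hold."""
    second_by_subject = {}
    for s, p, o in triples:
        if p == second:
            second_by_subject.setdefault(s, []).append(o)
    return {
        (s, shortcut, z)
        for s, p, o in triples
        if p == first
        for z in second_by_subject.get(o, [])
    }


graph = [
    ("ex:anno1", "oa:serializedInto", "ex:annoDocument1"),
    ("ex:annoDocument1", "prov:wasAttributedTo", "ex:agent2"),
    ("ex:annoDocument1", "prov:wasGeneratedAt", "2013-02-04T12:00:00Z"),
]

serialized_by = apply_chain(
    graph, "oa:serializedInto", "prov:wasAttributedTo", "oa:serializedBy")
serialized_at = apply_chain(
    graph, "oa:serializedInto", "prov:wasGeneratedAt", "oa:serializedAt")

print(serialized_by)  # {('ex:anno1', 'oa:serializedBy', 'ex:agent2')}
print(serialized_at)  # {('ex:anno1', 'oa:serializedAt', '2013-02-04T12:00:00Z')}
```

The shortcut triples are recovered from the detailed description, while the attribution itself stays on ex:annoDocument1.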

5. Mapping for Annotation shortcuts

So, the Web Annotation provenance shortcuts in the following example,

<ex:anno1> a oa:Annotation ;
    oa:annotatedBy <ex:agent1> ;
    oa:annotatedAt "2013-01-28T12:00:00Z" ;
    oa:serializedBy <ex:agent2> ;
    oa:serializedAt "2013-02-04T12:00:00Z" .

can be translated:

<ex:anno1> a oa:Annotation ;
    prov:wasAttributedTo <ex:agent1> ;
    prov:wasGeneratedAt "2013-01-28T12:00:00Z" ;
    oa:serializedInto [prov:wasAttributedTo <ex:agent2> ;
                       prov:wasGeneratedBy [ a prov:Activity;
                                             prov:wasAssociatedWith <ex:agent2>];
                       prov:wasGeneratedAt "2013-02-04T12:00:00Z"].

We introduced a blank node to represent the serialized annotation.
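The translation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: triples are plain tuples, the blank nodes are given generated labels (_:doc1, _:act1) of my own choosing, and the function name translate is hypothetical.

```python
from itertools import count


def translate(triples):
    """Rewrite Web Annotation shortcuts into the revised PROV mapping."""
    doc_ids, act_ids = count(1), count(1)
    documents = {}  # annotation -> node standing for its serialization

    def document_for(annotation):
        if annotation not in documents:
            documents[annotation] = f"_:doc{next(doc_ids)}"
        return documents[annotation]

    out = []
    for s, p, o in triples:
        if p == "oa:annotatedBy":
            out.append((s, "prov:wasAttributedTo", o))
        elif p == "oa:annotatedAt":
            out.append((s, "prov:wasGeneratedAt", o))
        elif p == "oa:serializedBy":
            doc, act = document_for(s), f"_:act{next(act_ids)}"
            out += [
                (s, "oa:serializedInto", doc),
                (doc, "prov:wasAttributedTo", o),
                (doc, "prov:wasGeneratedBy", act),
                (act, "rdf:type", "prov:Activity"),
                (act, "prov:wasAssociatedWith", o),
            ]
        elif p == "oa:serializedAt":
            doc = document_for(s)
            out += [(s, "oa:serializedInto", doc),
                    (doc, "prov:wasGeneratedAt", o)]
        else:
            out.append((s, p, o))  # everything else is copied unchanged
    return list(dict.fromkeys(out))  # drop duplicates, keep order


shortcuts = [
    ("ex:anno1", "rdf:type", "oa:Annotation"),
    ("ex:anno1", "oa:annotatedBy", "ex:agent1"),
    ("ex:anno1", "oa:annotatedAt", "2013-01-28T12:00:00Z"),
    ("ex:anno1", "oa:serializedBy", "ex:agent2"),
    ("ex:anno1", "oa:serializedAt", "2013-02-04T12:00:00Z"),
]
translated = translate(shortcuts)
for s, p, o in translated:
    print(s, p, o, ".")
```

The serializer and the serialization time both end up on the same generated node, mirroring the single blank node in the Turtle above.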

6. Mapping for Baking Example

We can apply the same mapping to the baking cake application.

The triple ex:cake ex:wrappedBy ex:assistant. maps to the following statement. It is not the cake itself that is attributed to the assistant but its wrapped version, which was derived from the unwrapped version. For this, we also introduce a property ex:wasWrappedInto, which is a subproperty of the inverse of prov:wasDerivedFrom.

ex:cake ex:wasWrappedInto [ prov:wasAttributedTo ex:assistant ] .

In this mapping, we use a blank node for the wrapped cake.

Now, combined with the triple ex:cake ex:deliveredBy ex:deliveryPerson. the mapping generates:

ex:cake ex:wasWrappedInto 
     [ prov:wasAttributedTo ex:assistant;
       ex:deliveredToDestination [ prov:wasAttributedTo ex:deliveryPerson ]] .

We see here that a second blank node is introduced to represent the delivered cake, which is attributed to the delivery person. Again, we rely on a domain-specific property ex:deliveredToDestination, which is expected to be a subproperty of the inverse of prov:wasDerivedFrom.

We can further decorate these with time information:

ex:cake ex:wasWrappedInto 
     [ prov:wasAttributedTo ex:assistant;
       prov:wasGeneratedAt "2015-03-23T12:00:00Z";
       ex:deliveredToDestination 
              [ prov:wasAttributedTo ex:deliveryPerson;
                prov:wasGeneratedAt "2015-03-24T12:00:00Z"]] .

We see that the cake was delivered a day after being wrapped.
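The interval between the two prov:wasGeneratedAt values above can be checked with a few lines of Python. This is a small sketch using only the standard library; parse_utc is a helper name of mine, and it assumes timestamps always carry the trailing Z as in the example.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse a timestamp such as 2015-03-23T12:00:00Z as a UTC datetime."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


wrapped_at = parse_utc("2015-03-23T12:00:00Z")    # wrapped cake generated
delivered_at = parse_utc("2015-03-24T12:00:00Z")  # delivered cake generated

delay = delivered_at - wrapped_at
print(delay.days)  # 1
```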

7. Conclusion

Overall, this mapping is quite elegant. It allows concise provenance shortcuts to be used, while aligning with the PROV standard. The implementation as a SPARQL CONSTRUCT statement should be straightforward.

Thanks to Paolo Ciccarese and Ivan Herman for their feedback.
