Data privacy and accountability

My statement for today’s panel on privacy: I want to talk about data privacy in the context of the notion of accountability.

Imagine you browse the web, looking for shoes. For weeks afterwards, whenever you visit a web page, adverts for shoes are presented to you.

Have you ever asked yourself why these adverts are shown to you, who has information about you, what information they have about you, and how they decided to serve this advert to you?

A system able to answer such why/who/what/how questions is accountable. Being accountable means being able to provide explanations or justifications for decisions and actions.

To provide accountability, we need to be able to trace flows of data: tracing data across systems enables explanations to be provided about the transformations, operations, and decisions made about such data. Two names are in use for this notion: traceability and provenance. The provenance of a decision helps explain the factors that affected the decision, the data involved in it, etc. The word is common for food: provenance of food is a sign of its quality; likewise, provenance of a piece of art enables its authenticity to be asserted. Over the last 15 years, I have been leading research activities around provenance of data, and I led a standardisation activity for provenance on the web.

The European General Data Protection Regulation (GDPR), coming in 2018, has a component dubbed the “right to explanation”. There is still some uncertainty about what it entails, both legally and technically.

What has it got to do with privacy? Privacy and accountability have an interesting relation that I want to discuss.

Consider expense claims, a topic well understood by this audience. Imagine that Alice and Bob have a business meeting conducted over a meal. Bob has to make his expense claim public.  This may indirectly make the presence of Alice at the restaurant’s location public. Alice’s privacy is in tension with Bob’s accountability/transparency requirement.

So, there is a tension between privacy and accountability: 100% private doesn’t give you accountability; 100% accountable doesn’t give you privacy.

Privacy is important, and so is accountability! These are values that we want to promote. Technically, legally and as a society, we are still learning to understand these values and how they should be protected.






Principles for Algorithmic Transparency and Accountability: A Provenance Perspective

A few days ago, the ACM U.S. Public Policy Council (USACM) released a statement and a list of seven principles aimed at addressing potential harmful bias of algorithmic solutions. This effort was initiated by the USACM’s Algorithmic Accountability Working Group. Algorithmic solutions are now widely deployed to make decisions that affect our lives, e.g., recommendations for movies, targeted ads on the web, autonomous vehicles, suggested contacts or reading in social networks, etc. We have all come across systems making decisions that are targeted to us individually, and I am sure that many of us have wondered how a given recommendation was made to us, on the basis of which information and what kind of profile. Typically, no explanation is made available to us! Nor is there any means to track the origin of such decisions!

Interestingly, emerging regulatory frameworks, such as the EU General Data Protection Regulation, are introducing a “right to explanation” (see in particular Article 22 on automated individual decision-making, including profiling). So, the regulatory framework is evolving, even though there is still no consensus on how to actually achieve this in practice.

Furthermore, algorithmic bias is a phenomenon that has been observed in various contexts (see for instance two recent articles in the New York Times and the Guardian). Given the pervasive nature of algorithms, the ACM U.S. Public Policy Council acknowledges that it is imperative to address “challenges associated with the design and technical aspects of algorithms and preventing bias from the onset”. On this basis, they propose seven principles, compatible with their code of ethics.

As a provenance researcher, I have always regarded the need to log flows of information and activities, and to ascribe responsibility for these, as crucial steps to making systems accountable. This view was echoed by Danny Weitzner and team in their seminal paper on Information Accountability. I was therefore delighted to see that “Data provenance” was listed as an explicit principle of the USACM list of seven principles. So, instead of paraphrasing them, I take the liberty of copying them below.


Figure 1: ACM US Public Policy Council list of seven principles for Algorithmic Transparency and Accountability



However, I feel that provenance, as I understand it, encompasses several of these principles, something that I propose to investigate in the rest of this post. To illustrate this, I propose Figure 2, a block diagram outlining the high-level architecture of a transparent and accountable system. At the heart of such a system, we find its Business Logic, which provides its primary functionality (e.g. Recommendations, Analytics, etc). In provenance-aware systems, applications log their activities and data flows, out of which a semantic representation is constructed, which I refer to as provenance. PROV is a standardised representation for provenance, which was recently published by the World Wide Web Consortium and is seeing strong adoption in various walks of life. In this context, provenance is defined as “a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing”.

There is no point in constructing such a semantic representation if it is not exploited. Various capabilities can be built on top of such a provenance repository, including query interfaces, audit functionality, explanation services, redress mechanisms and validation, which we discuss now in light of the seven principles.



Figure 2: The Role of Provenance in the Architecture of an Accountable System


The first principle (Awareness) identifies a variety of stakeholders (Owners, Designers, Builders and Users); the second principle also mentions the role of Regulators, and we believe that potential third-party Auditors are also relevant in that context. While technology makes progress with algorithmic solutions, society is much slower to react, and there is indeed work required to increase awareness, and to establish what the user rights are and what the obligations on owners should be, whether by means of regulation or self-regulation. The SmartSociety project recently published a Social Charter for Smart Platforms, which is an illustration of what rights and obligations can be in “smart” platforms.

The second principle (Access and Redress) recommends mechanisms by which systems can be questioned and redress enabled for individuals. This principle points to the ability to query the system and its past actions, which is a typical provenance-based functionality. For those seeking redress, there is a need to be able to refer to an event that resulted in an unsatisfactory outcome; PROV-based provenance mandates that all outcomes, data and activity instances are uniquely identified. Furthermore, we are of the view that such a redress mechanism, including reached resolutions, should be inspectable in a similar fashion; thus, provenance of redress requests and resolutions should also be inspectable.
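As a minimal sketch of the kind of query this principle points to, the Python below stores a handful of provenance statements keyed by unique identifiers and walks back from a contested outcome to everything that influenced it. The identifiers (rec:42, profile:alice, and so on), the dictionary layout and the trace_back helper are hypothetical illustrations; only the relation names "used" and "wasGeneratedBy" are borrowed from PROV.

```python
from collections import deque

# Hypothetical provenance statements; every entity and activity has a unique identifier.
# was_generated_by: entity -> activity that produced it (PROV wasGeneratedBy)
# used: activity -> entities it consumed (PROV used)
was_generated_by = {
    "rec:42": "act:recommend-7",
    "profile:alice": "act:profile-3",
}
used = {
    "act:recommend-7": ["profile:alice", "model:v2"],
    "act:profile-3": ["clicks:2017-01"],
}


def trace_back(entity_id):
    """Return every activity and entity that influenced entity_id (breadth-first)."""
    seen, order = set(), []
    queue = deque([entity_id])
    while queue:
        entity = queue.popleft()
        activity = was_generated_by.get(entity)
        if activity is None or activity in seen:
            continue  # source data with no recorded origin, or already visited
        seen.add(activity)
        order.append(activity)
        for source in used.get(activity, []):
            if source not in seen:
                seen.add(source)
                order.append(source)
                queue.append(source)
    return order


# A user contesting recommendation rec:42 can refer to it by its identifier.
lineage = trace_back("rec:42")
print(lineage)
```

Because the outcome is referred to by its identifier, a redress request can point at exactly one recommendation, and the same walk could be applied to the records of the redress process itself.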

The third principle (Accountability) is concerned with holding institutions responsible for the decisions made by their algorithmic systems. For this, one needs a non-repudiable account of what has happened, and suitable attribution of decisions to system components, their owners, and those legally responsible for the system’s actions. Again, such an account is exactly what PROV offers; we therefore see the third principle being implemented technically with queries over the provenance representation, and socially with suitable regulatory and enforcement mechanisms.

The fourth principle (Explanation) requires explanations to be produced about the unfolding of activities and decisions.  There is emerging evidence that provenance can serve as a form of computer-based narrative, out of which textual explanations can be composed and presented to users.  We recently conducted some user studies about the perceived legibility of natural language explanations by casual users.  We also used a similar technique in order to provide explanations about user ratings in a Ride Share application.
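To give a flavour of how a narrative can be composed from provenance, here is a minimal template-based sketch in Python. It is not the technique used in the studies mentioned above; the statements, the wording of the templates and the explain function are assumptions made for illustration, with only the relation names taken from PROV.

```python
# Hypothetical provenance statements as (relation, subject, object) triples.
statements = [
    ("wasGeneratedBy", "the rating of 4 stars", "the rating computation"),
    ("used", "the rating computation", "three passenger reviews"),
    ("wasAssociatedWith", "the rating computation", "the ride share platform"),
]

# One sentence template per relation; the wording is an illustrative assumption.
templates = {
    "wasGeneratedBy": "{s} was produced by {o}.",
    "used": "{s} relied on {o}.",
    "wasAssociatedWith": "{s} was carried out by {o}.",
}


def explain(stmts):
    """Turn provenance statements into a short textual narrative."""
    sentences = []
    for relation, subject, obj in stmts:
        sentence = templates[relation].format(s=subject, o=obj)
        # Capitalise the first letter only, leaving the rest of the sentence untouched.
        sentences.append(sentence[0].upper() + sentence[1:])
    return " ".join(sentences)


explanation = explain(statements)
print(explanation)
```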

The fifth principle (Data Provenance) focuses explicitly on training data used to train so-called “machine-learning” algorithms. We believe that it is not just training data that is relevant, but any external data that the business logic and designers may rely upon. It is expected that public scrutiny of such data offers an opportunity to correct potential bias and, in general, any concern that may affect decisions. To operationalize this principle, one needs to have access to a description of the data (potentially, the data itself), but also to how it is used in training algorithms, and how this potentially affects decisions. PROV-based provenance, queries and explanations are required here to allow such scrutiny. Some of our recent work focused on analytics techniques to assess the quality of data, using provenance information; such a mechanism becomes useful to ensure some form of quality control in systems.

The sixth principle (Auditability) demands that models, algorithms, data, and decisions be recorded, so that they can be audited. All these can easily be described in PROV, by means of “PROV entities”, which can be used or generated by “PROV activities”, under the supervision of responsible agents. Specific auditing functions (aimed at various stakeholders) can query the provenance to expose individual entities, but also their aggregate characteristics, over longer periods of time. Techniques that we have developed, such as provenance summarisation, become really critical in this context, since they enable us to investigate the aggregate behaviour of applications, instead of individual circumstances.
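The difference between inspecting individual entities and their aggregate characteristics can be sketched as follows. This is plain counting over a hypothetical log, not the provenance summarisation technique mentioned above; the Activity record, its fields and the sample data are assumptions chosen to mirror the PROV notions of activity, agent, used entity and generated entity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """A hypothetical record of one decision-making activity."""
    id: str
    agent: str      # responsible agent (PROV wasAssociatedWith)
    model: str      # entity used by the activity (PROV used)
    decision: str   # entity generated by the activity (PROV wasGeneratedBy)
    outcome: str    # illustrative attribute of the decision entity


log = [
    Activity("act:1", "org:lender", "model:v1", "dec:1", "approved"),
    Activity("act:2", "org:lender", "model:v1", "dec:2", "rejected"),
    Activity("act:3", "org:lender", "model:v2", "dec:3", "rejected"),
    Activity("act:4", "org:lender", "model:v2", "dec:4", "rejected"),
]


def outcomes_per_model(activities):
    """Aggregate view for an auditor: how often each model led to each outcome."""
    return Counter((a.model, a.outcome) for a in activities)


summary = outcomes_per_model(log)
print(summary)
```

An auditor looking at a single decision sees nothing unusual, whereas the aggregate shows that, in this made-up log, model:v2 never approved anything.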

The seventh principle (Validation and Testing) recommends regular validation of models and testing for harmful outcomes. This suggests processing over provenance: checking whether some expected criteria have been met can be implemented by policy-based approaches that detect whether past executions comply with expectations, described as policies. We have applied this technique to decide whether processing was performed in compliance with usage policies. If it is good practice to undertake validation and testing, it also becomes a necessity to document such a practice, to be able to demonstrate that such validation and testing takes place.
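A policy-based check over provenance can be sketched in a few lines of Python. The policy format, the dataset and activity identifiers, and the violations function are all hypothetical; a real deployment would express policies in whatever language its compliance framework provides.

```python
# Hypothetical usage policy: the purposes for which each dataset may be used.
allowed_purposes = {
    "data:purchase-history": {"recommendation"},
    "data:health-records": {"medical-research"},
}

# Hypothetical provenance of past executions: (activity, purpose, dataset used).
executions = [
    ("act:1", "recommendation", "data:purchase-history"),
    ("act:2", "advertising", "data:health-records"),
    ("act:3", "medical-research", "data:health-records"),
]


def violations(past_executions, policy):
    """Return the activities that used a dataset for a purpose the policy does not allow."""
    return [
        activity
        for activity, purpose, dataset in past_executions
        # A dataset absent from the policy has no allowed purpose at all.
        if purpose not in policy.get(dataset, set())
    ]


non_compliant = violations(executions, allowed_purposes)
print(non_compliant)
```

The check runs after the fact, over what was recorded, so its result can itself be logged as evidence that validation took place.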

So overall, the provenance research community has been investigating issues around capturing, storing, representing, querying and exploiting provenance information, all of them having a critical role in the principles of Algorithmic Transparency and Accountability. There is still much to research, however, including critical issues around (1) agreed domain-specific extensions of PROV to support transparency and accountability; (2) better integration of software engineering methodologies with provenance; (3) enforceable compliance with architecture; (4) non-repudiation of provenance; (5) querying and auditing facilities; (6) compliance checks over provenance; (7) user-friendly explanation of complex algorithmic decisions; (8) scalability of all the above.

In the spirit of Principle 1,  I hope this blog post contributes to raising awareness of these issues. Feedback and comments welcome!

Food Supply Chain and Provenance

On 4 June the Secretary of State announced that Professor Chris Elliott, Director of the Global Institute for Food Security at Queen’s University Belfast, was to lead an independent review into the integrity and assurance of food supply networks.

The aim of the review will be to “advise the Secretary of State for the Environment, Food and Rural Affairs and the Secretary of State for Health and also industry on issues which impact upon consumer confidence in the authenticity of food products, including any systemic failures in food supply networks and systems of oversight with implications for food safety and public health; and to make recommendations”.

The public, and all those involved in food supply (and the way that food supply is regulated) are invited to give us their views. Through this call for evidence, we are keen to hear about issues including those which affect consumer confidence.

Thank you for this opportunity to provide input into this review on the food supply chain. As an individual and parent, I am concerned by the quality of food, the quality of ingredients, their origin, and the production processes. For allergies, and in general for health reasons, the correct, timely, and accurate labelling of food products is critically important. Furthermore, locations where produce is grown, CO2 footprint, and ethical considerations are also matters of increasing interest. Price is also on our mind during weekly shopping, and the last thing we would want is a regulatory burden that would either reduce the variety of foods available to us, or make food unaffordable.

I am writing this contribution in my capacity as Professor of Computer Science at the University of Southampton, and co-chair of the recent Provenance Working Group at the World Wide Web Consortium. As a computer scientist, I believe that the solution to this challenging problem has to rely on technology, and such a technology is now readily available in standardized form. On the Web, we use the term provenance to refer to a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. Such a concept is equally applicable to food.

The problem of food traceability is analogous to the problem of provenance of data on the Web: multiple organisations are typically involved in the creation of food or Web data; regulatory authorities and governments have to audit the supply chain of food and information on the Web; third parties (such as consumer groups or review sites) can comment on and analyse food and information items; and consumers wish to access information about these items, rate them, and share them through social media.

The solution to the food supply chain problem includes making provenance of food explicitly available, in a computer-processable format, over the Web infrastructure, including all details of its ingredients, production, and delivery, so that it can be contributed to, and inspected, by all stakeholders: farmers, industry, regulators, consumer groups, and end consumers. Ubiquitous availability of provenance will allow novel, useful services to be developed, providing detailed analysis, reviews, or recommendations. All this processable evidence will also help identify suspicious steps in the food production chain, or gaps in the recorded information, which can then trigger further inspection. Ultimately, such a provenance-based online environment will promote transparency and accountability and will allow consumer confidence to be restored.

I am expanding on these thoughts below, by answering some of the questions, and would be delighted to discuss them, in person, with the team undertaking the review.

1. What measures need to be taken by the UK food industry and government to increase consumers trust in the integrity of the food supply systems?

Full provenance of food, including the organizations, people, food products, locations, processes, and transportation involved, needs to be systematically made available online. Provenance must be in a computer-processable, standardized, and open format so that it can be browsed, mined, analyzed, and audited by all stakeholders of the food industry, thereby offering transparency for the whole industry. It is also crucial to recognize that no single authoritative source of provenance may exist for a given food product, but independent parties (e.g. regulatory authorities, organizations in the supply chain, consumers, consumer groups) should be able to contribute to such provenance evidence.

With this information infrastructure in place, scanning the barcode of a product on a supermarket shelf, or clicking a button, should give the consumer real-time, up-to-date information about its origin, quality and constituents.

Consumers may not be interested in all the tiny details of a product’s provenance. We anticipate that novel services will emerge that describe products according to users’ preferences, e.g. allergens may be far more important for some than locality of products. Consumers will decide which “food recommender service” they will trust.

The regulatory framework should identify (and to some extent already does) the minimum provenance expected to be found.  Openness, transparency, data journalists, and the many eyes of the crowd will help identify missing or inconsistent provenance.

3. How can government, food businesses and regulators better identify new and emerging forms of food fraud?

Online provenance can become an incredible source of information for detecting inconsistent food labels, suspicious patterns of processing and transporting, lack of inspection, etc. Furthermore, given that every piece of provenance evidence is a claim that should be attributable to some organization or individual, it allows responsibility to be assigned. This is particularly useful to identify who is responsible for a fraudulent situation.
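To make this concrete, the Python sketch below scans the provenance of a single product for three of the warning signs just listed: a label that disagrees with the recorded origin, a claim that nobody is attributed to, and a claimed inspection with no matching record. The record layout, the identifiers and the suspicious function are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical provenance of one product: each step is a claim made by some party.
steps = [
    {"id": "step:farm", "kind": "production", "attributed_to": "org:farm-a", "country": "IE"},
    {"id": "step:transport", "kind": "transport", "attributed_to": None},
    {"id": "step:processing", "kind": "processing", "attributed_to": "org:plant-b"},
]
label = {"origin": "UK", "inspected": True}


def suspicious(chain, product_label):
    """List reasons why this provenance deserves further inspection."""
    findings = []
    for step in chain:
        if step["attributed_to"] is None:
            findings.append(f"{step['id']}: claim is not attributed to anyone")
        if step["kind"] == "production" and step.get("country") != product_label["origin"]:
            findings.append(
                f"{step['id']}: country {step.get('country')} differs from "
                f"label origin {product_label['origin']}"
            )
    # A label claiming inspection should be backed by a recorded inspection step.
    if product_label["inspected"] and not any(s["kind"] == "inspection" for s in chain):
        findings.append("label claims inspection but no inspection step is recorded")
    return findings


findings = suspicious(steps, label)
for finding in findings:
    print(finding)
```

Each finding names the step it concerns, so that, once claims are attributed, the responsible party can be identified directly from the record.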

In this context, provenance evidence becomes the foundation for establishing trust in food products. Provenance evidence itself should be non-forgeable. Mechanisms such as digital signatures, combined with provenance of provenance, allow provenance evidence to be attributed unambiguously.

5. Do consumers fully understand the way industry describes the composition and quality of the products on sale?

The provenance evidence advocated in this document can and should be as technical as required for the purpose of auditing, compliance checking, etc. Since it should be extensive and detailed, it could not possibly be printed on a product’s packaging, but it should be available online.

The information made available to consumers (including that on packaging) can be seen as a summary of the full provenance of a product, presented to consumers in a friendly and practical manner. Such consumer-targeted information should also be regarded as provenance evidence, also available in machine-processable format, which can be validated against detailed provenance evidence.

11. How can large corporations relying on complex supply chains improve both information and evidence as to the traceability of food?

By leveraging a standard vocabulary for provenance (and a food-industry specific terminology for all entities, agents and activities), large corporations, their suppliers and consumers, and their auditors, can create a Web of provenance evidence for food, which can be navigated by all stakeholders of the food industry.

12. Should there be legislative requirements for tamper proof labelling, and/or to advise competent authorities of mislabelling if it is discovered in the supply chain?

A cryptographic signature is a mechanism by which a label (or more generally any digital document) can be “signed” by a party, demonstrating the authenticity of the label (or document). A valid signature identifies the signer (authentication) in such a way that the signer cannot deny having signed the message (non-repudiation) and that the message has not been tampered with in transit (integrity). Cryptographic signatures are readily available and, together with other cryptographic techniques, are extensively used in e-commerce.
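The mechanics of signing and verifying a label can be illustrated with a short Python sketch. To stay within the standard library, it uses a Lamport one-time signature, a simple hash-based scheme chosen here purely for illustration; deployed systems would normally rely on a standard scheme such as RSA or ECDSA through a vetted library. The label text and all function names are hypothetical.

```python
import hashlib
import secrets


def sha256(data):
    return hashlib.sha256(data).digest()


def generate_keys():
    """Lamport one-time key pair: 256 pairs of random secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(sha256(a), sha256(b)) for a, b in private]
    return private, public


def message_bits(message):
    """The 256 bits of the SHA-256 digest of the message."""
    digest = int.from_bytes(sha256(message), "big")
    return [(digest >> i) & 1 for i in range(256)]


def sign(message, private):
    """Reveal one secret per digest bit; a key pair must sign only one message."""
    return [private[i][bit] for i, bit in enumerate(message_bits(message))]


def verify(message, signature, public):
    """Check that each revealed secret hashes to the published value for its bit."""
    return all(
        sha256(secret) == public[i][bit]
        for i, (secret, bit) in enumerate(zip(signature, message_bits(message)))
    )


private_key, public_key = generate_keys()
label = b"Beef lasagne, origin: UK, batch 0417"  # hypothetical label text
signature = sign(label, private_key)

print(verify(label, signature, public_key))  # the genuine label
print(verify(b"Beef lasagne, origin: unknown", signature, public_key))  # an altered label
```

Only the holder of the private key could have produced the signature, which is what supports authentication and non-repudiation; any change to the label changes its digest, so the signature no longer verifies, which is what supports integrity.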

It is the expectation that all provenance evidence in this context, including labelling, would be cryptographically signed.

Some reporting facility should be made available, by which inconsistencies in the provenance of food can be flagged. Any such logged report also constitutes evidence, and its veracity should be established by the relevant organisation.

13. What additional information does the public need to be offered about food content and processing techniques? How can this information be conveyed in an easy to understand manner?

It is difficult to enumerate today all possible information that could be useful to the public. Beyond the traditional allergens and ingredients, CO2 footprint, ethical considerations, and organic certifications may be of interest.

The regulatory framework should not restrict the information made available to the public. Instead, it should enable transparency, and make it easy for any relevant information to be made available. This will empower organisations to develop services that leverage that information. As indicated in our response to question 5, presentation services can take care of conveying information in an easy to understand manner. By opening up provenance of food products, an ecosystem of original solutions will emerge, as it has emerged in many aspects of the Web.

Further points

We are not aiming to build another massive, centralised IT system, which would likely be bound to fail given the complexity of the supply chain. Lightweight, agile, Web-oriented techniques have proven very successful for this kind of application.

In this document, we are agnostic as to how data should be made available, whether open or restricted, free or paid for. It is likely that some aspects have to be open (inspection reports, reviews, origins of produce), whereas others may be confidential (which recipe was used for a given product). We see provenance of food as an enabling platform for a variety of services to be developed, and potentially for competitive advantage to be built upon.