When Open Data and Privacy meet: perspectives from the community

August 5, 2014 in Uncategorized

The need for open data practitioners to connect more often with those who work with personal data, as well as with privacy advocates (as alluded to in previous blog posts on this platform), was underscored by the discussions at June’s Open Data, Personal Data and Privacy workshop in London. The workshop was jointly convened by Open Knowledge and the Open Rights Group.

 

Agenda-hacking session at the workshop

The workshop set out to bring together folks from across the open data, personal data and privacy fields. Our goals were to explore where these areas of work intersect and to find constructive actions that could be taken both to de-risk these intersections and to ensure that each field benefits from a good understanding of the others’ potential.

At the start of the meeting, it was apparent that many of the terms used (such as anonymised data, pseudonymised data, shared data and open data) are understood and interpreted differently by different groups and individuals. This is particularly so with regard to open data (versus shared data), about which much has already been written and discussed. For example, Phil Booth (of medConfidential) recently demonstrated, in an ODI Lunchtime Lecture, how this misrepresentation works, drawing on examples of government data sharing. It will be important to develop a shared understanding of such terminology, and to raise awareness that different fields have different concerns, uncertainties and contexts.

This, coupled with other discussions around privacy concerns in some open datasets, led the group to call for a checklist of sorts to guide data publishing behaviour. Agreeing a plan for the development of this checklist was one of the key intended goals and outputs of the meeting.

 

What is open data? And how do we ensure that opening up data does not breach privacy?

For those in the open data world, essential non-personal infrastructure information has the most value when it is openly shared (as per the Open Definition), used and useful. There are many datasets where open sharing leads to all kinds of value, from citizen empowerment to enhanced research to economic growth – all without any personal data! This is a world away from the popular discourse of “data as the new oil”, where the high valuations placed on data by corporations and governments rest primarily on personal data.

The workshop reached a general consensus that personal data should never be open data, albeit with a few exceptions. For example, individuals are increasingly thinking about the possibilities of opening up their own personal data (for instance, for medical research), and with appropriate consent and awareness of the risks this could be acceptable. As a rule, however, there should not be personal data in datasets which governments mandate to be opened up. Examples of difficult cases include personal information about elected representatives, where more than just a name is required to explore company control for anti-corruption purposes, or where individual expenses/spend data may indirectly include personal information. There is therefore a need to think through the challenges that opening up some datasets poses for privacy, in some contexts (depending on country, data collection method, individual consent, public interest, and more). It was also clear from the discussions that open data and privacy need not be in tension with each other. For example, the group considered instances where openness of procedures (by promoting transparency) helped to safeguard the privacy of data subjects and users – companies and customers alike (see a blog post from Reuben Binns with some concrete examples).

 

Key challenges identified

A major challenge is the extent to which de-identification (or anonymisation) can be done well enough to eliminate the risk of re-identification. This remains an ongoing debate, especially since auditing anonymisation efforts is not easy. One of the key requests conveyed at the meeting was for evidence (from established research) to back the various assertions being made. There were also significant questions around anonymisation versus pseudonymisation, and around how legislation (UK and EU) can help clarify their application in specific contexts. There were calls for a body of knowledge to be developed on these topics, especially as pseudonymisation – a highly contested technique, though useful in certain instances – can be seen as posing significant risks to privacy, reducing the usefulness of datasets, or stripping away the rights of data subjects.
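
To make the distinction concrete, here is a minimal sketch (in Python, with entirely hypothetical field names – an illustration only, not a workshop output and not a recipe for safe anonymisation) contrasting the removal of direct identifiers with keyed-hash pseudonymisation:

```python
# Minimal illustration only: hypothetical record fields, not a recipe
# for safe anonymisation.
import hashlib
import hmac

SECRET_KEY = b"keep-this-key-out-of-the-published-dataset"

def drop_identifiers(record):
    """'Anonymise' by removing direct identifiers outright."""
    return {k: v for k, v in record.items() if k not in {"name", "email"}}

def pseudonymise(record):
    """Replace the direct identifier with a keyed hash (a pseudonym).
    The same person always maps to the same token, so records stay
    linkable -- useful for analysis, but also why re-identification
    risk remains."""
    token = hmac.new(SECRET_KEY, record["name"].encode(), hashlib.sha256).hexdigest()[:12]
    out = drop_identifiers(record)
    out["person_token"] = token
    return out

record = {"name": "Jane Doe", "email": "jane@example.org", "postcode": "N1 9AB", "age": 34}
print(drop_identifiers(record))  # no identifier, no linkage across records
print(pseudonymise(record))      # linkable token; quasi-identifiers (postcode, age) remain
```

Note that even the pseudonymised record keeps quasi-identifiers such as postcode and age, which in combination can single a person out – which is exactly why auditing such efforts is hard and why an evidence base matters.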

These are all thorny issues: most participants would like to see more safeguards in place for privacy protection in datasets that contain personal information, while personal data enthusiasts also advocate for data subjects to be given greater access and control.

The issues surrounding consent were also discussed. While consent is one of the possible routes to publishing and processing (personal) data, it becomes problematic with open data: for consent to be meaningful, one needs to know specifically what is being agreed to. Because open licensing allows information to be freely reused for any purpose, what is being consented to remains very open-ended, and genuinely informed consent becomes hard to achieve.

Further, there were discussions around privacy concerns in project design, and it was clear that privacy considerations are not easy to build into projects (where data of some form will invariably be collected). Regardless of later questions of data sharing or publication, the overall project design around data collection and handling is key to privacy protection. The difficulty arises mainly because funders are often separate from implementers, while the technology environment is also removed from research and policy. Additionally, whether any standards can be universal – cutting across sectors and countries – is open to question. The group nonetheless advanced possible solutions: regulating data collection; applying dystopian perspectives to project design (imagining how the data could be misused); streamlining incentive systems to favour privacy-centred projects; and reinforcing data minimisation principles.
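
As a small illustration of the last of these – data minimisation – the sketch below (again Python, with hypothetical field names; an assumption-laden example rather than a recommendation) keeps only the fields a project has declared it needs for its stated purpose, before anything is stored or shared:

```python
# Hypothetical example of a data-minimisation step: keep only the
# fields declared as necessary for the project's stated purpose.
FIELDS_NEEDED = {"postcode_district", "age_band", "visit_date"}

def minimise(record, fields_needed=FIELDS_NEEDED):
    """Discard everything not on the declared-needs list at collection time."""
    return {k: v for k, v in record.items() if k in fields_needed}

raw = {
    "name": "Jane Doe",            # never needed for the analysis
    "postcode_district": "N1",     # coarser than a full postcode
    "age_band": "30-39",
    "visit_date": "2014-06-17",
    "notes": "free text that may contain personal details",
}
print(minimise(raw))  # only the declared fields survive collection
```

The point is that the filtering happens at project-design and collection time, not as an afterthought when someone later asks whether the dataset can be shared or opened.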

 

What are the solutions in place, and the next steps after the workshop?

Finally, it was agreed that there are indeed some useful resources, tools and guidelines already in place for the various topics around personal data and privacy. What is lacking, however, is easy access to them and better ways of communicating about them. To this end, the following were agreed as intended outputs of the workshop:

1) A ‘talking points’ guide for communicating on issues in open data and privacy (including documenting both harm and success stories)

2) An introduction to anonymisation (techniques)

3) A resources toolkit (with a glossary of terms and a list of research papers to support evidence-building)

4) A checklist for open data publishing and reuse (that takes account of personal data and the issues surrounding privacy concerns)

5) A study report on personal data licensing and the potential for individuals to share their own information.

 

Over the course of the next few months, the project will therefore be working (with the relevant experts) to develop the products named above, which will be disseminated and communicated widely.

 

On the whole, the workshop was very positively received. One participant noted in the evaluation form:

“I thought it would be just a talking workshop / didn’t really know what to expect from the pre-event information, but the event was more informative and helpful than traditional PowerPoint presentations would have been.”

There are other key takeaways from the workshop, which have been captured in the workshop notes on the Wiki and webpage.

 

Photo credit: Dirk Slater