Why privacy considerations matter in the current open data environment

April 29, 2014 in Featured

In a little under two months, experts in open data, privacy, and personal data management will gather in London to spend two days deliberating on the privacy concerns of opening up data systems that might contain elements of a personal nature. This meeting could not have come at a better time. These groups have hitherto not had many opportunities to interact, especially since open data communities have been preoccupied with non-personal data (specifically, data that is of a public nature). However, the data being collected and opened increasingly runs the risk of containing identifiers that make it possible to identify individuals. One reason is the need for these data systems to respond to transparency and accountability imperatives, which highlights the unavoidable tensions between openness and privacy. But most often, it is simply because anonymization fails. These issues are highlighted in previous posts to this forum.

Consequently, anonymization itself has become a contentious issue, with many privacy experts questioning whether it is in fact effective in protecting consumer privacy. A recent ODI Friday lunchtime lecture by Ross Anderson highlights several instances in which anonymity has failed in health data. In the world of geodata too, the likelihood that privacy could be violated through open data systems was flagged in a recent blog post. The author demonstrates how, with varying degrees of data mining effort, one can de-anonymise bicycle journey data from a publicly available Transport for London dataset.

Other relevant issues have also been raised. For example, uncontrolled data mining increases the risk of re-identifying anonymised data, often by linking several datasets. There is also the risk that some cross-border transfers of personal data violate Principle 8 of the Data Protection Act, which states that “Personal data shall not be transferred to a country or territory outside the EEA unless that country or territory ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data”. More clarity is needed on the problems surrounding the global flow of these problematic datasets.
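To make the linking risk concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration: the records, the field names, and the choice of postcode, birth year and sex as quasi-identifiers are assumptions, not taken from any real dataset or from the cases mentioned above. It shows how a release with names removed can still be joined to a public register on the fields the two have in common.

```python
# Toy linkage sketch. All records and field names below are invented
# for illustration; they do not come from any real dataset.

# An "anonymised" release: names removed, quasi-identifiers kept.
anonymised_records = [
    {"postcode": "N1", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "N1", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "E2", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
    {"postcode": "E2", "birth_year": 1980, "sex": "F", "diagnosis": "eczema"},
]

# A second, public dataset that does carry names.
public_register = [
    {"name": "Alice", "postcode": "N1", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "postcode": "N1", "birth_year": 1975, "sex": "M"},
    {"name": "Carol", "postcode": "E2", "birth_year": 1980, "sex": "F"},
]

# Assumed quasi-identifiers shared by both datasets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def link_records(anonymised, register, keys=QUASI_IDENTIFIERS):
    """Return {name: record} for people who match exactly one anonymised record."""
    # Group the anonymised records by their quasi-identifier values.
    index = {}
    for record in anonymised:
        index.setdefault(tuple(record[k] for k in keys), []).append(record)

    matches = {}
    for person in register:
        candidates = index.get(tuple(person[k] for k in keys), [])
        # A unique match re-identifies the person; several candidates do not.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]
    return matches


reidentified = link_records(anonymised_records, public_register)
for name, record in reidentified.items():
    print(name, "->", record["diagnosis"])
```

In this toy data, two of the three people in the register are matched to a single record each, while the third is protected only because another record happens to share the same combination of values.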

Additionally, the question of where the boundaries lie in defining the key data terms (‘personal’, ‘anonymised’, ‘transformed’, ‘aggregate’, ‘mydata’, and ‘pseudonymised’) keeps coming up in the debate. At a recent workshop held by the My Data Working Group of Open Knowledge Finland, the need for a broader consideration of the concept of ‘my data’ was apparent from the discussions. In the discussions led by keynote speakers Nils Torvalds and Mydex’s William Heath, participants also agreed that the lessons learned from the UK’s development of Mydata could lend perspectives to developing a similar strategy for Finland. The international outlook of this London meeting therefore reinforces the need to apply these principles in a global context.

However, it appears from recent debates at the Helsinki meeting, as well as on the My Data WG mailing list, that some practitioners are questioning not only the definition of the term ‘my data’ but also the necessity of using it to distinguish certain aspects of personal data. The London expert meeting offers an opportunity to take this debate further, as one of the goals of the working group is to have a working document (much like that of the Open Definition) which clearly defines these terms to streamline how we communicate about them.

Crucial to debates on mydata systems is the issue of what sorts of controls data subjects can have, and want to have, over the data held on them. Managing one's own data can be time-consuming and technical, and is often not straightforward. The current opt-out options (for example, as used in the controversial care.data scheme) need to be weighed against schemes that offer opt-ins. Often, the mechanisms for giving consent are broken (for one reason or another), and there is a need to investigate what alternative forms of control are available to data subjects.

In response to this myriad of issues, the participants at the meeting are tasked with proposing interventions that tackle gaps in tools, policy and data literacy through capacity-building, tools development and communications activities. They will therefore carefully review the efficacy and applicability of some of the existing tools: for example, the use of consent receipts, datenbriefs, and the proposals for co-regulation (by data subjects and data publishers) in managing privacy concerns. Additionally, they will lay out principles to govern the behaviour of data publishers. Among other things, the principles will propose standards to be met when anonymising and aggregating data to minimise the risk of re-identification. The principles will also include a checklist for data publishers to guide their decision on whether to open up a particular dataset. Overall, the goal is to come out of this meeting with an outline of specific interventions that takes into consideration the different interests and capacities of those already undertaking activities in this environment.
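As an illustration of what one item on such a checklist could look like, the sketch below tests a dataset for k-anonymity: every combination of quasi-identifier values must be shared by at least k records. This is only one possible measure and an assumption made for this example; it is not a standard proposed by the working group. The records, field names and the decade-wide age bands are likewise hypothetical.

```python
from collections import Counter

# Toy journey records, invented for illustration only.
journeys = [
    {"age": 34, "area": "N1", "trips": 12},
    {"age": 36, "area": "N1", "trips": 3},
    {"age": 52, "area": "E2", "trips": 7},
    {"age": 58, "area": "E2", "trips": 1},
]


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values()) if counts else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    return smallest_group_size(records, quasi_identifiers) >= k


def to_age_band(age):
    """Generalise an exact age to a ten-year band, e.g. 34 becomes '30-39'."""
    low = age // 10 * 10
    return f"{low}-{low + 9}"


# Aggregating exact ages into bands makes individual records less distinctive.
generalised = [{**r, "age": to_age_band(r["age"])} for r in journeys]

print(is_k_anonymous(journeys, ("age", "area"), k=2))
print(is_k_anonymous(generalised, ("age", "area"), k=2))
```

With exact ages, every record in this toy data is unique on age and area, so the check fails for k=2. After banding the ages, each combination is shared by two records and the check passes. Passing such a check reduces the risk of re-identification by linkage but does not remove it, which is why it would sit alongside the other checklist items rather than replace them.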

In the weeks leading up to this meeting, the WG will continue to engineer critical discussions on the dedicated mailing list, wiki page and on the Twitter forum, so do visit these pages to contribute your thoughts.
