Why privacy is less nebulous than it’s sometimes made out to be

- August 13, 2014 in Uncategorized

This guest post is by Walter van Holst.

The complexities of the notion of privacy

Two recurring themes in conversations about privacy and personal data are that privacy is such an abstract concept, and that public data can’t be personal data. The former is a myth, the latter a misunderstanding, albeit sometimes an understandable one. For recommended reading on the false dichotomy between public and private or personal, see danah boyd. She’s recommended reading anyway, although I disagree about the complexity of privacy as a whole. It ultimately boils down to the notion of agency: how many degrees of freedom do I have left? And not in the hard, non-coercive, sense of the word freedom. Do I feel like I can freely research Jihadist literature on the internet? Look up a medical condition via a search engine? Communicate with someone who is a well-known investigative journalist? Information empowers, which is both a good and a bad thing. Good because it can mitigate existing power differentials and prevent new ones from arising. Bad because it can amplify existing ones or even create new ones.

Where is open or personal data in this mix?

Open data has always been as much about the mitigation and prevention of power differentials as about innovation. In a sense privacy is about the same core values. That this core value is expressed and enshrined in law differently over time and in different cultural contexts is what makes it complex in practice. In the USA, the starting point is the right to be left alone, born from the injustices of British colonial rule. In Europe the core concept is more that of informational privacy, born from the injustices brought about by Nazi and Stalinist rule. Quite unsurprisingly given the way law develops over time, a lot of privacy law has a philosophical underpinning that is dodgy at best. Property, a core concept in any society more complicated than hunter-gatherers, lacked a sound underpinning till the advent of game theory and its application to economics. From that perspective, privacy already is a remarkably mature concept. And speaking about property: for the love of all that is right, let’s stop framing personal data in terms of ownership!

Image credit: AFP

Your personal data is very much like your shadow in that it reflects you as a person but can also give a distorted reflection of yourself. Your shadow can take on a life of its own, like in the Indonesian Wayang puppet theatre, including all the drama that ensues in that art form. Personal data is data about a person, not owned by that person. Privacy is more than personal data, but in the context of an information society in which everything becomes data, personal data will become more synonymous with privacy than it already is. And we will become very boring people if we are not wary about this and do not regain the territory that has been lost already!

Image credit: WSJ/Tim Robinson


When Open Data and Privacy meet: perspectives from the community

- August 5, 2014 in Uncategorized

The need for open data practitioners to connect more often with those who work with personal data, as well as privacy advocates (as alluded to in previous blog posts on this platform) was underscored by the discussions from June’s Open Data, Personal Data and Privacy workshop in London. This workshop was jointly convened by Open Knowledge and the Open Rights Group.


Agenda-hacking session at the workshop

The workshop set out to bring together folks across the open data, personal data and privacy fields. Our goals were to explore where these areas of work intersect and to find constructive actions which could be taken both to de-risk these intersections and to ensure that each field could benefit from a good understanding of the potential of the others.

At the start of the meeting, it was apparent that many of the terms used (such as anonymised data, pseudonymised data, shared data and open data) are understood and interpreted differently by different groups and individuals. This is particularly so with regard to open data (versus shared data), about which much has been written and discussed already. For example, Phil Booth (of medConfidential) recently demonstrated, through examples of government shared data, how this misrepresentation works (in an ODI Lunchtime lecture). It will be important to develop a shared understanding of such terminology, and to raise awareness that different fields have different concerns, uncertainties and contexts.

This, coupled with other discussions around privacy concerns in some open datasets, led the group to call for a checklist of sorts to guide data publishing behaviour. The agreement and plan to develop this checklist is one of the key intended outputs of the meeting.


What is open data? And how do we ensure that opening up data does not breach privacy?

For those in the open data world, essential non-personal infrastructure information has the most value when it is openly shared (as per the Open Definition), used and useful. There are many datasets where open sharing leads to all kinds of value, from citizen empowerment to enhanced research to economic growth – all without any personal data! This is a world away from the popular discourse of “data as the new oil”, where high corporate and government valuations are made of primarily personal data.

The workshop reached a general consensus that personal data should never be open data, with a few exceptions. For example, individuals are increasingly thinking about the possibilities of opening up their own personal data (for example for medical research), and with appropriate consent and awareness of risks, this could be acceptable. As a rule, however, there should not be personal data in datasets which governments may mandate should be opened up. Examples of difficult cases include personal information about elected representatives, where more information than just name is required to explore company control for anti-corruption purposes, or where individual expenses/spend data may indirectly include personal information. There was therefore a need to think through the challenges that opening up some datasets, in some contexts (including country, data collection method, individual consent, public interest, and more), poses for privacy considerations. It was also clear from the discussions that open data and privacy need not be in tension with each other. For example, the group also considered instances where openness of procedures (by promoting transparency) helped to safeguard the privacy of data subjects and users – companies and customers (see a blog post from Reuben Binns with some concrete examples).


Key challenges identified

A major challenge is the extent to which de-identification (or anonymisation) can be done successfully enough to eliminate the risk of re-identification. This remains an ongoing debate, especially since auditing anonymisation efforts is not easy. It is important to note that one of the key requests conveyed at the meeting was for evidence (from established research) to back the various assertions being made. There were also significant questions around anonymisation versus pseudonymisation, and how legislation (UK and EU) can help clarify their application in specific contexts. There were calls for a body of knowledge to be developed on these, especially as pseudonymisation, a highly contested technique (though useful in certain instances), can be seen as either posing significant risks to privacy, reducing the usefulness of the datasets, or stripping away the rights of data subjects.

These are all thorny issues, with most people wanting to see more safeguards in place for privacy protection in datasets that contain personal information, and personal data enthusiasts advocating for greater access and control to be given to data subjects.
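To make the distinction between pseudonymisation and anonymisation more concrete, here is a minimal Python sketch. It is not a method discussed at the workshop: the records, the secret key and the function names are all invented for illustration, and the keyed hash and the k-anonymity-style group count are simply common techniques chosen as examples. It shows that replacing a name with a pseudonym leaves every record unique on its remaining attributes (postcode and birth year), while coarsening those attributes reduces that uniqueness at the cost of detail.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical toy records; every name and value here is invented.
RECORDS = [
    {"name": "Alice Example", "postcode": "N1 9GU", "birth_year": 1980, "condition": "asthma"},
    {"name": "Bob Example", "postcode": "N1 7AA", "birth_year": 1984, "condition": "diabetes"},
    {"name": "Carol Example", "postcode": "SW1A 1AA", "birth_year": 1975, "condition": "asthma"},
    {"name": "Dave Example", "postcode": "SW1A 2AB", "birth_year": 1979, "condition": "migraine"},
]

# Assumption: the data holder keeps this key secret; anyone holding it can
# recompute the pseudonyms, which is why this is not anonymisation.
SECRET_KEY = b"illustrative-key-only"

QUASI_IDENTIFIERS = ["postcode", "birth_year"]


def pseudonymise(record, key=SECRET_KEY):
    """Replace the direct identifier (name) with a keyed-hash pseudonym."""
    token = hmac.new(key, record["name"].encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    out = {k: v for k, v in record.items() if k != "name"}
    out["pseudonym"] = token
    return out


def generalise(record):
    """Coarsen quasi-identifiers: outward postcode only, birth decade."""
    out = dict(record)
    out["postcode"] = record["postcode"].split()[0]
    out["birth_year"] = record["birth_year"] // 10 * 10
    return out


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values.

    A result of 1 means at least one record is unique on those attributes
    and could be singled out by someone who knows them.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())


pseudonymised = [pseudonymise(r) for r in RECORDS]
generalised = [generalise(r) for r in pseudonymised]

print(smallest_group(pseudonymised, QUASI_IDENTIFIERS))  # 1: every record is still unique
print(smallest_group(generalised, QUASI_IDENTIFIERS))    # 2: each record now shares its values with another
```

Even in this toy case the trade-off raised above is visible: the generalised data is harder to link back to a person but also less precise, and a group size of 2 is far from a guarantee against re-identification.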

The issues surrounding consent were also discussed. While consent is one of the possible paths to (personal) data publishing and processing, it becomes problematic with open data: for strong consent, one needs to know what specifically is being agreed to. The ability to freely reuse information for any purpose (through open licensing) may leave this very open and make informed consent hard to achieve.

Further, there were discussions around privacy concerns in project design, and it was found that privacy is not an easy thing to build into projects (where data of some form would invariably be collected). Regardless of later questions of data sharing or data publication, the overall project design around collection and data handling is key for privacy protections. The difficulty arises mainly because funders are often separate from implementers, while the technology environment is also removed from research or policy. Additionally, the universality of certain standards (to cut across sectors and countries) is in question. However, the group made attempts to advance possible solutions by proposing the regulation of data collection; the application of dystopian perspectives to project design; streamlining incentive systems to favour privacy-centred projects; as well as the reinforcement of data minimisation principles.


What are the solutions in place, and the next steps after the workshop?

Finally, it was decided that there are indeed some useful resources, tools and guidelines already in place in relation to the various topics on personal data and privacy. However, what is lacking is easy access to them, and a better way of communicating them. To this end, it was decided to have the following as intended outputs of the workshop:

1) A ‘talking points’ guide for communicating on issues in open data and privacy (including documenting both harm and success stories)

2) An introduction to anonymisation (techniques)

3) A resources toolkit (with a glossary of terms and a list of research papers to support evidence-building)

4) A checklist for open data publishing and reuse (that gives a consideration to personal data and the issues surrounding privacy concerns)

5) A study report on personal data licensing and the potential for individuals to share their own information.


Hence, over the course of the next few months, the project will be working (with the relevant experts) to develop the above-named products, which will be disseminated and communicated widely.


On the whole, the workshop was very positively received. One participant noted in the evaluation form:

“I thought it would be just a talking workshop/didn’t really know what to expect from the pre-event information, but the event was more informative and helpful than traditional powerpoint presentations would have been”.

There are other key takeaways from the workshop, which have been captured in the workshop notes on the Wiki and Webpage.


Photo credit: Dirk Slater