Looking back: our year in open data and privacy work

- February 25, 2015 in Uncategorized

At the end of this month, the Open Data & Privacy project (funded by OSF) formally closes. Over the past year of implementation, Javier Ruiz and I (as the core team) have tried to navigate the complex question of whether, and how, open data and privacy can find a balance. I would like to believe that we have done this with some success. Of course, we have always had the support of a really knowledgeable expert community to tap into, and I would like to acknowledge in particular the input of Malavika Jayaram, Mark Lizar, Reuben Binns, Antti Poikola and Walter van Holst. Their amazing individual contributions are dotted across various posts on this website.

It has been a great period of learning, and of growing in understanding of the issues we are dealing with across varied contexts and sectors. I have documented the lessons from the project in periodic blog posts on this platform over the past year. This post attempts to summarise and highlight the big (if less specific) realisations for me, based on my own critical reflection on the design and execution of the project's key activities.

 

  • Community is extremely important

Community is always valuable for any open data initiative, and the Open Data & Privacy project is no different. However, because of the peculiarities of this project (the complexity of the issues and the varied contextual nuances), community is extremely important! For one, as the project name suggests, expertise in both open data and privacy is essential, but individuals with a deep understanding of both fields are not easy to find. In essence, delivering on the project's core promises (which include key knowledge outputs) requires having people on board who can engage with any number of these areas. An interdisciplinary Working Group drawing on varied expertise is therefore vital. On the flip side, however, such a mix of people also presents a challenge: strong individual positions often emerge that find very little common ground with the others. This has its disadvantages, as it can stall activities that need some form of consensus to proceed. I discuss this in the next point.

 

  • You don’t always have consensus, but that’s okay

As a project intended to be community-driven, having buy-in is essential. However, buy-in often requires some form of agreement on the issues, and for this community that is not easily achieved. At the most basic level, agreement on what the key terminologies used in the space mean is desirable. However, as the never-ending exchanges on many project platforms (principally the mailing list) showed, there are no common interpretations of most of the terms that would be acceptable to every member of the group, and some of the more problematic definitions are still evolving. Even more complex are the issues themselves (and the proposed solutions, where applicable) and how they manifest in different contexts, about which there was little agreement; the effectiveness (or otherwise) of anonymisation is a case in point. It became fairly obvious to us later in the project that it is entirely possible to work around these disagreements. With regard to terminology, for example, a living document on the wiki which can be edited by the community addresses the issue of evolving definitions and interpretations.

 

  • Finding a ‘one-size-fits-all’ solution is often not the answer

This point relates closely to the previous two. In our attempt to develop a set of community-driven principles to guide data publishers, we found out quite quickly that the requirements of the various sectors (and disciplines) differ, and that a ‘one size fits all’ solution is not only undesirable but also not feasible. To illustrate: the requirements of government data publishers are quite different (as some disclosure laws can apply) from those of their counterparts in the private sector. Additionally, certain types of datasets, such as health data and location data, are generally more sensitive and need to be treated differently. Data protection laws (and their interpretations and applications) also differ across localities around the world. Once again, finding a way around this was necessary. The solution was to design principles that address a specific sector and context. For example, the engine room has done some great work leading the development of a handbook on responsible data handling for the development sector, and Open Knowledge's Open Data Handbook will contain a set of privacy principles geared more towards public data.

 

  • Changing direction comes with the territory

As a concluding point: when working in an environment as complex and unpredictable as the one presented by open data and privacy considerations, it is inevitable that one gets pulled in directions different from what one intended at the outset. This is particularly so in relation to focus areas and the specific approaches adopted towards achieving the project's objectives. In the case of the former, although the project was intended to address privacy issues surrounding open data specifically, there was a need to also address related issues (such as data sharing and internet security) because they are indeed interrelated. Being agile and broadening our scope to support work in these areas was therefore necessary.

 

I would like to highlight that this project was intended to explore the environment of open data and privacy: to understand the issues and the actors, what is required to address the identified issues, and what is feasible given the resources (human and financial) available. I do believe, therefore, that the lessons learnt during implementation so far (some of which I have shared above) will greatly shape our design of future interventions in a more positive and efficient way.

 

Data and Privacy: as Discussed in the Internet Monitor 2014 Report

- January 15, 2015 in Uncategorized

During the latter part of last year, I started collating the relevant literature available on issues pertaining to personal data and privacy. It is collected in a catalogue of resources maintained on the Personal Data & Privacy Working Group wiki. The intention behind this exercise is to give easy access to materials that are essential to fully understanding and working in this complex and evolving area. In this post, I would like to highlight one recent publication in particular: the data and privacy chapter of the very enlightening Internet Monitor 2014: Reflections on the Digital World report. This comprehensive publication (by the Berkman Center for Internet & Society at Harvard University) examines the digital trends of the year, and a very useful and relevant chapter is dedicated to exploring data and privacy issues. I feel this chapter nicely summarises some of the discussions we have attempted to have within the WG over the last year.

Overall, it highlights how difficult it is to ensure privacy on the web, especially since operationalising standards such as informed consent often gets quite murky. Internet security is closely linked to this, where the powers that be (governments, superpower tech companies) can and do limit the ability of individuals to safeguard their own data protection interests. Many legal and technological solutions are proposed, but most are not so easily implemented. Articles by noted open data and privacy experts Tim Davies and Malavika Jayaram offer some insightful discussions. The former discusses the subtle and not-so-subtle tensions between accountability and privacy protection in open government initiatives, especially when questions about what can in fact be considered effective anonymisation come to the fore (as the hot debates on most platforms in the previous year show). Sharing insights along similar lines, Jayaram argues, for the development sector specifically, that advances in the use of technology for data collection and management (for everything from elections to healthcare) present significant opportunities for privacy violations, due to the vast amounts of personal (and often sensitive) data involved. Add to this mix the varied (and often inadequate) capacities of actors in different developing contexts, and the problem is exacerbated even further.

However, mirroring the tone of Davies' closing lines, Jayaram ends on a more positive note, pointing to the increasing interaction of the ‘privacy’ and ‘open’ camps (the Personal Data & Privacy WG is a typical example of such a platform), which is currently touted as the way to ensure that the more nuanced and necessary conversations take place to stem this negative tide of personal data abuse.

Beyond these two articles, Neal Cohen's piece touches on a particularly important set of actors, (data) regulators, and the role that privacy laws play in general. He notes that companies need to know unequivocally what exactly they are permitted to do with the data they collect and manage. But with unclear data protection laws in place, and cross-jurisdictional data flows increasingly becoming a significant problem, we may not be any nearer to having this standard in place.

The full report can be accessed here.

Challenging the ‘Open by Default’ Principle at Open Up? 2014

- November 20, 2014 in Uncategorized

The Personal Data & Privacy Working Group was invited (along with 17 others) to present our work in a demonstration booth at the well-attended Open Up? 2014 event at the Dutch Centre in London last week (November 12). It was a great opportunity for my colleagues and me to interact with folks, introduce the WG to new contacts, highlight some of the activities we have undertaken over the past few months, and give attendees some takeaways such as the Open Data and Privacy primer.

However, it was also obvious that for most of the 170+ attendees, the main event with its impressive and diverse line-up of speakers was the big attraction. Open Up? 2014, the second in a series organised by Omidyar Network (since it partnered with DFID to host it two years ago), had a well-thought-out agenda covering all the relevant issues, from the supposed tensions between openness and privacy, to the very important and interrelated issues of transparency and trust in how both government and the private sector collect and handle data. In the build-up to the event, there were a couple of interesting and thought-provoking blog posts by some of these notable speakers. In particular, I highlight the article (titled Opening Policy, Protecting Privacy) by Tim Hughes, which raises some vital questions about how governments use, and should use, the personal data at their disposal, while Sunil Abraham attempts to resolve the dichotomy between privacy and openness in this post.

Demonstration booths at Open Up? 2014. Photo credit: Omidyar Network

The line-up did live up to its billing and delivered in terms of the diversity of opinions and country-specific experiences shared. While much of the discussion at Open Up? 2014 centred on this particular community of interest groups and persons, I thought it was interesting how playwright James Graham's opening session highlighted how ordinary folks (the UK cinema audience, to be exact) are taking an interest in privacy and surveillance and what these mean to them in the age of ubiquitous technology and access to information. There is a valid concern that we often do not know what we are in fact ‘sharing’ via social media, services, apps etc., and it's good to see folks outside this (quite small) community engaging with the issue.

 

Back to basics on what should be open

So where do we currently stand on open (data) by default? Interestingly, it's back to basics about which data(sets) should be open and which should not. It is fair to say no consensus was reached on this. However, we did make great inroads in questioning the implications of ‘open by default’ for the respect of human rights and privacy, especially in non-democratic contexts where loss of privacy has particularly grave consequences.

While most people are still comfortable with the idea of ‘open’, what they are also asking for is a level of mandatory transparency from governments and corporate bodies about what is being collected and why. In a poll conducted during the event, 86% of the audience were of the view that their government is not making all its data collection activities known. One speaker also called for an inventory of the scale of surveillance mechanisms in particular. It seems, therefore, that governments and corporate bodies making more information available to individuals is vitally important, but it is only a necessary first step. It also cannot be overemphasised that the open data community needs to be more specific and emphatic about what is open data and what isn't, especially since other government data-sharing activities can sometimes be misconstrued as such.

Other relevant issues are not so easily resolved either. There is the concern that big data is increasingly under the control of only the state or a powerful few. This has quite scary future implications, and one of the speakers raised the point that perceived low digital literacy among an important stakeholder group (policymakers) makes it difficult for them to grasp the enormity of what this entails for any functioning democracy.

Playwright James Graham making the first presentation for the day. Photo credit: Omidyar Network

Sunil Abraham, Timothy Garton Ash, Richard Allan and Pria Chetty discussing data collection and the private sector. Photo credit: Omidyar Network

There were a couple of surprisingly frank opinions on the floor. One speaker noted that some amount of surveillance is necessary, though only in ‘tiny bits’; it is unclear what qualifies as tiny bits. Others were of the view that there should be limits to the amount of (government) information that is openly available, especially in light of valid concerns about national security. Once again, where exactly we should draw the line between mandatory disclosure/transparency and security/privacy is not always apparent.

As Stephen King (a partner at Omidyar Network) said in the closing remarks, the event intended to bring stakeholders from the different sectors together, and I think that is an extremely vital thing to do as it enables us to look at these thorny issues from different perspectives.

 

The photos and videos from Open Up? 2014 have been made available, and more information can also be found via the event's Twitter feed (#OpenUp14).

Responsible data in development

- November 17, 2014 in Uncategorized

In mid-October, on behalf of the Personal Data & Privacy WG, I joined a group of development data experts from around the world to co-author a book on responsible data in international development. We launched the first version of the book just four days after we met, and it is now available for download here.

We used the ‘booksprint’ method to produce the book, which involved bringing together people from a variety of sectors to write the book from start to finish in three days. We had no preparatory work laid out beforehand, and we were lucky to have the process facilitated by Barbara Ruehling, who has done this many times before on a range of topics.

Having people from different backgrounds to work with brought a great richness to the text itself; too often, the conversation happening within the digital security crowd is a world away from the discussions within the international development movement. This time, we had digital security experts collaborating intensely with development practitioners, data nerds and privacy advocates to produce the book.

We produced the book as a first attempt to understand what ‘responsible data’ might mean within the context of international development programming. We took a broad view of ‘development’, and hopefully some of the information within the book will also be useful for those working in related fields, such as human rights defenders or activists.

We decided to focus on ‘development’, however, due to the growing hype around the ‘data revolution’, with the UN Secretary General's Data Revolution Group releasing its report, A World that Counts, just last week. Data and technology carry a potential for harm within the development context which is too often ignored, and we wanted to focus on this.

The authors of this book believe that responsibility and ethics are integral to the handling of development data, and that as we continue to use data in new, powerful and innovative ways, we have a moral obligation to do so responsibly and without causing or facilitating harm. At the same time, we are keenly aware that actually implementing responsible data practices involves navigating a very complex, and fast-evolving, minefield – one that most practitioners, fieldworkers, project designers and technologists have little expertise on. Yet.

The team behind the book was:
Kristin Antin (engine room), Rory Byrne (Security First), Tin Geber (the engine room), Sacha van Geffen (Greenhost), Julia Hoffmann (Hivos), Malavika Jayaram (Berkman Center for Internet & Society, Harvard), Maliha Khan (Oxfam US), Tania Lee (International Rescue Committee), Zara Rahman (Open Knowledge), Crystal Simeoni (Hivos), Friedhelm Weinberg (Huridocs), Christopher Wilson (the engine room), facilitated by Barbara Rühling of Book Sprints.

The book is available for download now, under a CC-BY-SA license; please feel free to remix and reuse it.

It is also catalogued as a relevant resource on the Personal Data & Privacy WG wiki.

Reflections on #Mozfest 2014

- November 3, 2014 in Uncategorized

From 24th to 26th October, I participated in my first ever Mozilla Festival. With 11 tracks, ranging from Build and Teach the Web to Community Building, and numerous parallel sessions and activities, it was quite difficult deciding where to be at any one time. Like most of the other tech events I attended this year, the message that the web offers tremendous potential resonated deeply in all the presentations and keynotes.

However, I was also very much interested in the discussions that interrogate the ‘other sides’ of this hype. Hence, I decided to participate in the sessions on how Mozilla is championing the privacy policy and advocacy movement, as well as those run by Web We Want and the Electronic Frontier Foundation (EFF). The Source Code for Journalism sessions were also quite enlightening, and I'm glad I stopped by those as well. Some of the topics that stood out for me are:

The Web We Want wall at the event. Photo credit: Mozilla in Europe

Digital rights

The work being done in advancing this area is outstanding. Mention can be made of Web We Want's and the EFF's latest campaigns, as well as the iRights platform championed by Baroness Beeban Kidron, which advocates for young people to have access to security and privacy on the web.

Though I have participated in a number of discussions where surveillance and privacy on the web were on the agenda, this was the first time I had the opportunity to listen to first-hand accounts from people with personal experience of one form or another, either through being affected themselves or through working with or supporting those who have been. Though it was obvious that quite a few of us were out of our depth with regard to the practicalities of supporting ‘victims’ of internet surveillance, there is a whole spectrum of needs and corresponding support to give (for example financial, technical and moral support), so it was concluded that each of us can and should play a part.

 

Baroness Kidron delivering her keynote. Photo credit: Mozilla in Europe

Also, my sense is that quite a number of us agree with Baroness Kidron that the web has not delivered on its promise of fairness and equality for all. And while we recognise that the (promising) future of a web which advances these ideals needs to be actively campaigned for, we might not identify strictly as activists, and this can affect the stake we perceive ourselves as having in this space.

Closely related to this is the idea that promoting web literacy for all is crucial to securing privacy and security on the web. This requires a more-than-basic understanding of how surveillance works, the philosophy behind it, and the tools (mostly technological, but also very much policy-related) that are available to help circumvent it. The session jointly organised by Privacy International (PI), EFF and Access, as well as the one on Do Not Track by Justin Brookman, were particularly useful in expanding our knowledge of what surveillance is.
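Since Do Not Track came up in Justin Brookman's session, here is a minimal sketch of the mechanism under discussion: browsers with DNT enabled send a "DNT: 1" header with each request, and honouring it is entirely up to the server, which is exactly why the policy debate matters. The handler below uses only the Python standard library; the port and response text are illustrative, not from any real deployment.

```python
# A minimal sketch: a server that inspects the Do Not Track header.
from http.server import BaseHTTPRequestHandler, HTTPServer

class DNTAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Browsers with DNT enabled send "DNT: 1" on every request.
        dnt = self.headers.get("DNT")
        tracking_allowed = dnt != "1"
        body = (f"DNT header: {dnt!r}; "
                f"tracking {'enabled' if tracking_allowed else 'disabled'}\n")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())

if __name__ == "__main__":
    # Illustrative port; nothing forces a server to respect the signal.
    HTTPServer(("localhost", 8000), DNTAwareHandler).serve_forever()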

Data: open and inclusive

The idea that data is not an end in itself, but a tool that must be harnessed towards effective change, was yet again advanced at Mozfest 2014. In particular, in one of the sessions in the Source Code for Journalism track, led by Laurenellen McCann, we explored first and foremost what open data means and looks like in different contexts. More importantly, we discussed what it means to have community involvement in open data, and whether it is at all possible or practical to have community-driven strategies. And finally, we discussed ideas around digital inclusion and what it would take to make open data more inclusive than it currently is.

Other discussions in the space identified that, sometimes and in certain contexts, open data is just a box to be ticked. However, open data systems must be truly open (according to the Open Definition) and must be consistently updated with the necessary datasets. That said, consideration must be given to users who come from different contexts with different needs and capacities, so inclusiveness must be balanced with openness.

Both opening and closing fairs featured many innovative ideas and products. Photo credit: Mozilla in Europe

Things to watch out for

We all have our individual takeaways from Mozfest 2014, but for me some of the exciting things to look out for include the Privacy 101 platform, to be launched by PI by the close of the year. Privacy 101 will make available the wealth of information and resources generated so far by PI (and its partners) from various activities across the world.

I will also be watching Mozilla's work exploring specific potential tech solutions that promote privacy protection: for example, solutions built into the browser that, importantly, do not trade privacy for usability or vice versa. I will also be looking to support the Mozilla Advocacy platform in engineering potential community-driven policy solutions to internet tracking.

And finally, one of the items we worked on was a transparency code for data journalism. The intended output is a manifesto (of sorts) with a set of principles to guide news organisations on how to be transparent about their data projects. I made the case that, at a minimum, the principles should include considerations of data privacy and inclusiveness, and I look forward to seeing how this is done.

Is privacy (and internet security) on the open development agenda?

- October 14, 2014 in Uncategorized

De Balie, Amsterdam. Photo credit: www.planetxam.nl

Last week saw the open community move to Amsterdam, for DownScale2014 at VU University Amsterdam (on Wednesday October 8), and Open Development Camp (ODC) 2014 on October 9 and 10 at De Balie.

One of the main reasons I participated in both events was to carry forward discussions around privacy in open and big data, as well as internet security, which were started at OKFestival 2014. Because these two events were focussed on open development (open data in development institutions or in developing countries), I was particularly keen to find out the extent to which this community is taking privacy considerations seriously.

I engaged with enough participants and sessions to know that a simple yes or no will not suffice to answer this question. There are many different factors to consider, and I reflect on some key ones below.

In development, the benefits versus risks debate is amplified

In the development arena, the debate about whether to pursue the potential benefits of open (and big) data or to weigh the possible risks is particularly significant. The desire to drive meaningful change and impact lives is greater in this area; however, the risks are also significant, as many of the activities target particularly vulnerable communities in countries that do not have the infrastructure to sufficiently manage those risks. A few of the case studies revealed that while there is recognition of the privacy and security concerns that might be inherent in particular projects, they are not being seriously considered. Not surprisingly, some participants took such presenters to task for being dismissive about the issue, as many of the projects are typically designed to collect a great deal of data without due privacy consideration.

That said, there are some areas where these concerns are being tackled actively. Mention can be made of the hot-off-the-press Ways to Practice Responsible Development Data handbook, which was put together by a group of individuals and organisations (including Open Knowledge). This resource was hailed as necessary guidance for development practitioners in their projects. The initiators of the resource (the engine room and Hivos) recognise that several things could go wrong on projects that pose potential privacy risks to individuals, and put forward this book to help development actors commit to some ‘do no harm’ standards.

Having already initiated several projects in the developing world, the World Wide Semantic Web is also now ensuring that privacy safeguards are actively built into project design, through a multidisciplinary team working on their ‘the Box’ platform. They are exploring how to use ‘the Box’ to, for instance, encrypt and reroute private stores of information to protect privacy in grassroots information-sharing (on, for example, land contracting). If this is done, it will go a long way towards ensuring that individuals who share sensitive information are protected.

Further, in some countries (for example Bahrain) where reporting on sensitive issues poses significant risks to well-known journalists, activists are advocating the use of offline reporting tools such as StoryMaker, an app that allows anonymous yet interactive storytelling, even by non-journalists. WildLeaks likewise promotes the use of Tor, which protects anonymity in wildlife whistle-blowing. In fact, WildLeaks takes confidentiality so seriously that it is yet to use Facebook and Twitter, although these platforms could greatly enhance the impact of its work.

All these examples are noteworthy; however, we need to see more projects and organisations giving due consideration to privacy protection. As we know, even one risk to an individual is one too many.

Malavika Jayaram delivering the closing keynote. Photo credit: www.planetxam.nl

The interactive fishbowl session on internet security. Photo credit: www.planetxam.nl

Internet security solutions are not implementable across the board

In a remarkable closing keynote, Malavika Jayaram illustrated the complexities of privacy and internet security in an increasingly technological and open world. It was also interesting to see the official photographer for ODC14 (in the opening session of day 2) reflect on the significant privacy concerns of a world where the digital space is inundated with so many photos and videos. High on the agenda was the extent to which participants feel able to deal with the challenges posed by internet security, which is usually seen as outside their realm of control.

We had a very revealing and interesting interactive fishbowl session on this issue, where we attempted to hash out possible solutions. Most participants were of the opinion that there is a need to break the hegemony of the big Internet Service Providers (ISPs) by having alternatives emerge to challenge them. However, it was noted that this is easier said than done, as most people would rather sacrifice privacy and internet security for convenience. Related to this, another proposed solution required individuals to take responsibility for their own internet security, by ensuring proper use of encryption and the like. Once again, implementation is not so easy, given the identified challenges of using even the more common encryption tools such as PGP. Open for Change was therefore tasked with organising a practical session on encryption at the next ODC. Another possibility advanced was the idea that a new kind of digital privacy literacy must emerge, with no distinction between providers and clients: each person would own their own servers and all their data, and manage it themselves. These ‘solutions’ are clearly far-fetched, especially for developing countries where, in reality, access to technological infrastructure and capacity is nowhere near the level of western countries.
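To make the "encrypt your own data before handing it to a provider" proposal concrete, here is a minimal sketch using the third-party Python cryptography package's Fernet recipe (symmetric encryption), a deliberately simpler stand-in for PGP. It shows how little code the basic step takes; the key handling is illustrative and glosses over exactly the key-management difficulties the session identified as the hard part.

```python
# A minimal sketch of encrypting data before storing it with a third party.
# Fernet is symmetric encryption, not PGP; key management is elided here.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # must be stored safely by the data owner
cipher = Fernet(key)

report = b"sensitive field notes collected in-country"  # illustrative data
token = cipher.encrypt(report)     # this ciphertext is safe to hand to a provider

# Only the key holder can recover the plaintext.
assert cipher.decrypt(token) == report
```

The design point the session kept returning to is visible even in this toy: the encryption call is trivial, but whoever holds the key holds everything, and that is where usability breaks down for most people.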

Looking forward: where will the discussions around privacy take us in the coming year?

So where will these discussions take us in the year leading up to Downscale 2015 and ODC 2015? Malavika Jayaram's closing thoughts on big data and big bias certainly give us a lot to chew on, and more importantly to act upon. However, as suggested, we need a concrete picture of the reality of privacy and internet security in the developing world. This would give us a good foundation on which to build the discussions at next year's event.

As most of the speakers emphasised, open is not enough; as a community, the earlier we critically consider and tackle the other (potentially dark) side of open, the better we will be at making meaningful impact with our respective projects.

Balancing the Benefits of Openness with the Risks of Privacy

- September 12, 2014 in Uncategorized

 

 

I attended the UK Anonymisation Network (UKAN) Symposium, held at the Wellcome Collection in London yesterday, September 11. It was a great opportunity for my colleague Javier Ruiz (of ORG) and me to connect face-to-face with other people doing work in this area, following our Open Data & Privacy expert workshop in June and OKFestival 2014 in July.

Aptly titled Anonymisation: Techniques, Risks and Benefits, the symposium saw a healthy debate around both the benefits as well as the risks of anonymisation, and also about open (and big) data generally.

In the opening speeches, it was emphasised that UKAN exists to harmonise practices and knowledge of anonymisation in the country. This aligns well with our own goal on the Open Data & Privacy project, which is to harmonise understanding about the various issues at the intersection of open data, personal data and privacy (where possible!)

The event saw great participation, with over 150 individuals in attendance, representing academia and other research groups, public and private sector data controllers and publishers, and civil society organisations. The diversity of speakers brought a robustness to the debates (especially the panel discussions) which is necessary if we are to effectively explore the different aspects of the benefits and risks inherent in open data.

While it is not possible to capture all the rich discussions that took place, a couple of points certainly stood out for me which I highlight below.

 

The importance of trust and transparency

The words trust and transparency were repeatedly mentioned, and it was acknowledged that, generally, there needs to be a level of trust within all spheres of the data (sharing) environment. For example, Sir Nigel Shadbolt maintained that privacy considerations matter to the ODI and are taken seriously because doing so engenders the trust of its constituents and protects its reputation. This certainly applies to other actors in the open space, such as Open Knowledge and ORG.

Additionally, there were calls for anonymisers to be transparent about the procedures (and techniques) used, so the public can trust the systems in place. The view was also advanced that data subjects have the right to know the logic behind how data is processed or shared, and what reusers are allowed to do with the data. In a similar vein, there was a call for data breach stories to be shared between organisations, especially those that are linked. Such an atmosphere of trust and transparency is seen as more likely to build public confidence.

 

‘Benefits versus risks’ is an on-going debate

The issue of how to handle data so that there is both utility and confidentiality often produces a polarised debate, and it was no different at the symposium, where both utopian and dystopian perspectives were freely shared. Most of the speakers were of the view that anonymisation is possible and can be done effectively, so that it protects privacy while at the same time unlocking the benefits (of data and information) for a variety of purposes. However, there was also the view that anonymisation can be done badly, depending on several factors.

In the two best-practice case studies showcased (by the Department of Energy and Climate Change, and the Department for Work and Pensions), it was fairly apparent that this debate was crucial in getting them to improve their existing anonymisation processes prior to publishing the datasets.

 

Many of the issues are not settled

Within the open data, personal data and privacy community, there seems to be an acknowledgement that there is a lack of common understanding of most of the terminologies used in the space, as well as of the relevant laws. For example, the issue of what exactly constitutes consent was once again highlighted. A few of the discussions also flagged pseudonymisation (and also personal data) in particular, terms which have often been interpreted differently in the various laws. The seeming confusion between what is prescribed in the various laws, for instance in the data protection regulations (including the proposed Data Protection Regulation), and in the ICO's Code of Practice was identified as one of the real challenges that actors in the environment have to deal with.

Jurisdictional disharmony in the interpretation of these laws also compounds the issue, especially where data flows and exports are concerned. For instance, member states within the Article 29 Working Party approach things differently. There are also marked differences in the perspectives of different disciplines and sectors, which affect how policies (for instance on giving access to data and managing privacy breaches) can be applied, especially for commercial versus research uses of data.

Attendees continuing the discussion after the event

The policy panel

Role of the UKAN

The continuing debate about balancing openness with privacy makes interdisciplinary bodies such as UKAN very relevant. Apart from making available the expertise to ensure that anonymisation is done effectively, such a network also enables a system of checks, as the various institutions act as a check on each other.

As both Sir Nigel Shadbolt and Sir Mark Walport highlighted in their keynotes, the challenges of handling all the data from the Internet of Things remain, so interactive meetings such as this are important in providing a critical look at the salient issues. The interdisciplinary and intersectoral Personal Data and Privacy Working Group provides a platform where such conversations can continue; join the mailing list to keep informed.

Why privacy is less nebulous than it’s sometimes made out to be

- August 13, 2014 in Uncategorized

This guest post is by Walter van Holst.

The complexities of the notion of privacy

Two recurring themes in conversations about privacy and personal data are that privacy is such an abstract concept, and that public data can't be personal data. The former is a myth, the latter a misunderstanding, sometimes an understandable one. Recommended reading on the false dichotomy between public and private (or personal) is danah boyd over at Medium.com; she's recommended reading anyway, although I disagree with her on the complexity of privacy as a whole. It ultimately boils down to the notion of agency: how many degrees of freedom do I have left? And not in the hard, non-coercive sense of the word freedom. Do I feel like I can freely research Jihadist literature on the internet? Look up a medical condition via a search engine? Communicate with a well-known investigative journalist? Information empowers, which is both a good and a bad thing: good because it can mitigate existing power differentials and prevent new ones from arising, bad because it can amplify existing ones or even create new ones.

Where is open or personal data in this mix?

Open data has always been as much about the mitigation and prevention of power differentials as about innovation, and in a sense privacy is about the same core values. That this core value is expressed and enshrined in law differently over time and across cultural contexts is what makes it complex in practice. In the USA, the starting point is the right to be left alone, born from the injustices of British colonial rule. In Europe, the core concept is more that of informational privacy, born from the injustices brought about by Nazi and Stalinist rule. Quite unsurprisingly, given the way law develops over time, a lot of privacy law has a philosophical underpinning that is dodgy at best. Property, a core concept in any society more complicated than hunter-gatherers, lacked a sound underpinning until the advent of game theory and its application to economics. From that perspective, privacy is already a remarkably mature concept. And speaking of property: for the love of all that is right, let's stop framing personal data in terms of ownership!

Image credit: AFP

Your personal data is very much like your shadow: it reflects you as a person, but it can also give a distorted reflection of you. Your shadow can take on a life of its own, as in the Indonesian Wayang puppet theatre, with all the drama that ensues in that art form. Personal data is data about a person, not data owned by that person. Privacy is more than personal data, but in the context of an information society in which everything becomes data, personal data will become even more synonymous with privacy than it already is. And we will become very boring people if we are not wary of this and do not regain the territory that has already been lost!

Image credit: WSJ/Tim Robinson

 

When Open Data and Privacy meet: perspectives from the community

- August 5, 2014 in Uncategorized

The need for open data practitioners to connect more often with those who work with personal data, as well as with privacy advocates (as alluded to in previous blog posts on this platform), was underscored by the discussions at June's Open Data, Personal Data and Privacy workshop in London, jointly convened by Open Knowledge and the Open Rights Group.

 

Agenda-hacking session at the workshop

Agenda-hacking session at the workshop

The workshop set out to bring together folks from across the open data, personal data and privacy fields. Our goals were to explore where these areas of work intersect, and to find constructive actions which could be taken both to de-risk these intersections and to ensure that each field benefits from a good understanding of the potential of the others.

At the start of the meeting, it was apparent that many of the terms used (such as anonymised data, pseudonymised data, shared data and open data) are understood and interpreted differently by different groups and individuals. This is particularly so with regard to open data (versus shared data), about which much has been written and discussed already. For example, Phil Booth (of medConfidential) recently demonstrated, through examples of government shared data, how this misrepresentation works (in an ODI lunchtime lecture). It will be important to develop a shared understanding of such terminology, and to raise awareness that different fields have different concerns, uncertainties and contexts.

This, coupled with other discussions around privacy concerns in some open datasets, led the group to call for a checklist of sorts to guide data publishing behaviour. The agreement on, and plan towards, the development of this checklist is one of the key intended outputs of the meeting.

 

What is open data? And how do we ensure that opening up data does not breach privacy?

For those in the open data world, essential non-personal infrastructure information has the most value when it is openly shared (as per the Open Definition), used and useful. There are many datasets where open sharing leads to all kinds of value, from citizen empowerment to enhanced research to economic growth, all without any personal data! This is a world away from the popular discourse of “data as the new oil”, where high corporate and government valuations are placed primarily on personal data.

The workshop reached a general consensus that personal data should never be open data, albeit with a few exceptions. For example, individuals are increasingly thinking about the possibilities of opening up their own personal data (for example for medical research), and with appropriate consent and awareness of risks this could be acceptable. As a rule, however, there should not be personal data in datasets which governments may mandate should be opened up. Examples of difficult cases include personal information about elected representatives, cases where more information than just a name is required to explore company control for anti-corruption purposes, and cases where individual expenses/spend data may indirectly include personal information. There is therefore a need to think through the challenges that opening up some datasets, in some contexts (depending on country, data collection method, individual consent, public interest, and more), poses for privacy. It was also clear from the discussions that open data and privacy need not be in tension with each other: the group considered instances where openness of procedures (by promoting transparency) helped to safeguard the privacy of data subjects and users, both companies and customers (see a blog post from Reuben Binns with some concrete examples).

 

Key challenges identified

A major challenge is the extent to which de-identification (or anonymisation) can be done successfully enough to eliminate the risk of re-identification. This remains an ongoing debate, especially since auditing anonymisation efforts is not easy. It is important to note that one of the key requests conveyed at the meeting was the need for evidence (through established research) to back the various assertions being made. There were also significant questions around anonymisation versus pseudonymisation, and how legislation (UK and EU) can clarify their application in specific contexts. There were calls for a body of knowledge to be developed on these, especially as pseudonymisation, a highly contested technique (though useful in certain instances), can be seen as posing significant risks to privacy, reducing the usefulness of the datasets, or stripping away the rights of data subjects.

These are all thorny issues, with most people wanting to see more safeguards for privacy protection in datasets that contain personal information, and personal data enthusiasts advocating for greater access and control to be given to data subjects.
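As an illustration of why pseudonymisation is so contested, the sketch below shows one common technique: replacing a direct identifier with a keyed hash (HMAC). The field names, values and secret are all hypothetical, chosen only to make the point that records remain linkable and the mapping is reproducible by anyone holding the key, which is why some argue the output is still personal data.

```python
# A minimal sketch of pseudonymisation via keyed hashing (HMAC).
# All identifiers and the secret below are illustrative.
import hashlib
import hmac

SECRET_KEY = b"held-by-the-data-controller"  # hypothetical secret

def pseudonymise(identifier: str) -> str:
    # Deterministic: the same input always yields the same pseudonym.
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

records = [
    {"patient_no": "943 476 5919", "diagnosis": "asthma"},
    {"patient_no": "943 476 5919", "diagnosis": "diabetes"},
]
for r in records:
    r["patient_id"] = pseudonymise(r.pop("patient_no"))

# Both records carry the same pseudonym: still linkable across the dataset,
# and reversible by anyone who can replay the mapping with the key. This
# linkability is exactly what the "still personal data" camp points to.
assert records[0]["patient_id"] == records[1]["patient_id"]
```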

The issues surrounding consent were also discussed. While consent is one of the possible paths to (personal) data publishing and processing, it becomes problematic with open data: for strong consent, one needs to know what specifically is being agreed to, and the ability to freely reuse information for any purpose (through open licensing) may leave this very open and make informed consent hard to obtain.

Further, there were discussions around privacy concerns in project design, and it was found that privacy considerations are not easy to build into projects (where data of some form would invariably be collected). Regardless of later questions of data sharing or data publication, the overall project design around collection and data handling is key to privacy protection. This is difficult mainly because funders are often separate from implementers, while the technology environment is also removed from research or policy; additionally, the universality of certain standards (cutting across sectors and countries) is in question. However, the group attempted to advance possible solutions, proposing the regulation of data collection, the application of dystopian perspectives to project design, streamlining incentive systems to favour privacy-centred projects, and the reinforcement of data minimisation principles.

 

What are the solutions in place, and the next steps after the workshop?

Finally, it was agreed that there are indeed some useful resources, tools and guidelines already in place covering the various topics in personal data and privacy. What is lacking, however, is access to them and a better way of communicating about them. To this end, the following were agreed as intended outputs of the workshop:

1) A ‘talking points’ guide for communicating on issues in open data and privacy (including documenting both harm and success stories)

2) An introduction to anonymisation (techniques)

3) A resources toolkit (with a glossary of terms and a list of research papers to support evidence-building)

4) A checklist for open data publishing and reuse (that gives consideration to personal data and the issues surrounding privacy concerns)

5) A study report on personal data licensing and the potential for individuals to share their own information.

 

Hence, over the course of the next few months, the project will be working (with the relevant experts) to develop the products named above, which will be disseminated and communicated widely.

 

On the whole, the workshop was very positively received. One participant noted in the evaluation form:

“I thought it would be just a talking workshop/didn't really know what to expect from the pre-event information, but the event was more informative and helpful than traditional powerpoint presentations would have been.”

Other key takeaways from the workshop have been captured in the workshop notes on the wiki and webpage.

 

Photo credit: Dirk Slater

The privacy, ethical and security concerns of open discussed at OKFestival 2014

- July 25, 2014 in Uncategorized

OKFestival 2014 was a phenomenal event, during which over 1,000 individuals gathered to discuss and share ideas around the transformative power of open knowledge (information, data etc.). Also on the agenda, however, was the crucial issue of how to ensure and enhance privacy and security in an increasingly open landscape. A number of sessions during the main festival, as well as some fringe events, tackled various aspects of this issue. Here is a recap of some of them.

 

Fringe events

  • The Open Data Control: Convergence and Hack fringe event (aptly named the Convergathon) took place on the 12th and 13th of July in Berlin, with satellite events held simultaneously in Tel Aviv and San Francisco. It was led by Mark Lizar of Open Notice and Reuben Binns. Day one featured talks and workshops on personal data control infrastructure and systems by Eve Maler, Doc Searls and Rufus Pollock of Open Knowledge, among others. The sprint (which saw a team hacking on a specific tool) concluded events on day two. We consider this a good first step and look forward to a bigger collaborative event next year.
  • On Tuesday the 15th, a networking event was hosted by the MyData Working Group of Open Knowledge Finland and the Finnish Institute in Berlin. Foremost among the challenges raised by those present were concerns around who owns your data, how personal data is abused by corporations, and how having control over one's data does not guarantee immunity from privacy violations. There were therefore calls for a greater understanding of how open works and how it is validated (evidenced by the benefits outweighing the risks).
  • The Open Development community's fringe event took place at Wikimedia Deutschland on Friday July 18th. A sub-session on privacy and protection, and on how to analyse the risks of open data in the developing world, was led by Zara Rahman and Linda Raftree. It emerged from the discussions that though privacy violations are comparatively more severe in this area, most systems are ill-equipped to deal with them. For example, individuals identified from health and social care datasets as belonging to a particular group in some countries have to live in fear of being assaulted or killed. Participants traced the security and privacy concerns at each stage of the information-sharing loop, looking at citizens, infomediaries and governments.

Sessions at main festival

  • The first of the privacy sessions at the main festival took place on Wednesday, where Ulrich Atz and Kathryn Corrick of the ODI took participants through a hands-on anonymisation exercise using data from the Titanic passenger list. Discussions also centred on the different ways de-anonymisation can be possible with any particular dataset. Some held the opinion that anonymisation is rarely, if ever, fool-proof, and that privacy protection therefore cannot be entirely guaranteed; what matters is to establish the value to be gained from the information contained in a dataset and to weigh this against the risks before releasing it (a toy sketch of this re-identification intuition follows this list).

 

  • In the Can Open Data Go Wrong? session, Javier Ruiz of the Open Rights Group and Reuben Binns both shared experiences of the UK NHS care.data programme and how it showed that open data can be entirely misconstrued. While the harm stories spanned different cases and contexts, the underlying theme was that open data and information need to be handled responsibly and ethically.

 

  • On Thursday, Javier Ruiz, together with Fabrizio Scrollini and Renata Avila, led discussions on understanding the surveillance systems in place in different parts of the world and how the ethical and privacy concerns they raise can be tackled. While a few participants emphasised how useful encryption tools can be in managing this, others noted that surveillance is very much a political issue which requires diplomatic and strategic institutional collaboration (beyond just the technology).

 

  • Taking Privacy Considerations Forward was the final privacy-related session of the main festival, facilitated by Sally Deffor, Javier Ruiz, Walter van Holst and Christopher Wilson of the Personal Data and Privacy Working Group. Participants explored and validated the underlying issues and solutions surrounding personal data licensing, anonymisation techniques and control of personal data, among others, which had been identified beforehand in a previous workshop. Among the key recommendations was a call for clarity on when opening personal data (and, by inference, anonymisation) is necessary and when it is not. There was also recognition that regulations on disclosure and privacy are not consistent across different contexts, and this needs to be considered when crafting communicative resources.
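As promised above, here is a toy sketch of the re-identification intuition behind the ODI's anonymisation exercise: counting how many records share each combination of "quasi-identifiers" (the k in k-anonymity). A combination that occurs only once can single a person out even with names removed. The rows and the chosen quasi-identifiers are illustrative, not taken from the actual Titanic dataset.

```python
# A toy sketch of assessing k-anonymity: the smallest group size across
# all quasi-identifier combinations. Rows below are illustrative only.
from collections import Counter

rows = [
    {"sex": "female", "age_band": "30-39", "class": 1},
    {"sex": "female", "age_band": "30-39", "class": 1},
    {"sex": "male",   "age_band": "70-79", "class": 2},  # unique -> risky
]
quasi_identifiers = ("sex", "age_band", "class")

# Count how many rows share each quasi-identifier combination.
groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
k = min(groups.values())  # the dataset is k-anonymous for this k

print(f"k = {k}")
print("combinations occurring only once:",
      [g for g, n in groups.items() if n == 1])
```

Run on these toy rows, k comes out as 1 because of the lone elderly male passenger, which is the sense in which anonymisation is "rarely fool-proof": utility-preserving releases tend to leave such rare combinations behind.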

Evidently, the attention on how privacy and security are handled in the open environment is only going to increase, and a number of activities after OKFestival 2014, including the Open Development Camp 2014 and the Responsible Data Forum, are going to explore the issues further and collaborate on specific actions. The work of the Personal Data and Privacy WG is also continuing, with the next group call planned for mid-August to undertake a resources sprint. Do sign up if you wish to be involved!

 

Photo credits: Marieke Guy & Gregor Fischer