You are browsing the archive for Laura James.

Open Data Privacy

- December 13, 2013 in Featured

“yes, the government should open other people’s data”

Traditionally, the Open Knowledge Foundation has worked to open non-personal data – things like publicly-funded research papers, government spending data, and so on. Where individual data was a part of some shared dataset, such as a census, a great deal of thought and effort went into ensuring that individual privacy was protected and that the aggregate data released was a shared, communal asset.

But times change. Increasing amounts of data are collected by governments and corporations, vast quantities of it about individuals (whether or not they realise that it is happening). The risks to privacy through data collection and sharing are probably greater than they have ever been. Data analytics – whether of “big” or “small” data – has the potential to provide unprecedented insight; however, some of that insight may come at the cost of personal privacy, as separate datasets are connected and correlated.


Both open data and big data are hot topics right now, and at such times it is tempting for organisations to get involved in such topics without necessarily thinking through all the issues. The intersection of big data and open data is somewhat worrying, as the temptation to combine the economic benefits of open data with the current growth potential of big data may lead to privacy concerns being disregarded. [Privacy International](https://www.privacyinternational.org) are right to [draw attention to this in their recent article on data for development](PI), but of course other domains are affected too.

Today, we’d like to suggest some terms to help the growing discussion about open data and privacy.

Our Data is data with no personal element, and a clear sense of shared ownership. Some examples would be where the buses run in my city, what the government decides to spend my tax money on, how the national census is structured and the aggregate data resulting from it. At the Open Knowledge Foundation, our default position is that our data should be open data – it is a shared asset we can and should all benefit from.

My Data is information about me personally, where I am identified in some way, regardless of who collects it. It should not be made open or public by others without my direct permission – but it should be “open” to me (I should have access to data about me in a useable form, and the right to [share it myself, however I wish](mydata) if I choose to do so).

Transformed Data is information about individuals, where some effort has been made to anonymise or aggregate the data to remove individually identified elements.
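
As a rough illustration of what such a transformation can involve, the sketch below aggregates individual records into counts per group and hides any group smaller than a threshold. The record fields, the `aggregate_with_suppression` helper and the threshold of 3 are assumptions made up for this example, and passing a check like this does not by itself make a dataset safe to publish.

```python
from collections import Counter

def aggregate_with_suppression(records, key, min_count=3):
    """Count records per value of `key`, hiding groups smaller than `min_count`.

    Small groups are reported as None so that a count of 1 or 2 cannot
    easily point at a specific individual.
    """
    counts = Counter(record[key] for record in records)
    return {
        group: (count if count >= min_count else None)
        for group, count in counts.items()
    }

# Hypothetical individual-level records ("my data").
records = [
    {"name": "A", "district": "North"},
    {"name": "B", "district": "North"},
    {"name": "C", "district": "North"},
    {"name": "D", "district": "South"},
]

# Only group-level counts leave the function; names are never copied over.
published = aggregate_with_suppression(records, key="district")
print(published)
```

The threshold is a policy decision rather than a technical one, which is one reason the people the data is about should be consulted on how it is transformed.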


We propose that there should be some clear steps which need to be followed to confirm whether transformed data can be published openly as our data. A set of privacy principles for open data, setting out what needs to be considered, would be a good start. These might include things like consulting key stakeholders, including representatives of whatever group(s) the data is about and data privacy experts, on how the data is transformed. For some datasets, it may not prove possible to transform them sufficiently to maintain a reasonable level of privacy for citizens; these datasets simply should not be opened up. For others, it may be that further work on transformation is needed to achieve an acceptable standard of privacy before the data is fit to be released openly. Ensuring the risks are considered and managed before data release is essential. If the transformations provide sufficient privacy for the individuals concerned, and the principles have been adhered to, the data can be released as open data.

We note that some of “our data” will have personal elements. For instance, members of parliament have made a positive choice to enter the public sphere, and some information about them is therefore necessarily available to citizens. Data of this type should still be considered against the principles of open data privacy we propose before publication, although the standards compared against may be different given the public interest.

This is part of a series of posts exploring open data and privacy, which we feel is a very important area. If you are interested in these matters, or would like to help develop privacy principles for open data, join [the working group mailing list](http://lists.okfn.org/mailman/listinfo/mydata-open-data). We’d welcome suggestions and thoughts on the mailing list or in the comments below. You can also talk to us and [the Open Rights Group](http://www.openrightsgroup.org/), with whom we are working, at [the Open Knowledge Conference](http://okcon.org) and other events this autumn.

My Data & Open Data

- December 13, 2013 in Featured

The Open Knowledge Foundation believes in open **knowledge**: not just that some data is open and freely usable, but that it is **useful** – accessible, understandable, meaningful, and able to help someone solve a real problem.

A lot of the data which could help me improve my life is data about me – “MyData” if you like. Many of the most interesting questions and problems we have involve personal data of some kind. This data might be gathered directly by me (using my own equipment or commercial services), or it could be harvested by corporations from what I do online, or assembled by public sector services I use, or voluntarily contributed to scientific and other research studies.


Image: “Tape library, CERN, Geneva 2″ by Cory Doctorow, CC-BY-SA.

This data isn’t just interesting in the context of our daily lives: it bears on many global challenges in the 21st century, such as supporting an aging population, food consumption and energy use.

Today, we rarely have access to these types of data, let alone the ability to reuse and share it, even when it’s **my data**, about just me. Who owns data about me, who controls it, who has access to it? Can I see data about me, can I get a copy of it in a form I could reuse or share, can I get value out of it? Would I even be allowed to publish openly some of the data about me, if I wanted to?

**But how does this relate to [open data](https://okfn.org/opendata/)?** After all, a key tenet of our work at the Open Knowledge Foundation is that personal data should **not** be made open (for obvious privacy reasons)!

However, there are, in fact, obvious points where “Open Data” and “My Data” connect:

* MyData becomes Open Data (via transformation): Important datasets that are (or could be) open come from “my data” via aggregation, anonymisation and so on. Much statistical information ultimately comes from surveys of individuals, but the end results are heavily aggregated (for example, census data). This means “my data” is an important source but also that it is essential that the open data community have a good appreciation of the pitfalls and dangers here – e.g. when anonymisation or aggregation may fail to provide appropriate privacy.

* MyData becomes Open Data (by individual choice): There may be people who want to share their individual, personal data openly to benefit others. A cancer patient could be happy to share their medical information if that could assist with research into treatments and help others like them. Alternatively, perhaps I’m happy to open my household energy data and share it with my local community to enable us collectively to make sustainable energy choices. (Today, I can probably only see this data on the energy company’s website: remote, unhelpful, out of my control. I may not even be able to find out what I’m permitted to do with my data!)

* The Right to Choose: if it’s **my data**, just about me, I should be able to choose to access it, reuse it, share it and open it if I wish. There is an obvious translation here of key [Open Data principles](http://opendefinition.org/) to MyData. Where the Open Definition states that material should be freely available for use, reuse and redistribution by anyone, we could think that my data should be freely available for use, reuse and redistribution by **me**.

We think it is important to explore and develop these connections and issues. The Open Knowledge Foundation is therefore today **launching an Open Data & MyData Working Group**. Sign up here to participate:






This will be a place to discuss and explore how open data and personal data intersect. How can principles around openness inform approaches to personal data? What issues of privacy and anonymisation do we need to consider for datasets which may become openly published? Do we need “MyData Principles” that include the right of the individual to use, reuse and redistribute data about themselves if they so wish?

## Appendix

There are plenty of challenging issues and questions around this topic. Here are a few:

### Anonymisation

Are big datasets actually anonymous? Anonymisation is incredibly hard. This isn’t a new problem (Ars Technica had a [great overview][ars] in 2009), but it becomes more challenging as more data becomes available, openly or otherwise: the more data there is to cross-correlate, the more easily anonymisation can be breached.
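
To make the cross-correlation risk concrete, here is a minimal sketch of a linkage check: a dataset with names removed is joined to a second, named dataset on the fields they share. Every record, field name and the `link_records` helper below is hypothetical and invented for illustration; real re-identification attempts and real defences are considerably more involved.

```python
def link_records(anonymised, public, quasi_identifiers):
    """Return anonymised records that can be tied to exactly one named person.

    A record is treated as re-identified when its combination of
    quasi-identifier values is unique in both datasets.
    """
    def key(row):
        return tuple(row[field] for field in quasi_identifiers)

    public_by_key = {}
    for row in public:
        public_by_key.setdefault(key(row), []).append(row)

    anon_by_key = {}
    for row in anonymised:
        anon_by_key.setdefault(key(row), []).append(row)

    reidentified = []
    for k, anon_rows in anon_by_key.items():
        matches = public_by_key.get(k, [])
        if len(anon_rows) == 1 and len(matches) == 1:
            reidentified.append({"name": matches[0]["name"], **anon_rows[0]})
    return reidentified

# Hypothetical "anonymised" release: names removed, other fields kept.
anonymised = [
    {"postcode": "CB1", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "CB1", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "CB2", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
    {"postcode": "CB2", "birth_year": 1985, "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical second dataset that includes names, e.g. a public register.
public = [
    {"name": "Alice Example", "postcode": "CB1", "birth_year": 1970, "sex": "F"},
    {"name": "Bob Example", "postcode": "CB2", "birth_year": 1985, "sex": "M"},
]

reidentified = link_records(anonymised, public, ["postcode", "birth_year", "sex"])
print(reidentified)
```

In this toy data, one record is unique on postcode, birth year and sex in both datasets, so a name can be re-attached to a diagnosis even though the release contained no names. The two records that share the same combination are not singled out, which is the intuition behind grouping-based protections.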

### Releasing Value

There’s a lot of value in personal data – [Boston Consulting Group claim €1tn][ftvalue]. But even BCG point out that this value can only be realised if the processes around personal data are more transparent. Perhaps we can aspire to more than transparency, and have some degree of personal control, too.

### Governments

Governments are starting to offer some proposals here such as “MiData” in the UK. This is a good start but [do they really serve the citizen][TH1]?

There’s also some [proposed legislation][midatalaunch] to drive companies to give consumers the right to see their data.

But is access enough?

The consumer doesn’t own their data (even when they have “MiData”-style access to it), so can they publish it under an open licence if they wish?

### Whose data is it anyway?

Computers, phones, energy monitors in my home, and so on, aren’t all personal to me. They are used by friends and family. It’s hard to know whose data is involved in many cases. I might want privacy from others in my household, not just from anonymous corporations.

This gets even more complicated when we consider the public sphere – surveillance cameras and internet of things sensors are gathering data in public places, about groups of independent people. Can the people whose images or information are being captured access or control or share this data, and how can they collaborate on this? How can consent be secured in these situations? Do we have to accept that some information simply cannot be private in a networked world?

(Some of these issues were raised at the Open Internet of Things Assembly in 2012, which led to a [draft declaration][iot]. The declaration doesn’t indicate the breadth of complex issues around data creation and processing which were hotly debated at the assembly.)

### MyData Principles

We will need **clear principles**. Perhaps, just as the Open Definition has helped clarify and shape the open data space, we need analogous “MyData” Principles which set out how personal data should be handled. These could include, for example:

* That my data should be made available to me in machine-readable bulk form.
* That I should have the right to use that data as I wish (including use, reuse and redistribution).
* That none of my data (where it contains personal information) should be made open without my full consent.

[PODWG]: http://lists.okfn.org/mailman/listinfo/open-personal-data

[PSdys]: http://www.philipsheldrake.com/2011/11/the-uk-takes-a-step-closer-to-streams-banks/

[TH1]: http://blog.ouseful.info/2013/01/14/midata-is-intended-to-benefit-whom-exactly/

[TH2]: http://blog.ouseful.info/2012/11/20/so-what-midata-and-yourdata-ourdata/

[midatalaunch]: http://news.bis.gov.uk/Press-Releases/New-power-to-boost-consumers-access-to-data-68373.aspx
[mydex]: http://mydex.org/
[pdec]: http://pde.cc/
[PC]: http://www.horizon.ac.uk/Recently-Completed-Projects/Personal-Containers
[ars]: http://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/

[iot]: http://bit.ly/openiot

[ftvalue]: http://www.ft.com/cms/s/0/5fd7d8a8-28e5-11e2-b92c-00144feabdc0.html#axzz2H2to4ovO