Issues with data reuse

Most discussions on open data and privacy look at the role of the publisher, but reusers also have separate responsibilities.

There is a site in the US, Mugshot.com, that collects mugshots from police websites, ostensibly for information purposes. But if you find yourself listed there, you have to pay to have your photo taken down. This raises many ethical questions about the reuse of publicly available data.

The UK also has some police mugshots online, but they were taken down from the main website after privacy concerns.

Many police forces have online mugshots, but apply certain criteria: offences that carry a minimum sentence, permission from the victim, tags telling Google not to index the pages, etc. There have been surveys on public attitudes, with a positive response overall.

Police try to use licensing conditions to prevent misuse.

The US and EU have very different frameworks and attitudes.

After the right to be forgotten case involving Google, how can other data re-users cope? They could be deemed data controllers, with a huge set of obligations.

Is there a need for a search engine exemption? How far could this go? What about a search engine for paedophiles in the area?

Measures to control reuse

How can we deal with this technically? Can we set a timer on data with expiry dates, even if open licensed?

Can you set a sunset clause in the license?
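One way to picture a timer on data is an expiry date carried in the metadata of each record, which a cooperating re-user checks before use. The following is a minimal sketch under that assumption. The `Record` class, the `expires` field and the `usable_records` helper are hypothetical names, not part of any existing license or standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    """A published data item with an optional expiry date (hypothetical schema)."""
    subject: str
    payload: str
    expires: Optional[date] = None  # None means the record never expires


def is_expired(record: Record, today: date) -> bool:
    """Return True once the publisher's expiry date has passed."""
    return record.expires is not None and today > record.expires


def usable_records(records: List[Record], today: date) -> List[Record]:
    """Keep only the records a cooperating re-user may still use."""
    return [r for r in records if not is_expired(r, today)]


# Illustrative data only; subjects and dates are made up.
records = [
    Record("case-1", "mugshot", expires=date(2014, 6, 30)),
    Record("case-2", "mugshot", expires=date(2015, 6, 30)),
    Record("register", "data controllers"),
]
still_usable = usable_records(records, today=date(2015, 1, 1))
print([r.subject for r in still_usable])
```

A check like this only works if the re-user chooses to run it. Nothing in the sketch prevents someone from keeping an expired copy, which is why the question of a sunset clause in the license still matters.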

There are no useful examples where data has been timed out. The closest case is TechCrunch, which changed its API license from CC-BY to CC-BY-NC. Because it was an API, new data requests required re-users to adopt the new license, but the change did not apply retrospectively.

Internet architecture needs to reflect control of original subjects, for example, with spam. But at the same time anonymity is important. Openness needs a more humanist dimension. All internet use involves personal information (social/human over TCP/IP), but for practical purposes we need to focus on “more personal” or sensitive aspects.

Reuben Binns provided an example of added privacy considerations in his project on the database of Data Controllers from the UK Information Commissioner. The data has the following condition:

The register will be provided under the Open Government License and may be reused provided that the reuse of any personal data complies with the requirements of the Data Protection Act and, in particular, that such data are not used in a way that is inconsistent with the purpose for which the register was created, for example not used for direct marketing, or in a way that otherwise adversely affects the privacy of individuals.

Can we add nested and structured considerations?

Daniel Weitzner has worked on this area of “privacy design strategies for open information networks”.

Sticky policies are machine readable and “stick to data to define allowed usage and obligations as it travels across multiple parties, enabling users to improve control over their personal information”. These have not proved scalable to the wide world, but in open data they could work.
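As a rough illustration of the idea, a sticky policy can be modelled as a small machine-readable object that is bundled with the data and copied along whenever the data is passed on. This is a sketch under simplifying assumptions, not an implementation of any published sticky policy framework. `StickyPolicy`, `PolicyBoundData` and the purpose strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet


@dataclass(frozen=True)
class StickyPolicy:
    """Machine-readable usage terms (hypothetical structure)."""
    allowed_purposes: FrozenSet[str]
    obligations: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class PolicyBoundData:
    """A value that can only be read by stating a purpose."""
    value: Any
    policy: StickyPolicy

    def use(self, purpose: str) -> Any:
        """Release the value only for a purpose the policy allows."""
        if purpose not in self.policy.allowed_purposes:
            raise PermissionError(f"purpose not allowed by policy: {purpose}")
        return self.value

    def derive(self, new_value: Any) -> "PolicyBoundData":
        """Pass data on to another party; the same policy travels with it."""
        return PolicyBoundData(new_value, self.policy)


policy = StickyPolicy(
    allowed_purposes=frozenset({"research", "transparency"}),
    obligations=frozenset({"comply with the Data Protection Act"}),
)
entry = PolicyBoundData({"controller": "Example Ltd"}, policy)
enriched = entry.derive({"controller": "Example Ltd", "sector": "retail"})

print(enriched.use("research"))
try:
    enriched.use("direct marketing")
except PermissionError as err:
    print(err)
```

The sketch relies on every party using the wrapper honestly. A real scheme would need some enforcement or audit mechanism behind it, which is part of why scaling these policies has been hard.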

But the fundamental question remains: who is more responsible, the publisher or the re-user? We need both to be helpful and responsible.

Can we bring it back to the human level? Could proper reuse be enforced through reputation?

In Linked Data, could reporting bad reuse become semantic ostracism?

What about revoking access?

Are APIs a better way to enforce privacy than downloads?
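The difference between an API and a download can be sketched in a few lines: with an API the publisher decides on every request whether to keep serving a re-user, whereas a downloaded file is out of the publisher's hands. The example below is a simplified, in-memory illustration. `DataApi`, its methods and the key names are hypothetical and do not refer to any real service.

```python
from typing import Dict, List, Set


class DataApi:
    """Toy publisher-side API that checks a key on every request."""

    def __init__(self, dataset: List[Dict[str, str]]) -> None:
        self._dataset = dataset
        self._active_keys: Set[str] = set()

    def issue_key(self, key: str) -> None:
        """Grant a re-user access."""
        self._active_keys.add(key)

    def revoke_key(self, key: str) -> None:
        """Withdraw access, for example after a report of bad reuse."""
        self._active_keys.discard(key)

    def fetch(self, key: str) -> List[Dict[str, str]]:
        """Return a copy of the data if the key is still active."""
        if key not in self._active_keys:
            raise PermissionError("access revoked or never granted")
        return [dict(row) for row in self._dataset]


api = DataApi([{"name": "Example Ltd", "purpose": "register entry"}])
api.issue_key("reuser-a")
local_copy = api.fetch("reuser-a")  # works while the key is active

api.revoke_key("reuser-a")
try:
    api.fetch("reuser-a")
except PermissionError as err:
    print(err)

print(len(local_copy))  # the earlier copy is still held by the re-user
```

Revocation stops future requests only. Data fetched before the key was revoked stays with the re-user, so an API gives more control than a bulk download without fully solving the problem.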

Should the W3C get involved?

Organisational Models

Who can be trusted with data?

Network effects create information monopolies, but we need different industrial structures.

Cloud providers need some trust and reputation.

We looked at the Respect Network.

Trust depends on the technology, the institutional structure and the people behind it.

What about cooperatives?

There is inherently more trust in non-profits, but sometimes a sense of purpose and self-righteousness can be a problem for privacy, as it justifies intrusive practices. See the use of voter lists by political campaigners and unions, etc.

Yet, the public doesn’t trust commercial companies

Ethical Re-use of Data

The reuse of open data should also be open. This covers the business rules and processes used in analyses that use open data, not just the derivative data, tools and visuals.

Is a notification about reuse desirable? Should I know who is watching, as in the LinkedIn system?

This could be an issue for investigative journalists.

Ultimately it’s all about power

In Finland, the Information Commissioner said it’s OK to search in Google if you are hiring someone.

It depends on what you add to the information: profile, enrichment

There is a high threshold to justify becoming a new data controller.