What do they know about me? Open data on how organisations use personal data
Guest - March 18, 2014 in Featured
This post is by Reuben Binns, a postgraduate researcher at the University of Southampton, Web Science Institute. His research interests include ethical and legal aspects of personal data and open data. Find him on Twitter and GitHub.
When open data and personal data collide, attention is quite rightly drawn to the negative implications for privacy; namely, the possibility that open data contains – or can be used to infer – personal data. But there’s also a flip-side; open data could help protect privacy by revealing the activity of those who collect and share our personal data. This is something I’ve been exploring in my research using the UK Register of Data Controllers.
This dataset, covering the data protection notifications of 350,000 UK organisations, is released by the Information Commissioner’s Office under an Open Government Licence (it’s available on DVD on request from the ICO, and can be searched using their website portal). It discloses why organisations collect personal data, what kinds of data they collect, from whom and who has access to it. My research uses snapshots of this data over a three-year period to paint a picture of the UK personal data landscape – who knows what about who, and why. Of course, some of this data may be inaccurate or incomplete, but it’s compiled from what organisations themselves are legally obliged to disclose to the ICO. I parsed the raw XML and loaded it into a database which can be queried. The full results will be released in a forthcoming paper, but alongside this, I’ve also been experimenting to see how the data could provide context to some of the privacy stories that have been in the media spotlight in recent years.
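For readers curious what a parse-and-load step of this kind might look like, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the code used in the research: the XML layout, the element and attribute names, the table names and the sample organisations are all hypothetical, and the real register's format may well differ.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML layout and made-up organisations, for illustration only.
# The real register's element names and structure may differ.
SAMPLE_XML = """
<register>
  <registration number="Z0000001">
    <name>Example Opticians Ltd</name>
    <purpose name="Health administration and services">
      <dataClass>Physical or mental health details</dataClass>
      <recipient>Traders in personal data</recipient>
    </purpose>
  </registration>
  <registration number="Z0000002">
    <name>Example Builders Ltd</name>
    <purpose name="Staff administration">
      <dataClass>Trade union membership</dataClass>
      <recipient>Current, past or prospective employers</recipient>
    </purpose>
  </registration>
</register>
"""


def load_register(xml_text, conn):
    """Parse register XML and store one row per (purpose, data class, recipient)."""
    conn.executescript(
        """
        CREATE TABLE organisation (reg_number TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE disclosure (
            reg_number TEXT, purpose TEXT, data_class TEXT, recipient TEXT
        );
        """
    )
    root = ET.fromstring(xml_text)
    for reg in root.iter("registration"):
        number = reg.get("number")
        conn.execute(
            "INSERT INTO organisation VALUES (?, ?)",
            (number, reg.findtext("name")),
        )
        for purpose in reg.iter("purpose"):
            data_classes = [e.text for e in purpose.findall("dataClass")]
            recipients = [e.text for e in purpose.findall("recipient")]
            # Flatten each purpose into every data class / recipient pairing.
            for data_class in data_classes:
                for recipient in recipients:
                    conn.execute(
                        "INSERT INTO disclosure VALUES (?, ?, ?, ?)",
                        (number, purpose.get("name"), data_class, recipient),
                    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_register(SAMPLE_XML, conn)
```

Flattening each notification into one row per purpose, data class and recipient is just one possible design; it makes "who shares what with whom" questions easy to express as simple filters.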
One example is the ongoing ‘construction worker blacklist’ fiasco. The Consulting Association, a rather blandly named outfit, were fined by the ICO for compiling a blacklist of over 3000 construction workers. Employers paid for access to the list in order to screen out potential workers who had previously caused ‘trouble’ – by, for instance, raising safety concerns on site or engaging in trade union activity. Some of the blacklisted workers were unable to find work for years and are now seeking compensation.
What’s ironic – and alarming – about this case and others like it is that the potentially harmful activity often isn’t itself prohibited by law. In the end, the £5,000 fine was issued due to the Consulting Association’s failure to register their activity with the ICO. The truth is, even legal activity that regulators are aware of may still endanger privacy. So I dug into the register to find companies openly claiming to engage in similar practices.
I found 422 organisations who claim to be collecting information about the trade union membership status of employees of other organisations, for the purposes of selling it to third parties. This was essentially the business model of the now defunct Consulting Association. I’ve visualised a sample of 42 of these organisations below – the yellow nodes are the categories of third parties with whom they share this data.
See full image here.
A more recent controversy concerns the use of patient health data. In the debate over the proposed care.data scheme – under which medical records currently held by GPs would be aggregated into a central database and made available to researchers and companies outside the NHS – it emerged that identifiable patient data from hospitals has apparently already been sold (indirectly) to insurance companies, to the shock and dismay of privacy campaigners and health professionals alike. The body responsible, the HSCIC, have an entry in the register stating who they share personal data with – a copy of which can be seen by searching their registration number (Z8959110) in the ICO’s public portal. (NB: no mention of insurance companies).
A query for organisations who are collecting health data for ‘health administration and services’ purposes returns over 57,000 results. We can refine this to show only those organisations who give this data to ‘traders in personal data’, which yields 840 matches. Many of these appear to be opticians – branches of ‘Specsavers’ make up about a third – so if you’ve had an eye test lately, the results have possibly been aggregated and sold through third parties. But there also appear to be some other health providers in there with potentially more sensitive data; one of them is an NHS Trust specialising in mental health. There may be a perfectly legitimate and ethical reason why they’re giving away patient data to private data brokers – but I’m struggling to guess what that could be.
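The broad-then-refined query described above can be sketched roughly as follows. This is a toy example in Python with an in-memory SQLite table, not the actual query run against the register: the table layout, the `orgs_matching` helper, the data class label and the three sample organisations are all assumptions made for illustration. Only the purpose and recipient labels come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per organisation, purpose,
# data class and recipient category.
conn.execute(
    "CREATE TABLE disclosure (org TEXT, purpose TEXT, data_class TEXT, recipient TEXT)"
)
# Made-up sample rows; none of these are real register entries.
conn.executemany(
    "INSERT INTO disclosure VALUES (?, ?, ?, ?)",
    [
        ("Optician A", "Health administration and services",
         "Physical or mental health details", "Traders in personal data"),
        ("Clinic B", "Health administration and services",
         "Physical or mental health details", "Healthcare professionals"),
        ("Builder C", "Staff administration",
         "Trade union membership", "Traders in personal data"),
    ],
)


def orgs_matching(conn, purpose, data_class, recipient=None):
    """Return organisations holding a data class for a purpose,
    optionally narrowed to those sharing it with a given recipient."""
    sql = "SELECT DISTINCT org FROM disclosure WHERE purpose = ? AND data_class = ?"
    params = [purpose, data_class]
    if recipient is not None:
        sql += " AND recipient = ?"
        params.append(recipient)
    sql += " ORDER BY org"
    return [row[0] for row in conn.execute(sql, params)]


PURPOSE = "Health administration and services"
HEALTH = "Physical or mental health details"

# Broad query: everyone collecting health data for this purpose.
broad = orgs_matching(conn, PURPOSE, HEALTH)
# Refined query: only those passing it to traders in personal data.
narrow = orgs_matching(conn, PURPOSE, HEALTH, recipient="Traders in personal data")
```

Requiring the purpose, data class and recipient to match within a single row matters here: it avoids counting an organisation that collects health data for one purpose but shares only unrelated data with data brokers.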
Real privacy harms could result from these kinds of data sharing arrangements, even when they don’t contravene data protection law. If I were a member of a trade union, and my employers had any relationship with those 422 companies, I’d want to know about it. If I were a user of an NHS mental health service, I’d want to know if they’re sharing my medical data with data brokers and why. Whether it’s employment history, political affiliations, or health records, authoritative and accurate open data on who knows what about who is a pre-requisite for preventing privacy harms before they arise.
Publishing this information in obscure, unreadable and hidden privacy policies and impact assessments is not enough to achieve meaningful transparency. There’s simply too much of it out there to capture in a piecemeal fashion, in hidden web pages and PDFs. To identify the good and bad things companies do with our personal information, we need more data, in a more detailed, accurate, machine-readable and open format. In the long run, we need to apply the tools of ‘big data’ to drive new services for better privacy management in the public and private sector, as well as for individuals themselves.
So while there are genuine tensions between openness and privacy, there are also harmonies. When it comes to the organisations, businesses and institutions that shape our lives and livelihoods, transparency about how they use our personal data is essential. It’s the first step towards a new privacy infrastructure fit for the digital age – and open data has a crucial part to play.
Further links:
See the GitHub project report for more on the data source itself – contributions and forks are very welcome. See my previous thoughts on how openness can help rather than hinder privacy here and here, and my musings on the care.data scheme shortly before it was postponed.