EXCLUSIVE - In-progress UN report on Big Data and Open Data - Defining issues and mitigating risks
Earlier this year, OpenGov spoke to Adjunct Professor David Watts, Commissioner for Privacy and Data Protection for the State of Victoria in Australia, to learn about the Victorian Protective Data Security Framework.
In July, Commissioner Watts was appointed by the United Nations (UN) Special Rapporteur on the Right to Privacy to lead a study on Big Data and Open Data. OpenGov caught up with him to discuss his new role and the objectives and scope of the study. Commissioner Watts talked about the importance of defining the terms and of studying both the risks and the strategies for mitigating them.
Can you tell us about your new role as the lead on the UN global study on Big Data and Open Data?
Let me start with some background. The UN Human Rights Council’s Special Rapporteur on the Right to Privacy, Professor Joe Cannataci, has a mandate to “raise awareness concerning the importance of promoting and protecting the right to privacy, including with a view to particular challenges arising in the digital age, as well as concerning the importance of providing individuals whose right to privacy has been violated with access to an effective remedy, consistent with international human rights obligations”.
In his first report to the Human Rights Council in March 2016, the Special Rapporteur on the Right to Privacy (SRP) announced his intention to focus on a number of significant privacy themes, one of which is Big Data and Open Data. The aim of the project I am leading is to produce a report on Big Data and Open Data for the Special Rapporteur to present to the UN Human Rights Council and General Assembly in the latter part of 2017.
We have divided the Big Data/Open Data theme into a number of areas of inquiry. The first area of inquiry relates to how best to frame the issues. The starting point is to define both Big Data and Open Data.
Interestingly, there is no agreed definition of Big Data. There are only descriptions. People talk about the 3, 4 or 5 Vs; when I last counted, there were 8. Even the US National Institute of Standards and Technology (NIST), which attempted a definition, only arrived at a description. The lack of a definition is a conceptual issue.
Defining Open Data is easier. The definition that I find most useful is that Open Data is data that can be freely used, reused and redistributed by anyone, subject only at most to a requirement to attribute and to share alike.
Open Data has become a mantra for many governments, including those of the UK, the USA, Australia and Singapore. They believe it will stimulate research and development, drive innovation and generate knowledge to improve society. But there are risks. One of the most significant is the risk that de-identified data will be re-identified. For example, a few weeks ago Australia’s Department of Health released over a billion lines of “de-identified” personal health data, but had to withdraw it shortly afterwards because Melbourne University researchers re-identified it.
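To make the re-identification risk concrete, one common (and only partial) measure of how exposed a "de-identified" release is can be sketched: k-anonymity, where k is the size of the smallest group of records sharing the same quasi-identifiers, such as postcode, birth year and sex. This is a minimal Python illustration under stated assumptions: the toy records, the choice of quasi-identifiers and the function names are all hypothetical, and it does not describe how the Department of Health data was prepared or how it was re-identified.

```python
from collections import Counter

# Hypothetical toy release: names removed, but quasi-identifiers remain.
RELEASE = [
    {"postcode": "3000", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "3000", "birth_year": 1961, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "3052", "birth_year": 1984, "sex": "M", "diagnosis": "flu"},
    {"postcode": "3121", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
]

# Assumed quasi-identifiers: fields that are not names but can single people out.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def equivalence_classes(records, keys):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def k_anonymity(records, keys):
    """Return k: the size of the smallest group sharing the same quasi-identifiers."""
    classes = equivalence_classes(records, keys)
    return min(classes.values()) if classes else 0


def unique_records(records, keys):
    """Return the records whose quasi-identifier combination appears exactly once."""
    classes = equivalence_classes(records, keys)
    return [r for r in records if classes[tuple(r[k] for k in keys)] == 1]


if __name__ == "__main__":
    print("k =", k_anonymity(RELEASE, QUASI_IDENTIFIERS))  # k = 1
    print("unique records:", len(unique_records(RELEASE, QUASI_IDENTIFIERS)))  # 2
```

Here k is 1: two records are unique on postcode, birth year and sex, so anyone who already knows those three facts about a person in the release can read off that person's diagnosis. A higher k is not a guarantee either; if everyone in a group shares the same diagnosis, the group still discloses it.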
How would success be evaluated?
The extent to which the paper provides the global community with a basis for a sensible and informed debate.
By our ability to produce a report that defines and demystifies the issues, makes sense of them, identifies the risks and opportunities, and points to solutions.
Who are the major stakeholders you would want to involve in this?
The Special Rapporteur has indicated that he wishes to be everybody’s Special Rapporteur. He has indicated he will be inclusive and is interested in all points of view. So contributions from civil society, the academy, government, the private sector, non-government organisations and special interest groups such as health consumers all have an important role to play.
Broad engagement is needed. Often the debates around privacy become Euro-centric or US-centric. It is very important to understand Asian perspectives.
The Asia-Pacific region is growing very rapidly and is frequently at the forefront of developing new services, new ICT technologies and new approaches to data.
What is the scope of the report?
The report will be divided into the following parts:
- Framing the issues - defining Big Data, Open Data, and structured and unstructured data
- Technologies and processes - analytics; the processes for data management, analysis and interpretation; algorithms; data linkage; distributed ledgers; privacy enhancing technologies; etc.
- Participants - the public sector, the private sector, information intermediaries, and what each seeks to do with Big Data and Open Data
- Value - public value, private value and individual value
- The ethical, legal and regulatory context
- Addressing the risks
The difficulty with the subject matter is that it moves so fast, which means that our understanding of new technologies must constantly evolve. One issue I have noticed is that much of the discourse about Big Data is contested. A key task will be to sift what is authoritative from what is not.
What are some of the key benefits and risks in your view?
Although some say that Big Data’s benefits are unproven, there appears to be a broadening consensus that it can produce insights: for example, evidence relevant to some of our most complex social and economic problems, such as climate change and healthcare, and evidence that can help address serious crime and corruption.
In Africa, mobile telephone records were used to track Ebola. Frequently, the disease vectors could be tracked more effectively this way than by reports from healthcare facilities.
The challenge is how to harness these benefits while protecting privacy. We need to ensure that these techniques do not become so intrusive that we have no personal and private space to ourselves. Drawing global attention to these issues is an important part of the project.
As I mentioned earlier, one of the most controversial issues regarding Big Data and Open Data is whether de-identified personal information can be re-identified.
If you assemble various government data sets, or combine them with commercially purchased data, there is a significant risk that the data will be re-identified. Blanket assertions that released data has been de-identified are problematic. The project’s work on de-identification will be evidence-based, providing an opportunity to test the claims made for de-identification solutions.
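A linkage attack of the kind described here can be sketched in a few lines: join a release that has no names to an auxiliary data set that does carry names, using the quasi-identifiers the two share. The data sets, field names and the rule that a match counts only when it is unique on both sides are assumptions for illustration; real attacks typically draw on more data sources and on probabilistic rather than exact matching.

```python
from collections import defaultdict

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")

# Hypothetical "de-identified" government release: no names, quasi-identifiers remain.
RELEASED = [
    {"postcode": "3052", "birth_year": 1984, "sex": "M", "diagnosis": "flu"},
    {"postcode": "3000", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "3000", "birth_year": 1961, "sex": "F", "diagnosis": "diabetes"},
]

# Hypothetical auxiliary data set (e.g. purchased marketing data) that carries names.
AUXILIARY = [
    {"name": "A. Citizen", "postcode": "3052", "birth_year": 1984, "sex": "M"},
    {"name": "B. Resident", "postcode": "3000", "birth_year": 1961, "sex": "F"},
]


def group_by(records, keys):
    """Index records by their tuple of quasi-identifier values."""
    index = defaultdict(list)
    for r in records:
        index[tuple(r[k] for k in keys)].append(r)
    return index


def link(released, auxiliary, keys):
    """Exact-match linkage on shared quasi-identifiers.

    A released record counts as re-identified only when exactly one released
    record and exactly one auxiliary record share its quasi-identifier values.
    """
    released_by_key = group_by(released, keys)
    aux_by_key = group_by(auxiliary, keys)
    matches = []
    for key, recs in released_by_key.items():
        people = aux_by_key.get(key, [])
        if len(recs) == 1 and len(people) == 1:
            matches.append((people[0]["name"], recs[0]["diagnosis"]))
    return matches


if __name__ == "__main__":
    for name, diagnosis in link(RELEASED, AUXILIARY, QUASI_IDENTIFIERS):
        print(f"{name} -> {diagnosis}")  # A. Citizen -> flu
```

Only A. Citizen is re-identified outright, yet B. Resident is still exposed: the attacker learns that she has either asthma or diabetes. Neither data set is revealing on its own; the risk comes from combining them.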
Can you tell us about some of the mitigation measures you are looking at?
Non-exhaustively, we are looking at Privacy Enhancing Technologies, de-identification, differential privacy, distributed ledger technology, the Semantic Web¹ and other technological approaches, together with security controls, legal controls at the national and international level, business processes, policies and standards, frameworks, governance, structures, capability, transparency and resourcing.
I am sceptical about technological solutions that assert that they are going to be perfect in every circumstance.
For example, at the moment the jury is out on whether differential privacy works. Differential privacy involves adding calibrated statistical noise so that a data set can still answer queries with useful accuracy while reducing the likelihood of disclosing personal information. Some say that the noise added to the data makes it useless. Others say that the noise protects personal information. Which claims stand up to scrutiny?
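Both sides of the differential privacy debate can be seen in the Laplace mechanism, a standard construction for answering counting queries. It is chosen here for illustration; the interview does not name a particular mechanism. The parameter epsilon controls the trade-off: noise is drawn with scale 1/epsilon, so a smaller epsilon gives stronger privacy but noisier, less useful answers. The patient data and function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Answer a counting query with the Laplace mechanism.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so noise with scale 1/epsilon gives epsilon-differential privacy for this query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    patients = [{"diagnosis": "asthma"}] * 120 + [{"diagnosis": "flu"}] * 80
    has_asthma = lambda r: r["diagnosis"] == "asthma"
    for eps in (0.01, 0.1, 1.0):
        # Repeating the query only measures the error. In a real deployment every
        # answer spends privacy budget and the losses add up, so this would not be allowed.
        answers = [private_count(patients, has_asthma, eps, rng) for _ in range(1000)]
        mean_abs_error = sum(abs(a - 120) for a in answers) / len(answers)
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error:.1f}")
```

The average error is roughly 1/epsilon, so epsilon = 0.01 blurs a true count of 120 by around 100, while epsilon = 1 blurs it by around 1. Whether the result is "useless" or "protective" depends largely on the epsilon chosen and on the size of the counts being released.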
The Semantic Web can be used to express permissions that accompany information, allowing information transactions to be automated. Perhaps personal information could be stored in the equivalent of an electronic envelope, associated with an application that negotiates how the information can be used through smart contracting.
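The electronic envelope idea can be sketched as personal data sealed together with machine-readable permissions and released only when a request satisfies every term. This is a hypothetical illustration: the `Envelope` and `UsePolicy` classes and their fields are invented for the example, the policy is a plain Python object rather than a Semantic Web vocabulary (such as the W3C's ODRL), and the simple rule check stands in for the negotiation and smart contracting the Commissioner mentions.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class UsePolicy:
    """Hypothetical machine-readable permissions that travel with the data."""
    allowed_purposes: FrozenSet[str]
    allowed_recipients: FrozenSet[str]
    expires: date


@dataclass(frozen=True)
class Envelope:
    """Hypothetical 'electronic envelope': personal data sealed with its policy."""
    subject: str
    payload: dict
    policy: UsePolicy

    def request(self, recipient: str, purpose: str, on: date) -> Optional[dict]:
        """Release a copy of the payload only if every policy term is satisfied."""
        p = self.policy
        if recipient not in p.allowed_recipients:
            return None
        if purpose not in p.allowed_purposes:
            return None
        if on > p.expires:
            return None
        return dict(self.payload)  # hand out a copy, not the sealed original


if __name__ == "__main__":
    env = Envelope(
        subject="patient-123",
        payload={"blood_type": "O+"},
        policy=UsePolicy(frozenset({"treatment"}), frozenset({"hospital-a"}), date(2017, 12, 31)),
    )
    print(env.request("hospital-a", "treatment", date(2017, 6, 1)))  # {'blood_type': 'O+'}
    print(env.request("insurer-b", "marketing", date(2017, 6, 1)))   # None
```

Keeping the policy and the data together means any software that honours the envelope applies the same rules. Its weakness is equally clear: once the payload is released, nothing in the envelope itself stops the recipient from copying or reusing it.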
Distributed ledger technology has been mooted as being able to protect privacy and security. Distributed ledger implementations such as blockchain show good potential to make transactions accountable, transparent and secure to anyone who can read the blocks in the chain. But in many cases you may not want your private personal information to be transparent. Maybe there is an encryption solution for this. These are early days. My feeling is that it is a technology with a range of potential benefits, some of which probably haven’t been invented yet. Again, we need to examine the claims and test them.
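The tension between a transparent ledger and private personal data can be illustrated with a toy hash-chained ledger. Each block records the hash of the previous block, so anyone who can read the chain can detect tampering. Rather than putting personal data on the ledger, the sketch records only a salted hash commitment and keeps the data and salt off-chain. This is one possible approach, assumed for illustration: a commitment is not encryption, and the toy omits everything that makes a real blockchain distributed, such as consensus between nodes. All names are hypothetical.

```python
import hashlib
import json
import secrets
from typing import Tuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def block_hash(prev: str, entry: dict) -> str:
    """Hash the previous block's hash together with this block's entry."""
    return sha256_hex(json.dumps({"prev": prev, "entry": entry}, sort_keys=True).encode())


class Ledger:
    """Toy append-only ledger: each block stores the hash of the previous block."""

    def __init__(self):
        self.blocks = []

    def append(self, entry: dict) -> dict:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        block = {"prev": prev, "entry": entry, "hash": block_hash(prev, entry)}
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every hash; editing any past entry breaks the chain."""
        prev = GENESIS
        for block in self.blocks:
            if block["prev"] != prev or block["hash"] != block_hash(prev, block["entry"]):
                return False
            prev = block["hash"]
        return True


def commit(personal_data: str) -> Tuple[str, str]:
    """Salted hash commitment: publish the digest, keep the data and salt off-chain."""
    salt = secrets.token_hex(16)
    return sha256_hex((salt + personal_data).encode()), salt


def opens(digest: str, personal_data: str, salt: str) -> bool:
    """Check that the off-chain data and salt match a published commitment."""
    return sha256_hex((salt + personal_data).encode()) == digest


if __name__ == "__main__":
    ledger = Ledger()
    digest, salt = commit("Jane Citizen, DOB 1961-04-02")  # hypothetical record
    ledger.append({"type": "consent-recorded", "commitment": digest})
    print(ledger.verify())  # True
    ledger.blocks[0]["entry"]["type"] = "tampered"
    print(ledger.verify())  # False
```

The commitment lets the data holder later prove which record was committed without publishing it. What remains visible is the metadata: anyone reading the chain can still see when and how often entries were made.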
Technology moves quickly, so basing regulatory and oversight efforts on a single technology, or a suite of technologies, seems perilous to me. I think technologies can be used to implement broader policies and regulatory frameworks. It’s important to have principles, standards, frameworks and structures that are risk-based and technology neutral.
What are your thoughts on data ownership?
When different professions and disciplines communicate, they face terminology problems. As a lawyer I understand confidentiality to mean a certain thing. But security professionals have a slightly different meaning and policy professionals have yet another meaning.
Data ownership is one of those expressions. Legally the term doesn’t make sense. There is no property right in information per se. ICT professionals understand data ownership as a stewardship issue that goes to who has management responsibility and who is accountable for the data.
So I think “data ownership” has to be used with care.
It’s better to manage risk against the indicators of accountability and responsibility, rather than trying to work out who owns data.
¹ The Semantic Web is a set of data standards promoting common data formats and exchange protocols on the web.