Last year, the UN Human Rights Council’s Special Rapporteur on the right to privacy, Professor Joe Cannataci, initiated a study on big data and open data, looking at the task of reconciling the societal benefits offered by new information and communications technologies with the protection of fundamental rights such as the right to privacy.
In October 2016, OpenGov spoke to Mr. David Watts, who is leading the study, to learn about its objectives and scope. He is Adjunct Professor of Law at La Trobe University and Deakin University. Until 31 August 2017 he was Commissioner for Privacy and Data Protection for the State of Victoria, Australia.
Recently, the Special Rapporteur’s full report to the UN General Assembly was released, which raises concerns about a vacuum in international law concerning surveillance and privacy in cyberspace. This had to be addressed to protect the rights of billions of citizens, said the Special Rapporteur, who is working on proposals for recommendations or a fully-fledged treaty to close the gap. The report is open for public consultation for the next six months and an international conference will be held in Australia in March 2018 to discuss its preliminary conclusions.
The report notes that new methods of collecting and analysing data – the phenomenon of big data – and the increasing willingness of Governments across the world to publicly release personal information they hold, albeit in de-identified form, in order to generate economic growth and stimulate scientific research – the phenomenon of open data – challenge many of the assumptions that underpin our notions about what privacy is, what it entails and how best to protect it.
On one side there are claims that big data offers the means to develop new insights into intractable public policy issues such as climate change, the threat of terrorism and public health. At the other end of the spectrum are those who are troubled by increasing surveillance by State and non-state actors, unjustified intrusion into the private sphere and the breakdown of privacy protections.
There is data created by individuals through their own agency. It includes emails and text messages, as well as images and videos. Other data is created about individuals by third parties, but in circumstances where they have participated, at least to some extent, in its creation, for example electronic health records or ecommerce transactions.
There is still more data generated behind the scenes, in circumstances that are opaque and largely unknown – and unknowable – to the individuals involved. It consists of ‘digital bread crumbs,’ electronic artefacts and other electronic trails left behind as a product of people’s online and offline activities.
This data can encompass times and locations when mobile devices connect with mobile telephone towers or GPS satellites, website visit records, or images collected by digital CCTV systems.
In addition, data can be temporal, spatial, or dynamic; structured or unstructured; information and knowledge derived from data can differ in representation, complexity, granularity, context, provenance, reliability, trustworthiness, and scope. Data can also differ in the rate at which it is generated and accessed.
The report also notes that though algorithms are nothing new, they are now a crucial part of information societies, increasingly governing ‘operations, decisions and choices previously left to humans.’ Yet the recommendations and decisions that result from algorithmic processing appear to spring from black boxes.
Their arithmetical construction might give them an appearance of objectivity, but the values embodied by algorithms often reflect cultural or other assumptions of the software engineers who design them and embed them within the logical structure of algorithms as unstated opinions.
Moreover, data fuels algorithms, but not all data is accurate, sufficiently comprehensive, up-to-date or reliable. Even if the provenance of some data, for example taxation records, can readily be established, their accuracy may vary from taxation agency to taxation agency within one state and between states. Other data may have been drawn from antiquated databases never properly cleansed or from insecure sources or where there have been inappropriate data entry and record-keeping standards.
Information privacy laws
The report highlights that the Organisation for Economic Co-operation and Development (OECD) published its Guidelines on the Protection of Privacy and Transborder Flows of Personal Data in 1980. The eight principles in the OECD Guidelines, together with the similar principles found in the 1981 Council of Europe’s (CoE) Data Protection Convention and the 1990 Guidelines for the regulation of computerized personal data files have informed information privacy laws across the world.
The foundational principle found in both the OECD and CoE rules, the collection limitation principle, is that personal information should only be collected lawfully and fairly and, where appropriate, with the knowledge and consent of the individual concerned.
The purpose limitation principle requires that the purpose of the collection of personal information should be specified at the time of collection and that the subsequent use of the information should be limited to the purpose of collection or a compatible purpose and that these should be specified whenever there is a change of purpose.
The use limitation principle restricts the disclosure of personal information for incompatible purposes except with the individual’s consent or by legal authority.
Big data challenges these principles while posing ethical issues and social dilemmas arising from the poorly considered use of algorithms. Rather than solving public policy problems, there is a risk of unintended consequences that undermine human rights such as freedom from all forms of discrimination.
According to the report, open data and open government were intended to provide access to data about the government itself and the world we live in. They were not intended to include data that governments collect on citizens. In recognition of this, some jurisdictions explicitly exclude ‘personal’ and other categories of information, such as commercial or Cabinet in Confidence information, from Open Data.
It is important not to lose sight, amidst terminology such as ‘sharing’ and ‘connecting’, of the fact that a reversal has occurred. Rather than releasing data about how government works, which the public can use to hold government to account, governments are releasing data about their citizens.
It is claimed that ‘value’ is locked away in government databases and other information repositories, and that making this information publicly available will encourage research and stimulate the growth of the information economy.
The community’s level of trust in government strongly shapes how citizens view the possible impact of open data and open government initiatives. Those who trust government are far more likely to think that there are benefits to open data.
But open data that is derived from personal information relies on the efficacy of ‘de-identification’ processes to prevent re-identification and linkage back to the individual from whom it was derived.
Patterns in the data, without the names, phone numbers or other obvious identifiers, can be used to identify a person and hence to extract more information about them from the data.
So it needs to be considered whether de-identification processes deliver data that does not interfere with individuals’ information privacy rights. Simple kinds of data, such as aggregate statistics, are amenable to genuinely privacy-preserving treatment such as differential privacy. Differential privacy algorithms work best at large scales, and are being incorporated into commercial data analysis.
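The core idea behind differential privacy can be sketched in a few lines: a counting query is released with Laplace noise calibrated to the query's sensitivity and a privacy parameter epsilon. The function names and figures below are illustrative, not drawn from the report or any particular library:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) by inverse transform sampling."""
    u = rng.random() - 0.5  # uniform on (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy.

    A counting query changes by at most 1 when one person is added or
    removed (sensitivity 1), so Laplace noise of scale 1/epsilon suffices.
    """
    rng = rng or random.Random()
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical example: publish how many people in a cohort have a condition.
rng = random.Random(42)
noisy_count = dp_count(1000, epsilon=0.5, rng=rng)
```

Smaller epsilon means more noise and stronger privacy; at large scales the noise is negligible relative to the true count, which is why the technique works best on aggregate statistics.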
High-dimensional unit-record level data, such as medical records and web logs, cannot be securely de-identified without substantially reducing its utility. (‘High-dimensional’ refers to datasets with a large number of attributes or features, while ‘unit-record’ data is information relating to an individual person.)
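Why dimensionality matters can be illustrated with k-anonymity, a common yardstick for de-identification: the more attributes a record carries, the smaller the groups of indistinguishable records become, until some records are unique. A minimal sketch, using invented data:

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Smallest group size sharing the same quasi-identifier values.

    k = 1 means at least one record is unique on those attributes,
    and hence potentially re-identifiable.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(counts.values())

# Hypothetical unit-record data (names already removed).
records = [
    {"age": 34, "postcode": "3000", "sex": "F", "visits": 2},
    {"age": 34, "postcode": "3000", "sex": "F", "visits": 9},
    {"age": 34, "postcode": "3000", "sex": "M", "visits": 1},
    {"age": 51, "postcode": "3121", "sex": "M", "visits": 4},
]

k_on_one_attribute = k_anonymity(records, ["sex"])          # groups of 2
k_on_two_attributes = k_anonymity(records, ["sex", "age"])  # a unique record appears
```

On one attribute every record blends into a group; adding a second attribute already isolates individuals. High-dimensional data has hundreds of such attributes, which is why suppressing enough of them to restore anonymity destroys most of the data's utility.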
The report notes that there are numerous examples of successful re-identification of individuals in data published by governments. This ‘public re-identification’ is public in two senses: the results are made public, and re-identification uses only public auxiliary information.
The re-identifiability of open data is an indication of a much larger problem – the re-identifiability of “de-identified” commercial datasets that are routinely sold, shared and traded.
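A linkage attack of the kind the report describes can be sketched briefly: join a ‘de-identified’ dataset to a public register on shared quasi-identifiers and report the unique matches. All names, records and attribute choices below are invented for illustration:

```python
# Hypothetical "de-identified" release: names stripped, but quasi-identifiers remain.
deidentified_health = [
    {"postcode": "3000", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "3121", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public auxiliary dataset, e.g. an electoral register.
public_register = [
    {"name": "Alice Example", "postcode": "3000", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Sample", "postcode": "3152", "birth_year": 1990, "sex": "M"},
]

def reidentify(deid_rows, public_rows, keys=("postcode", "birth_year", "sex")):
    """Link rows whose quasi-identifier combination is unique in the register."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in deid_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match is a likely re-identification
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches
```

Nothing in the attack requires privileged access: only the released data and a public register, which is exactly the ‘public re-identification’ the report warns about.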
The Special Rapporteur is considering several recommendations for the final version of this report, to be published in or after 2018.
One of them is a requirement that Open Data policies include clear statements, based on international standards and principles, of the limits on using personal information. This would include an exempt category for personal information, with a binding requirement to ensure the reliability of de-identification processes, and robust enforcement mechanisms.
Another recommendation is that any open government initiative involving personal information, whether de-identified or not, should require a rigorous, public, scientific analysis of the data privacy protections including a privacy impact assessment.
Sensitive high-dimensional unit-record level data about individuals should not be published online or exchanged unless there is sound evidence that secure de-identification has occurred and will be robust against future re-identification.
Frameworks should be established to manage the risk of sensitive data being made available to researchers. Governments and corporations should actively support the creation and use of privacy-enhancing technologies.
The report says that the following are to be considered when dealing with big data:
Governance: responsibility (identification of accountabilities, decision-making processes and, as appropriate, decision makers); transparency (what occurs to personal data, when and how, prior to it becoming publicly available, and how it is used, including ‘open algorithms’); quality (minimum guarantees of data and processing quality); predictability (when machine learning is involved, the outcomes should be predictable); security (appropriate steps to prevent data inputs and algorithms from being interfered with without authorisation); development of new tools to identify risks and specify risk mitigation; and support (training employees on legal, policy and administrative requirements relating to personal information).
Regulatory environment: 1) Ensure arrangements to establish an unambiguous focus, responsibility and powers for regulators charged with protecting citizens’ data; 2) Regulatory powers to be commensurate with the new challenges posed by big data for example, the ability for regulators to be able to scrutinise the analytic process and its outcomes; 3) Examination of privacy laws to ensure these are ‘fit for purpose’ in relation to the challenges arising from technology advances such as machine-generated personal information, and data analytics such as de-identification.
Inclusion of feedback mechanisms: Formalise consultation mechanisms, including ethics committees, with professional, community and other organisations and citizens to protect against the erosion of rights and identify sound practices; and undertake a broad-based consultation on the recommendations and issues raised by this report, such as the appetite, for example, for a prohibition on the provision of government datasets.
Research: Investigate relatively new techniques such as differential privacy and homomorphic encryption to assess whether they provide adequate privacy in processes and outputs; and examine citizens’ awareness of the data activities of governments and businesses, uses of personal information (including for research), technological mechanisms to enhance individuals’ control of their data, and ways to increase their ability to utilise it for their own needs.
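To make the homomorphic encryption idea concrete: such schemes allow computation on encrypted data without decrypting it. The sketch below is a textbook Paillier cryptosystem with deliberately tiny, insecure parameters, shown only to demonstrate its additive property (the product of two ciphertexts decrypts to the sum of the plaintexts); it is not the technique the report evaluates, nor a usable implementation:

```python
import math

# Toy parameters: real deployments use primes of 1024+ bits.
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)  # Carmichael function of n
g = n + 1                     # standard generator choice
mu = pow(lam, -1, n)          # modular inverse of lambda mod n

def encrypt(m, r):
    """Encrypt integer m (< n) with randomness r, gcd(r, n) == 1."""
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Recover m using the L function L(x) = (x - 1) / n."""
    l = (pow(c, lam, n2) - 1) // n
    return (l * mu) % n

# Additive homomorphism: multiplying ciphertexts adds plaintexts.
c1, c2 = encrypt(12, 7), encrypt(30, 11)
total = decrypt((c1 * c2) % n2)  # equals 12 + 30
```

A government could, in principle, let researchers compute aggregate sums over encrypted records this way, so that individual values are never exposed, which is the privacy-by-design direction the report's research recommendation points toward.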