Multi-agency Data Taskforce led by NSW DAC and ACS makes recommendations to support data sharing while preserving privacy
On October 2, a whitepaper on Data Sharing Frameworks was released by a Data Taskforce led by the Australian Computer Society (ACS) and the NSW Data Analytics Centre (DAC). The taskforce was created to address the overarching challenge of developing privacy-preserving frameworks which support automated data sharing to facilitate the creation and deployment of smart services.
The Taskforce has met more than 6 times since June 2016, with representatives from ACS, the NSW DAC, Standards Australia, the office of the NSW Privacy Commissioner, the NSW Information Commissioner, the Federal Government’s Digital Transformation Agency (DTA), the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Data61, the Department of Prime Minister and Cabinet, the Australian Institute of Health and Welfare, SA-NT DataLink, South Australian Government, Victorian Government, West Australian Government, Queensland Government, Gilbert and Tobin, the Communications Alliance, the Internet of Things Alliance, Objective, Telstra, IBM, Mastercard, and Microsoft.
The report notes that “Underpinning the transformation to a smarter, truly digital economy is the ability to share data beyond the boundaries of an organisation, company, or government agency. Future smart services for homes, factories, cities, and governments rely on sharing of data between individuals, organisations, and governments.”
But many data custodians remain hesitant to share data due to concerns about appropriate use and interpretation of data, unintended consequences of sharing data, concerns about accidental release of sensitive data and adherence to privacy legislation.
Current inter-agency data sharing in the NSW government and cross-jurisdictional data sharing
OpenGov reached out to Dr. Ian Opperman, CEO and Chief Data Scientist, NSW DAC, with some questions and a request for further information on the work of the Taskforce.
In response to our query, Dr. Opperman explained that, at this time, inter-agency data sharing is predominantly undertaken by the NSW Data Analytics Centre. Cross-jurisdictional sharing is limited.
The NSW Data Analytics Centre complies with 50 pieces of State legislation and additional Commonwealth Government legislation, including privacy legislation and the Data Sharing Act, in relation to data sharing.
In 2016 a Statutory Guideline was issued by the NSW Privacy Commission which reinterpreted data sharing and analytics as ‘research’. A Privacy Code is currently under development for some NSW Data Analytics Centre projects to address this issue. More broadly, the NSW Data Analytics Centre is being repositioned within policy, as its projects are directed at enabling the sponsor agency to deliver on its core business of improving services, with the NSW Data Analytics Centre having trusted user status.
Government agencies have different levels of digital and data maturity and data quality, and many hold historical datasets, which makes data sharing challenging.
Goals and challenges
The frameworks developed by the Taskforce will seek to address technical, regulatory, and authorising frameworks. The intention is to identify, adopt, adapt, or develop frameworks for data governance, privacy preservation, and practical data sharing which facilitate smart service creation and cross-jurisdictional data sharing between governments.
The four focus areas are: cross-jurisdictional open data sharing, governance, privacy, and practical data sharing.
The Taskforce is looking towards legislation, principles, policies, practice and standards in similar jurisdictions such as the United Kingdom and European Union. The approach adopted by the Taskforce is to identify best practice where it is known to exist; consider existing models in an Australian privacy context or identify ‘whitespace’ opportunities to develop frameworks for Australia.
On the path towards the development of the framework, the Taskforce identified five key challenges:
- Defining the characteristics of data sets which meaningfully span the spectrum covering open data; highly aggregated personal data sets; lightly aggregated personal data sets and data sets which contain personally identifiable information (excluding health information).
- Characterisation of “smart service” types - and the associated limitations and obligations of service providers - based on the data sets used to create them.
- Regulatory Clarification - developing a clear, concise statement of the legal and policy frameworks which enable data sharing for smart services types based on the underlying data sets used.
- Identification of Personally Identifiable Data – developing an unambiguous test for the presence of personally identifiable information within a set of data sets.
- Development of Trusted Data Sharing Frameworks – Whilst not universally true, many data custodians are hesitant to share data. This is often due to concerns about appropriate use and interpretation of data, concerns about unintended consequences of sharing data, concerns about accidental release of sensitive data and concerns about adherence to legislation. Frameworks for trusted data sharing would help address these challenges.
Information and personal information
Information has been described in this paper in terms of the inverse of the probability of an event occurring out of a set of possible events. The less likely an event is to occur, the more information it carries. News of an unexpected event in politics or international affairs carries a great deal of information.
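One common way to make this idea concrete is Shannon's self-information, which takes the logarithm of the inverse probability of an event. The sketch below assumes that formulation; the article does not quote the whitepaper's exact formula, and the function name `self_information_bits` is purely illustrative.

```python
import math


def self_information_bits(probability: float) -> float:
    """Return the information carried by an event, in bits.

    Assumes Shannon's definition: log2(1 / p). Rarer events give larger values.
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return math.log2(1.0 / probability)


# A certain event carries no information; a 1-in-8 event carries 3 bits.
certain = self_information_bits(1.0)
coin_flip = self_information_bits(0.5)
one_in_eight = self_information_bits(0.125)
```

Under this assumption, an event that was always going to happen tells the observer nothing, while each halving of the probability adds one bit.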
Personal information (also called personally identifying information (PII) or personal data) covers a very broad range of information about individuals. Data protection laws in different jurisdictions (including States and Territories within Australia) have adopted different definitions. Courts in those jurisdictions have interpreted these definitions in inconsistent ways.
In NSW, according to the Privacy and Personal Information Protection Act (1998) No 133: “… personal information means information or an opinion (including information or an opinion forming part of a database and whether or not recorded in a material form) about an individual whose identity is apparent or can reasonably be ascertained from the information or opinion”.
This is a very broad definition and in principle, covers any information that relates to an identifiable, living individual. In general, after looking at definitions across jurisdictions, the paper notes that a crucial element of most definitions is that personal information must be ‘about an individual …. who is reasonably identifiable’. Whether an individual is reasonably identifiable requires a context specific inquiry.
Data sets that do not identify particular individuals may be used to create personally identifiable information if other data sets are accessed which enable identification of the individuals to whom the shared data sets relate. This other information might be available either internally, for example by looking up another data set, or externally, for example by matching data sets against searchable databases such as ASIC records or Land Titles Office property records, or through search engines.
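The linkage risk described here can be illustrated with a minimal sketch. Everything in it is hypothetical: the records, the field names and the choice of postcode, birth year and sex as the matching attributes are assumptions made for illustration, not taken from the whitepaper.

```python
# Hypothetical shared data set: no names, only indirect attributes.
shared = [
    {"postcode": "2000", "birth_year": 1980, "sex": "F", "condition": "asthma"},
    {"postcode": "2000", "birth_year": 1975, "sex": "M", "condition": "diabetes"},
    {"postcode": "2010", "birth_year": 1990, "sex": "F", "condition": "none"},
]

# Hypothetical searchable register that does carry names.
public = [
    {"name": "Person A", "postcode": "2000", "birth_year": 1980, "sex": "F"},
    {"name": "Person B", "postcode": "2010", "birth_year": 1990, "sex": "F"},
    {"name": "Person C", "postcode": "2010", "birth_year": 1990, "sex": "F"},
]

MATCH_KEYS = ("postcode", "birth_year", "sex")


def reidentify(shared_rows, public_rows, keys=MATCH_KEYS):
    """Return (name, condition) pairs where the attributes match exactly one person."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in shared_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], row["condition"]))
    return matches


reidentified = reidentify(shared, public)
```

In this toy data only the first shared record is re-identified. The second has no counterpart in the register, and the third matches two people, so it stays ambiguous.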
The Taskforce goes on to use a hypothetical parameter, the ‘Personal Information Factor’ (PIF), which is a result of: the personal information content of each of the individual data sets used to create a service; the functions which operate on the data sets to produce insights and models; the individual knowledge of the observer of the insights or models; and additional information available to the observer that the observer could bring to the insights or models.
A two-dimensional framework for services
The whitepaper presents a two-dimensional framework for service types, with two axes: Personal Information Factor and access control. Along the Personal Information Factor axis, services are based on non-personal data, highly aggregated data, lightly aggregated data or personally identifiable data. Along the access control axis, services are based on freely available data, data available for a ‘nominal fee’, data available for a commercial fee or data available to selected or qualified users.
Service types according to personal information factor and access (Source: Data Sharing Frameworks - Technical White Paper, page 55)
Conclusions and recommendations
The first recommendation is that the clarification of existing legal frameworks around privacy should include quantified descriptions of acceptable levels of risk in ways which are meaningful for modern data analytics.
Regulatory complexity often obstructs sharing of data. It is easy to read ‘not allowed’ into existing regulations at one or more levels. The ambiguity about the presence of personal information in data sets highlights the limitations of most existing regulatory frameworks. The inability of human judgment to determine ‘reasonable’ likelihood of reidentification when faced with sets of large complex data limits the ability to appropriately apply the regulatory test.
The Taskforce also recommends the development of a framework which supports anonymisation of data which in turn facilitates sharing.
The areas which have the greatest potential to drive productivity in Australia are also the areas which require access to the most sensitive and personal data sets – health, superannuation, human services, and education.
New technologies – determining minimum cohort size, differential privacy, homomorphic encryption, and privacy preserving linkage – all address concerns associated with re-identification of individuals from linked data sets, and yet all are at relatively early stages of development. In all parts of the world, there is currently only very high-level guidance, nothing quantitative, as to what ‘anonymised’ means, hence many organisations must determine what “anonymised” means to them based on different data sets. Maturing these technologies by encouraging pilot projects and safe trials would benefit all jurisdictions.
Recommendation 3 is the development of a test for the existence of Personally Identifiable Data. Information is created when data sets are joined. Collating data from millions of sensors operating at billions of cycles per second is fundamentally incompatible with relying on human judgements to determine the existence of personally identifiable information. Creating a nationally acceptable test will greatly increase the scope for smart services whilst still leaving room for judgement in risky situations.
Recommendation 4 is to establish agreed standards for minimum cohort size based on data type. In order to protect individual privacy and to acknowledge concerns about “likely” or “reasonably” re-identification, minimum cohort sizes should be agreed and communicated for different levels of data value. This would help data joining and minimise challenges around use of widely varying levels of aggregation.
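A minimum cohort size rule can be sketched as a simple suppression check: group records by the attributes that define a cohort, then drop any cohort smaller than the agreed threshold. This is a minimal illustration under assumed field names and an assumed threshold; the whitepaper does not prescribe this procedure or any particular value.

```python
from collections import Counter


def cohort_sizes(records, keys):
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(record[key] for key in keys) for record in records)


def enforce_min_cohort(records, keys, min_size):
    """Keep only records whose cohort has at least min_size members."""
    sizes = cohort_sizes(records, keys)
    return [
        record
        for record in records
        if sizes[tuple(record[key] for key in keys)] >= min_size
    ]


# Hypothetical records, grouped by postcode and age band.
records = [
    {"postcode": "2000", "age_band": "30-39", "visits": 2},
    {"postcode": "2000", "age_band": "30-39", "visits": 5},
    {"postcode": "2000", "age_band": "30-39", "visits": 1},
    {"postcode": "2010", "age_band": "30-39", "visits": 7},
]

# With an assumed threshold of 3, the single-person cohort in 2010 is suppressed.
releasable = enforce_min_cohort(records, ("postcode", "age_band"), min_size=3)
```

Suppression is only one option. Small cohorts could instead be merged into coarser groups, which keeps more data at the cost of detail.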
The fifth recommendation, which is complementary to Recommendation 4, is to have agreed standards for Obfuscation / Perturbation. Such standards can not only help provide confidence that data has been robustly de-identified but also help with the creation of minimum cohort sizes.
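One widely used form of perturbation is adding random noise to aggregate counts before release. The sketch below uses zero-mean Laplace noise, the mechanism commonly associated with differential privacy. It is an assumption chosen for illustration, not a technique or noise scale the Taskforce specifies, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one zero-mean Laplace sample.

    The difference of two independent exponential samples with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_counts(counts, scale, rng=None):
    """Return a copy of `counts` with independent Laplace noise added to each value."""
    rng = rng or random.Random()
    return {key: value + laplace_noise(scale, rng) for key, value in counts.items()}


# Hypothetical cohort counts; a larger scale hides individuals better but costs accuracy.
true_counts = {"2000": 120, "2010": 45}
noisy_counts = perturb_counts(true_counts, scale=2.0, rng=random.Random(0))
```

Because the noise averages to zero, totals over many cohorts stay close to the truth while any single published count no longer reveals an exact figure.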
Development and promotion of open data enablers is the sixth recommendation. In support of Recommendation 2, in-depth guidelines should be developed on anonymisation and de-identification that, like those issued by the UK Office of the Information Commissioner, consider a balanced approach to the risk of harm resulting from any reidentification.
The final recommendation is the establishment and maintenance of a dataset of issues arising from Privacy Impact Assessments. The taskforce notes that much of the data being shared has been collected with some form of express or implied consent, for some specific purpose. Respecting this consent, while supporting sharing, will be a major challenge in establishing effective ‘privacy preserving’ frameworks.
 Homomorphic encryption allows computations to be carried out on encrypted data, generating an encrypted result which, when decrypted, matches the result of operations performed on the plaintext. It can enable the chaining together of different services without exposing the data to each of those services.
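The property described in this note can be demonstrated with a toy version of the Paillier cryptosystem, which supports addition on encrypted values only (it is partially, not fully, homomorphic). This is a sketch under loud assumptions: the primes are far too small to be secure, Paillier is chosen here purely for illustration and is not named in the whitepaper, and the code requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two primes (NOT secure at this size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, valid because the generator is n + 1
    return n, (n, lam, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under public key n."""
    n_squared = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding value in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_squared) * pow(r, n, n_squared)) % n_squared


def decrypt(private_key, ciphertext):
    """Recover the plaintext integer from a ciphertext."""
    n, lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts."""
    return (c1 * c2) % (n * n)


public_key, private_key = keygen(1009, 1013)  # small illustrative primes
c1 = encrypt(public_key, 20)
c2 = encrypt(public_key, 22)
total = decrypt(private_key, add_encrypted(public_key, c1, c2))  # equals 20 + 22
```

The party calling `add_encrypted` never sees 20, 22 or their sum. Only the holder of the private key can read the result, which is how such schemes let services be chained without exposing the underlying data.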