Image credit: CSIRO
As the largest data science research organisation in Australia, Data61 is addressing the challenge of how to create Australia and the region’s data-driven future with science and technology.
Data61 partners with industry and universities nationally and globally to help solve wicked problems and deliver economic, societal and environmental impact. Data61’s capabilities range from cybersecurity, IoT, robotics, machine learning and analytics, and computational systems to behavioural sciences, applied to industry markets and government. Through relationships with other parts of CSIRO, Data61 can also integrate deep domain expertise across energy, agriculture, food, health, manufacturing, materials sciences, radio astronomy, minerals and environment where needed.
The Data61 network includes other CSIRO business units, 14 leading incubators, 29 university partners, more than 90 corporate partners, and all spheres of government, handling an incredibly complex array of research projects, across disciplines and industries.
Mr. Adrian Turner is a successful and influential technology entrepreneur, who spent 18 years in Silicon Valley. OpenGov spoke to him about his role, areas of focus at Data61 and connecting fundamental research with market needs.
How would you describe your role as CEO of a complex organisation such as Data61?
Personally, I am a mission-driven leader. I believe that what we are doing at CSIRO’s Data61 is integral to preserving and growing the prosperity and independence of Australia and the region through data-driven innovation. We need to create and grow new industries and make sure that our citizens have the right skills to feed into and grow those industries. I believe that those industries will be enabled by science and technology. So, it is an incredibly rewarding role.
At Data61, we have some of Australia’s best and brightest in the domain of data science. Given that talent, my role as the CEO is to provide direction for the overall mission, to provide context and to help create a culture and work environment where the best and brightest can do the best work of their careers.
Data61’s science vision is two-fold: you conduct market-driven research and research that supports all areas of society. Can you tell us about how Data61 approaches these two types of research? Does Data61 connect the two?
At its core, Data61 is a data science research organisation. It is about science and technology excellence, and that includes engineering and design. Science excellence is at the core of what we do. We think about our research and the triangulation of the research as two parallel realms of activity.
In the case of research, often the market won’t know what they want or what’s possible. So, we need our teams to be able to pursue long-term research agendas that in a lot of cases are high-risk but grounded in data science and related fields.
Often you need to carry out research like scenario modelling to understand and meet market needs, and this informs further impactful research and technology which is brought to market. An important aspect of this is product management, and the translation of that research into solutions that can have real-world impact.
To facilitate this, the first thing we did was to organise ourselves as a network. The second thing we did was to develop a product management organisation to help take market context and feed it back into the research programme.
The third thing is that we have an engineering organisation to accelerate the translation of science into products and services wherever that makes sense. We believe the half-life of data science breakthroughs is shorter than other science disciplines. So, we need to have that rapid translation capability within Data61.
Can you tell us about the primary research areas for Data61?
First is cyber-physical systems. That includes sensing, robotics and autonomous systems. We believe Australia has a core strength around cyber-physical systems. In the mining sector there is large-scale deployment of connected assets. There is a whole set of communication problems and issues that come with large-scale deployment of remote assets.
The fourth one is decision sciences which is the people part, cognitive sciences and behavioural economics. And then the fifth one, that we have just announced, is what it means to be human in a digital world. It will be led by an anthropologist.
In terms of the impact science can have on Australia, Data61 considers three dimensions. The first is economic impact. This is the creation of entirely new industries and we have identified several of those.
Then there is the societal impact. An example here would be using data analytics for intervention around domestic violence.
Environmental impact is the third dimension. An example would be the work we have announced recently with the Gordon and Betty Moore Foundation, along with international partners, to track and monitor biodiversity in the Amazon rainforest using wireless sensor technology.
Can you tell us more about the fifth area you mentioned, ‘Being human in a digital world’? How can Data61 contribute to answering this question?
We are sitting in a really interesting position between industry, government and academia. We have access to information, talent, and insights across all those stakeholder groups.
We are looking deeper into cognitive sciences and behavioural economics. We are also trying to gain a deeper understanding of the rate of evolution of us as people and how we interact with and augment technology that is changing exponentially.
In an automated world where algorithms are making decisions that impact people’s lives, we are also interested in how to properly quantify things like fairness.
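One common way to put a number on fairness is demographic parity: compare how often each group receives a favourable decision. The sketch below is only an illustration of that general idea, not a description of Data61's approach; the function name, the sample decisions and the group labels are all hypothetical.

```python
from collections import defaultdict


def demographic_parity_difference(decisions, groups):
    """Return the largest gap in favourable-decision rate between groups.

    decisions: iterable of 0/1 outcomes (1 = favourable decision).
    groups: iterable of group labels, aligned with decisions.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(decision)
    # Favourable-decision rate per group.
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical decisions made by an automated system for two groups.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_difference(decisions, groups)
print(rates, gap)
```

A gap of zero means every group is approved at the same rate. Demographic parity is only one of several competing definitions, and which one is appropriate depends on the decision being made.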
With massive volumes of data being generated today at unprecedented rates and governments, private sector and academia trying to draw insights from it, ever increasing amounts of sensitive information are at risk. Can you tell us about the directions of privacy and cybersecurity research at Data61?
We are conducting research focused on privacy enhancing technology. We believe that the current model of ingesting large amounts of data and running analytics across the data store will change, because of the security, compliance and regulatory risks associated with collecting more data.
We think that increasingly analytics will be taken to the data sets. There will be emerging technology around confidential computing, whereby you can run analytics across encrypted data using homomorphic encryption, and derive insights from data without ever being exposed to the underlying data itself. Then there is synthetic data (data generated from a model based on the actual data), where we are building models based on relationships between datasets. Again, you are able to derive insights without compromising the confidentiality of the underlying data itself.
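To make the idea of running analytics across encrypted data concrete, here is a toy sketch of the Paillier cryptosystem, a well-known additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. It is an illustration only, and the interview does not say which schemes Data61 uses. The primes are deliberately tiny and insecure, and all function names are assumptions made for this example.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two primes (insecurely small here)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Combine two ciphertexts into an encryption of the plaintext sum."""
    return (c1 * c2) % (n * n)


# Toy primes for illustration only; real keys use primes of 1024+ bits.
n, private_key = keygen(293, 433)
c_total = add_encrypted(n, encrypt(n, 1200), encrypt(n, 3400))
total = decrypt(n, private_key, c_total)
print(total)  # the party doing the addition never saw 1200 or 3400
```

The party that calls `add_encrypted` needs only the public value `n`, so it can compute the total without being able to read either input. Requires Python 3.8 or later for the modular inverse via `pow`.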
Our work also includes ensuring that anonymised data records cannot be linked and tied back to a personally identifiable record. We have an initiative for quantifying the probability of re-identification.
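A simple, widely used way to estimate re-identification risk is to group records by their quasi-identifiers (attributes such as postcode or birth year that could be matched against outside data) and treat each record's risk as one over the size of its group. The sketch below assumes that model; it is not a description of Data61's initiative, and the field names and records are hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Estimate risk as 1 / (number of records sharing the same quasi-identifiers).

    Returns (highest risk, average risk, smallest group size k).
    """
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    class_sizes = Counter(keys)
    risks = [1 / class_sizes[key] for key in keys]
    return max(risks), sum(risks) / len(risks), min(class_sizes.values())


# Hypothetical "anonymised" records with names already removed.
records = [
    {"postcode": "2000", "birth_year": 1980, "diagnosis": "A"},
    {"postcode": "2000", "birth_year": 1980, "diagnosis": "B"},
    {"postcode": "2010", "birth_year": 1975, "diagnosis": "A"},
    {"postcode": "3000", "birth_year": 1990, "diagnosis": "C"},
]
max_risk, avg_risk, k = reidentification_risk(records, ["postcode", "birth_year"])
print(max_risk, avg_risk, k)
```

A record that is unique on its quasi-identifiers has a risk of 1 under this model, which is why removing names alone does not make a dataset anonymous.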
There are three more related dimensions. We have a research effort that’s looking at applying machine-learning to dynamic data transformation. This has to do with ensuring that data that is stored in different schemas can be dynamically transformed and analysed. Then there is a technique called fuzzy matching, which is tied to the privacy enhancing technology. It is about the ability to recognise and match with confidence one unit of record in a database within an enterprise, with another unit of record in another enterprise, referencing the same individual or underlying asset, without ever actually describing the identity of the individual or the asset.
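The matching idea can be sketched as follows, under strong simplifying assumptions: two organisations agree on a secret key, each turns a name into a set of keyed hashes of its character pairs, and only those hashes are compared. This is an illustrative simplification, not Data61's technique; the key, names and threshold-free scoring are all hypothetical, and production schemes add further protections because hashed character pairs can leak information through frequency analysis.

```python
import hashlib
import hmac


def encode(name, key):
    """Turn a name into a set of keyed hashes of its character bigrams."""
    text = f"_{name.lower().strip()}_"  # pad so first/last letters form bigrams
    bigrams = {text[i:i + 2] for i in range(len(text) - 1)}
    return {
        hmac.new(key, bigram.encode("utf-8"), hashlib.sha256).hexdigest()
        for bigram in bigrams
    }


def dice_similarity(a, b):
    """Dice coefficient between two token sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical shared secret, agreed between the two organisations.
shared_key = b"example-shared-secret"

# Each organisation encodes its own record locally and shares only the hashes.
record_a = encode("Jonathan Smith", shared_key)
record_b = encode("Jonathon Smith", shared_key)  # same person, misspelt
record_c = encode("Maria Garcia", shared_key)

close_score = dice_similarity(record_a, record_b)
far_score = dice_similarity(record_a, record_c)
print(close_score, far_score)
```

Because a spelling variation changes only a few character pairs, the two versions of the same name still share most of their hashes, while unrelated names share few or none.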
In cybersecurity, we are focused on three primary areas. One is building trustworthy and resilient cyber-systems.
We are trying to achieve provable trustworthiness, which is the ultimate assurance that the systems can be trusted and operated as specified. This involves formal software verification, using machine-checked mathematical proof. In terms of scope, it is equivalent to the creation of the discipline of software engineering 45 years ago.
We want to specify and quantify security using formal methods. We are specifying and verifying communication protocols. We are specifying software architecture and security properties. It will allow formal verification of the functional and non-functional properties of the components against security requirements.
The next area is large-scale formal verification. This is about techniques for reducing the cost of formally verifying systems, so that more systems can adopt this approach. In particular, we are working on ways to automatically generate code and correctness proofs, and on a verified high-level language framework that allows increased verification productivity. Proof engineering, which involves developing and maintaining large proof bases that have millions of lines of code, is another area.
We are looking at new approaches to mission-critical and life-critical systems. At an architectural level our goal is resilience, the ability of systems to maintain minimum or critical functionality in the face of an attack or component failure and to be able to recover as safely and as soon as possible.
The second one is risk-based cybersecurity approaches and shared awareness. This is looking at risk and understanding risk in a particular context to dynamically inform security policy.
The third area is strengthening the machine and social dimension of cybersecurity. We recognise that cybersecurity is a systems problem and that it cannot be solved without taking all aspects of the system into account. This includes the human element. Social, legal, economic and policy considerations come into play. We are looking deeper into human behaviours and how those behaviours are influenced by different factors, such as technology design.
Can you tell us about the Blockchain study by Data61?
The report is due to be submitted to Treasury in the first quarter of 2017. It’s still early days for Blockchain. We are conducting research not only into the industries and areas where we think Blockchain will be used, but also into the underlying protocols and whether they are robust enough for some of the applications being discussed.
Within that we are looking at things like supply chain, traceability through a supply chain and remittance payments. We are also looking at new architectures for government registries.
Before we wind up, can I ask what, in your view, is the biggest challenge in achieving the full potential of a data-driven world?
I think the biggest challenge from a technology point of view would be data integrity. In the past, there has been a lot of emphasis on confidentiality and availability. As we move to a world with greater levels of automation, data integrity is going to become a much larger issue.