EXCLUSIVE - Digital Science Strategy at Geoscience Australia – Building an open organisation
OpenGov visited Dr. Ole Nielsen at his office in Geoscience Australia in Canberra to discuss digital transformation at the organisation. He is the Director of Scientific Computing and Systems Engineering. Dr. Nielsen looks after a team of software engineers who are developing models, software and cloud infrastructure. He coordinates software development practices and makes sure things are aligned. He also plays a strategic role for the agency, as the digital transformation coordinator.
Geoscience Australia is the government's technical adviser on all aspects of geoscience, and custodian of the geographic and geological data and knowledge of the nation. Its function is to apply science and technology to describe and understand the Earth for the benefit of Australia. It operates within the Industry, Innovation and Science portfolio.
Talking about the newly unveiled Digital Science Strategy, Dr. Nielsen tells us how he is leading the movement towards a truly open organisation, using open source tools and standards, organising and releasing open data and learning from other public sector agencies, as well as private sector pioneers.
What are your major areas of focus in digital transformation?
We recently developed a Digital Science Strategy. Digital technology available now can do things that we could never do before, in terms of computing power and connectivity. The data sets we are working with are growing exponentially. The complexity of the questions we are trying to answer is unprecedented. Users expect everything to be available at their fingertips, on their smartphones. As an agency, we must become more innovative, adaptable, open, collaborative and quantitative. Those are the five guiding principles.
We need to collaborate more and recognise that we need different skills. We need to be adaptable in a changing environment. We can’t plan everything down to the last detail: first, because things change, and second, because it is so complex that we can’t possibly predict everything.
We should be open and transparent. That is the precursor for being collaborative. It also means that we embrace open data and open-source as mechanisms for collaboration.
What steps are you taking in terms of training and enhancing skills?
We believe in learning by doing. For example, we are setting up coaching for our teams to learn how to work in an agile world.
We had four staff seconded to Australia Post to work for three months. We’re also observing the Taxation Office to see how they’re working.
We also need to think differently about recruitment. We are looking at what we can do to be smarter about finding talent. The ‘science’ in Geoscience Australia is exciting. But if we can build a culture where staff are truly empowered, I think we would have a better chance of attracting and retaining talent.
One component of the digital strategy is “Establishing a quantitative science platform”, involving HPC, cloud services and big data analytics. Could you tell us more about it?
We want to scale up to do our analyses at the national level. For example, if we want to predict the underground presence of minerals, we might be able to do it on a 100 x 100 kilometre grid or at just one location. But we want to be able to make national-scale predictions at a high resolution in the future, so we will need to use High Performance Computing (HPC) and Big Data systems to achieve this.
It is important that we can scale up whatever we develop on a small scale, without having to re-engineer the whole thing. We cannot do that if it is tied to a particular operating system on the desktop. It is much easier to do that when we use open source tools and standards.
You are taking a completely open and collaborative approach. Sometimes we hear that people have concerns regarding security when it comes to sharing. Can you tell us your views on that?
We have a mandate to make most of our data available to the public. There might be a small amount that we do not want to put out there. But we don’t have personal user data. Most of our data is open scientific data meant for public consumption.
But we’re very serious about security, as it is going to be increasingly complex with cloud infrastructure, web services and distributed systems all accessing the Internet of Things. Our security is more about preventing attacks. An attack might take our services down, or it might try to use our services as a starting point for other attacks or for sending out spam emails.
Our new principle on security is that “Security is everyone’s business, everyone’s responsibility”. We need to train everyone up and make them responsible for the security of the systems they build.
We’ve got security training coming up. Instead of telling people not to do this or that, we ask a security company to look for vulnerabilities in our applications. We will ask them to show what they did and tell us what we can do to improve security. That is far more constructive than the traditional approach. We want to bake security into our systems and processes, so that when we produce new virtual infrastructure, we make sure that we have taken security into account from the start.
You also talked about using open source software. Are you able to fulfil the requirements you need just using open source software and platforms?
I was involved in a programme where we developed a hydrodynamic model, called ANUGA Hydro, to simulate impacts of tsunami, flooding or storm surge disasters on the built environment and to present the results in forms that are easily interpreted. We made that code open source in 2006 and it has been used in Indonesia, Japan and in New Zealand. The NSW State Government is modelling tsunami impact in each estuary using our code. This demonstrates the value of open source and collaboration to me.
Isaac Newton said, “If I have seen further than anyone, it’s because I stood on the shoulders of giants”. If you are at the leading edge of science, you will naturally have to work with other science agencies. NASA, for example, released about 250 open source projects in June. We’re using a lot of the tools released from science labs such as NASA and the Lawrence Livermore National Laboratory. In fact, Netflix also just released a lot of open source tools for automating cloud services, which we are looking at as well.
There’s a plethora of options out there that far exceeds what we can buy off the shelf. There is so much potential in open collaboration. Also, some of our requirements are niche and cannot be met by off-the-shelf products.
For example, we have a programme around dating rocks. The data analysis software for that is currently running on an Excel 2003 spreadsheet. There are about 50 labs in the world using these tools. Those 50 labs are getting together to think about building a community-based data analysis application, led by the University of Charleston in the US. By doing that, we can all chip in a little bit and then get a tool that we all can use, because no one has the resources to build it by themselves.
But we don’t want to build anything if we can find an existing open source tool or buy it off the shelf at the right price. Or even better, if we can get it as a service off the cloud we prefer that. If none of that exists, then we’ll have to engage in development but make it open source to help others and generate collaboration.
Do you align your strategy with that of the DTO?
The DTO has 12 or 13 service standards that they are pushing very hard from the top. We will be aligning ourselves with those. We look at it as another enabler to help us make the change. The DTO can make presentations and offer training. The DTO provides good principles around openness, open source, collaboration and agile.
We also look at GDS in the UK and 18F from the US. Our digital science strategy is our interpretation of the international and Australian government principles and what they mean for us as a science agency. They serve as a reference for us, allow us to build on existing work and ensure we are aligned with international trends.
Could you tell us about some of the research projects being conducted by Geoscience Australia in this new framework in which ICT is playing that kind of a role?
The new way of working only really started in July, so it’s very early days and we have a very long way to go. However, it is the result of a year of consultation and research to work things out.
For example, we have a bushfire warning system called Sentinel Hotspots. We developed it a few years ago to help inform the public about the progression of bushfires. It uses satellite data to provide regular updates on a website, so people can track where the bushfires are.
That was running internally on the old infrastructure. It was deployed in a very manual way. There were different teams responsible for different parts of the system. To make changes, five different teams had to be motivated to act and that’s difficult because they have other priorities.
We re-developed the code so that it could be deployed through continuous delivery. It means that from the minute you make a change to the code, it can be up in production in 10 minutes. It used to take weeks because of manual processes: people would have to look at a document and type in commands by hand. So, we have automated that. It’s running live on Amazon Web Services. By using commercial cloud, you can also have automatic scaling when web traffic spikes. We tested it with 20 million hits in 40 minutes and the system didn’t miss a beat.
You also have self-healing. If a server goes down, a new one automatically starts up. You have the ability to make a quick change without affecting production. These are some of the things we are learning from Netflix, who are among the best at this kind of thing.
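To put the load test figures quoted above in perspective, they can be converted into an average request rate. This is only a minimal sketch of the arithmetic: it assumes the hits were spread evenly over the 40 minutes (the interview does not say how the traffic was distributed), and the function name is hypothetical.

```python
def average_hits_per_second(total_hits: int, duration_minutes: float) -> float:
    """Average request rate, assuming traffic is spread evenly over the test."""
    return total_hits / (duration_minutes * 60)


# Figures from the interview: 20 million hits in 40 minutes.
rate = average_hits_per_second(20_000_000, 40)
print(f"{rate:,.0f} hits per second on average")
```

Real traffic during a bushfire emergency is likely to be burstier than an even average suggests, which is where automatic scaling matters.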
We are also starting to work more in an agile way. We have teams standing up every morning so everybody can see what they are doing, showcases every two weeks and regular get-togethers to do planning for the immediate future i.e. about 10 weeks ahead.
Can you tell us about your software development process?
In the traditional waterfall, there’s a tendency to stick with a project, even if it’s the wrong thing. In agile, you can get away from the wrong path much faster.
We have started growing an agile culture but, as I said, it is very early days and we have a lot to learn.
Let me explain the concept:
If it’s worth prototyping, you get a few people in a team to write a narrative stating what it is we want to achieve (not too detailed requirements though, because they go “stale” quickly as things change). Then you develop a prototype rapidly and pass it to the user to check if it is useful for them. If you think it is worth building, you expand the team and work on it until you have a ‘minimum viable product’. You get that out as soon as you can. We picked up this approach from Spotify, where they will release it to a small percentage of users. Then if it is good enough, you do the broader roll-out and tweak it as you go ahead.
So, it’s not that you write code faster. But you can throw out the bad ideas faster. It results in more efficient allocation of resources. There is less reporting because the process is open and collaborative. And the users are on board from an early stage and are not going to be surprised later.
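The idea of releasing to a small percentage of users before a broader roll-out can be sketched in a few lines. The example below is an illustrative assumption, not the mechanism used by Spotify or Geoscience Australia: it hashes a hypothetical user ID into one of 100 buckets so that the same user always gets the same answer, and raising the percentage only ever adds users.

```python
import hashlib


def in_rollout(user_id: str, percent: float) -> bool:
    """Deterministically place a user in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket number from 0 to 99
    return bucket < percent


# Hypothetical user IDs, used only to show the proportion selected.
users = [f"user-{i}" for i in range(10_000)]
early = [u for u in users if in_rollout(u, 5)]
print(f"{len(early)} of {len(users)} users see the new version")
```

Because the bucket depends only on the user ID, widening the release from 5 to 20 percent keeps the early users in the group rather than reshuffling everyone.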
Are there any ongoing data consolidation projects?
Some of the data sits on a tape robot in the basement. There are petabytes of data in all sorts of standards and formats. Traditionally it sits in all sorts of places in the agency. To get it out and behind an API is a lot of work. With large amounts of data, you can’t move it often. We have to choose if we want to put it in the cloud. With transient data, it’s a lot easier. With master data, it’s more difficult.
We’re putting a lot of the data, such as satellite data, in the National Computational Infrastructure (NCI). For example, we are turning lakes on a map into statistical objects: how often were they wet? How often were they dry?
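Treating a lake as a statistical object can be illustrated with a small calculation. This is a minimal sketch under stated assumptions: each satellite pass is reduced to a single wet or dry flag, and the record below is made-up illustrative data, not Geoscience Australia output. The function name is hypothetical.

```python
from typing import Sequence


def wet_frequency(observations: Sequence[bool]) -> float:
    """Fraction of observations in which the lake was classified as wet."""
    if not observations:
        raise ValueError("need at least one observation")
    return sum(observations) / len(observations)


# Hypothetical record: True means water was detected in that satellite pass.
lake_record = [True, True, False, True, True, False, True, True]
wet = wet_frequency(lake_record)
print(f"wet {wet:.0%} of the time, dry {1 - wet:.0%} of the time")
```

In practice the same summary would be computed per pixel over many years of imagery, which is why the data volume calls for infrastructure such as the NCI.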
If we put data on the cloud, there are costs. If you start serving a lot of data in commercial cloud, that could be an unpredictable cost because of the pricing model.
We have a whole section called Scientific Data that’s working on all this. They are focused on critical areas such as metadata, provenance, interoperability, discoverability, archiving and digital continuity.
What are the kinds of challenges that you face in change management?
Leading change is a big part of what I do. We all fear change. It’s human nature to say I don’t want to change the way I work. “Everyone wants change but nobody wants to change”. Some are afraid that if change happens their relevance or their job might be at risk.
The challenge is to demonstrate the value of it.
We need to prove that it’s scarier not to change, given that the world around us is changing. If we don’t change to meet the needs of the future, we all lose our relevance, our influence and ultimately our jobs. If we do change we will all have to adapt, but we, and our organisation, will all be better for it!