EXCLUSIVE - Collecting, managing and describing content – new challenges for a digital age library
OpenGov learns about the shift to the digital space from David Wong, Assistant Director-General, Information Technology, at the National Library of Australia. Mr. Wong tells us about replacing infrastructure and systems. He talks about collecting digital content, managing that content, and ensuring the content is available now and for future generations. He describes the importance of researchers, librarians, academics, students and all other users being able to easily find and then access that content.
The shift to digital also brings about a change in the security perspective for the library, where open access needs to be balanced with the need to safely collect and securely store content and protect it from malicious activity.
Can you please tell us about your role?
My official title is Assistant Director-General of Information Technology. So I am the CIO for the National Library of Australia. I have been with the Library for 5 years and in my current role for 12 months.
I am part of the executive team, and have 50 staff, out of about 400 for the Library. My area is responsible for meeting all Library technology requirements. This includes providing standard corporate IT services, traditional library systems, and also digital library and discovery services. The Library has 50,000 online users each day and our digital services are amongst the most used in Government. Our Trove service was recently the 4th most popular Government digital service. We provide a national service for the cultural and humanities sector that is used all over Australia and internationally.
What are your areas of focus in the short to medium term?
We have been working on a digital library infrastructure replacement program for the last four years. We have completed the first part of the program, which was about replacing legacy systems that manage digitised content. We are in the final year now, and are working to build new capabilities for managing born-digital and digital-first content.
We are also in the process of planning for the next set of projects. This includes a project to replace our Library Management System. That will be an important project for the next couple of years.
What are the specific projects involved?
One is to make our web archives accessible. We have a significant web archive collection, consisting of approximately 9 billion files. We have been collecting this archive over the last 10 years. It’s a collection of domain harvests from the .au domain. We want to make the content available to users, and we also want to make sure the content is discoverable.
Another aims to build systems to collect born-digital content, that is, new content created in digital form. We want to build systems that collect content like e-books, electronic journals and maps.
We also want to streamline the process for collecting digital archives. Digital archives include collections from former politicians, artists or people or events of historical significance. When we take in digital collections, the corpus needs to be appraised, organised, described, managed, preserved and made accessible. In the past, we received boxes of documents. Now we tend to be given computers, hard drives, copies of file systems and email archives. It’s a complex process so we are building systems to support that process.
The program also aims to build contemporary delivery systems that meet user expectations. Together with that, the program aims to deliver other business benefits, including improving the efficiency of workflows and ensuring that our infrastructure is scalable and meets requirements for the next ten years.
How does the collection of digital content work?
For web content, we collect the material using web crawlers. We run our own web crawlers and also work with the Internet Archive in the US.
For born-digital content, legislation recently introduced in Australia requires publishers to deposit their content with us. Existing legislation requires everything that is physically published in Australia to be deposited with the Library, and last year that legislation was extended to cover published digital content.
Other content is donated, such as personal papers and archives. We have curators who look after specific content types and collections. Collection activities are guided by a set of collection development policies.
What kind of storage solutions are you using?
We have storage area networks and backup to tape archives.
Our server infrastructure comprises a mix of enterprise and commodity solutions. We have around 5PB of online storage in total, across all of our development, test and production environments.
Our infrastructure is internally provisioned. Most servers are virtualised, so we essentially have a private cloud setup.
What are your plans regarding scaling up infrastructure?
We have been able to provision services on a steady budget. Computing power, storage capacity and network bandwidth have come down in price over time, which has allowed us to scale up our infrastructure without increasing costs.
What are the major challenges you faced in the implementation of these projects? How did you work around them?
Managing scope. Our systems need to cope with large volumes of content, an increasing range of format types, and the need to collect, manage and provide access to this content. Moreover, it’s not just about collecting static documentary content; we also need to capture live, dynamic content, including websites and social media.
So the prioritisation of requirements and determining the cost of building features to meet these requirements has been critical.
Have you seen a cultural shift in terms of ICT?
Yes, ICT is integral to the day-to-day operations of the Library, and our digital library is recognised as a new, virtual library that runs in parallel with the physical one.
The Library’s strategic workforce plan includes various initiatives to ensure all staff are digitally confident. The Library also invests heavily in systems to support internal workflows and meet our users’ needs and expectations.
In the IT division, there has been a cultural shift in the way we tackle projects. Development takes place at a significantly faster pace, in response to increased business-area reliance on systems and heightened user expectations.
Are you doing anything on the mobility front?
All new public-facing applications we develop or enhance are designed to be mobile responsive. We have also released a few mobile apps in the past, including one for accessing our Catalogue and another that provides access to our sheet music collection. As the majority of our users access our services from desktop browsers, our current strategy is to develop web applications that function on all devices. This means the user experience and functionality are still good on a tablet, but more limited on a smartphone. We don’t have the resources to invest in developing mobile applications at the moment, or the usage patterns to suggest we adopt a mobile-first policy.
Is cybersecurity a major concern for you?
Yes, it is, as it is for all organisations, whether private or government. We are highly reliant on our systems to support day-to-day operations and to serve our users. Therefore we also invest in securing these systems to protect our digital assets, users of Library services and Library staff.
The Library is Australia’s memory; therefore we need to protect our digital assets from malicious activity. Cybersecurity is not a trivial matter; we also have to ensure our information is accessible and that we can meet our open access obligations.
We have a three-point IT strategy: policies – which are based around the Australian Government Information Security Manual and Protective Security Policy Framework; people – we educate staff to ensure they are security aware; and enforcement – the Library has security technology solutions to minimise the risks and impact of malicious activity and intrusions into the Library’s network.