EXCLUSIVE - Flash-forward to the future of research: Bringing together supercomputing, virtualisation, automation and deep learning
Dr. Tan Tin Wee talks about challenges in the supercomputing arena and shares his vision of the future of research.
In this second part of the interview (read the first part here), Dr. Tan Tin Wee tells us what is engaging his mind at present. He shares his vision of the not-too-distant future, with supercomputers built in housing developments and a seamless bench-to-bedside flow to dynamically create precise personalised medical treatments.
Dr. Tan hasn’t ceased striving to explore and expand the frontiers of technology. He talks about coming full circle, as technology reaches a point where everything he has worked on during the past 26 years as a professor in the Department of Biochemistry in the NUS Yong Loo Lin School of Medicine comes together to potentially transform healthcare and biomedical research.
Since June 2015, Dr. Tan has been Director of the National Supercomputing Centre (NSCC) in Singapore.
Smart approach to supercomputing
Wearing his latest hat as Director of Singapore's supercomputing centre, Dr. Tan explained how Singapore is choosing a smarter approach to becoming a player in exascale computing. Singapore has neither the billions of dollars to throw at such a challenge nor problems that can reach that scale, at least not yet.
However, there will be a need to crunch genomic information, which is not as ‘big data’ as nuclear physics or Hadron Collider type projects, but still requires significant computational power.
One possibility is to work on genomic precision medicine interfacing with hospital healthcare delivery.
The Long Fat Pipe Problem
Today we can buy portable hard drives holding terabytes of data for a few hundred dollars. Storage is cheap. And massive amounts of data are being generated as the world gets more and more connected. But the speed at which the Internet can support big data transfers over global distances has remained severely limited.
One terabyte of data cannot be transferred from one global location to another through the Internet in a reasonable timeframe. Today, it is probably still cheaper to ship terabytes in a hard disk drive via courier.
Over global distances, data throughput suffers from the so-called Long Fat Pipe problem (a consequence of the large bandwidth-delay product). Round-trip times over such distances are long, so unless the connection is carefully tuned, throughput drops significantly no matter how much the bandwidth is increased.
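To put rough numbers on the problem, the short Python sketch below works through the bandwidth-delay product. The figures in it are illustrative assumptions and do not come from the interview: a 10 Gbit/s link, a 200 ms round-trip time, and an untuned sender that keeps at most 64 KiB of data in flight per round trip.

```python
# Illustrative sketch of the bandwidth-delay product (BDP).
# All figures below are assumptions chosen for the example.

def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link completely full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


def transfer_time_s(size_bytes: float, throughput_bps: float) -> float:
    """Seconds needed to move size_bytes at a steady throughput."""
    return size_bytes * 8 / throughput_bps


LINK_BPS = 10e9            # assumed link speed: 10 Gbit/s
RTT_S = 0.2                # assumed round-trip time: 200 ms
WINDOW_BYTES = 64 * 1024   # assumed untuned window: 64 KiB
ONE_TERABYTE = 1e12        # 1 TB in bytes

bdp = bandwidth_delay_product_bytes(LINK_BPS, RTT_S)
untuned_bps = min(LINK_BPS, window_limited_throughput_bps(WINDOW_BYTES, RTT_S))

print(f"Data in flight needed to fill the pipe: {bdp / 1e6:.0f} MB")
print(f"Untuned throughput ceiling: {untuned_bps / 1e6:.2f} Mbit/s")
print(f"1 TB at full link speed: {transfer_time_s(ONE_TERABYTE, LINK_BPS) / 60:.1f} minutes")
print(f"1 TB with the untuned window: {transfer_time_s(ONE_TERABYTE, untuned_bps) / 86400:.1f} days")
```

Under these assumptions the link needs about 250 MB in flight to stay full, while the small window caps throughput at under 3 Mbit/s. A terabyte that could cross the link in roughly 13 minutes would instead take about 35 days, which is why couriering a hard disk can still win.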
Laying out the contours of the problem, Dr. Tan said, “TCP/IP is a cold war technology. It is a communications protocol. It is not specifically optimised for big data transmission.”
For supercomputers, InfiniBand is already the most popular interconnect protocol, but only within the confines of HPC data centres. So Dr. Tan and his team at NSCC explored another option: long-range InfiniBand. This technology from Obsidian Strategics has been around since the early 2000s, but it had been deployed only by the military, NASA and a few banks, and not for academic research networks.
Dr. Tan and his team demonstrated that a global network of InfiniBand interconnections could work, providing a more efficient protocol for transmitting big data over large trans-oceanic distances. (OpenGov has previously reported in greater detail on the work on next-generation networks being done at A*STAR.)
Dr. Tan said, “Now how do we build the infrastructure to allow the supercomputers to talk to each other and compute together on a single problem? That is our current challenge, how to build a galaxy of supercomputers, over high speed networks.”
Image representative of NSCC data centre (Courtesy of NSCC)
The cooling problem
It has been estimated that the world's data centres account for between 5 and 10% of total electricity consumption. And this is a growing number. Every time we see an email attachment or share a video clip on WhatsApp, it is stored on a spinning disk somewhere, and this disk must spin continuously for the next ten years at least. Making the data readily accessible and replicating it in backups across multiple data centres is neither scalable nor environmentally sustainable as the world's storage needs grow to the yottabyte range. Hence, Dr. Tan is searching for truly creative solutions.
Like big data storage, densely packed supercomputers also emit a lot of heat. A supercomputing facility could be paired with an industrial system that requires heat. Better still, that industrial system could produce cold output to cool the supercomputers.
As an example of this potential industrial symbiosis, Dr. Tan said, “I want to build my next supercomputer next to the Singapore LNG receiving terminal in Jurong. They have minus 162 degrees cold energy which they currently throw out into the sea. Give me the cold sea water and I can cool my data centre, without any additional cost. Right now, we are venting the heat out to the atmosphere, contributing to global warming, which the LNG plant could use for regasification of the liquefied natural gas.”
The low-grade heat generated by a data centre, which is otherwise pretty useless, could provide hot water for domestic use. Thus, housing estates could house supercomputers, linked by long-range InfiniBand connections.
The future of research - Bringing it all together
Dr. Tan feels he has come full circle now in his journey: “The last 25 years of my working life have been characterised by very interesting developments. I never imagined as a molecular biologist that I would run an Internet service provider. But I did and I innovated on it. It helped the bioinformatics community. Now I am running a supercomputing centre. And I am thinking of combining HPC with the Internet, with biology, and with hospital health care information to provide better health care for an ever-increasingly aging population. It is about putting all the disparate pieces together.”
Dr. Tan said that he would like to build a research lab of the future on top of this green and globally interconnected supercomputer network. In 1992-93, before the Internet era, he wrote in an article that researchers would no longer need to go to the library to search for the latest journals. They would be able to search, download and read scientific literature instantaneously. Today it is a given.
In his words, this is what the future of research could look like:
“Today I would like to make another prediction. Soon research scientists will be able to carry out research at the speed of thought. You think about the scientific experiment, interrogate databases, design the experiment and you ‘order’ it. Virtual laboratories, like transcriptic.com and emeraldcloudlabs.com, have already started offering their services online.
The results will be generated by technicians controlling robots working in virtualised highly inter-connected scientific laboratories. Quality of service agreements would guarantee you the best results each time. Of course, you must incorporate experimental control, and check the provenance of the data sets. The resulting data sets can be aggregated and placed in searchable data banks. They can also be sent to deep learning engines.
Using the bench-to-bedside (research bench to hospital bedside) workflows and computational pipelines, we can create personalised medical treatments dynamically and with great precision. You can have cures developed on the fly. The scientists and the doctors will be working together to discover what disease the person is suffering from, based on genomic analysis. Drugs and vaccines will be designed with molecular modelling and machine learning.
Then an automated laboratory will be able to verify and deliver experimental results to ensure that the treatment given to that patient is safe and will lead to a good prognosis.
That is the future I would like to be actively involved in: Research at the speed of thought.”