EXCLUSIVE - Deciphering data from the blueprint of life – A conversation with Dr. Swaine Chen
DNA is the blueprint of life. The sum total of DNA within a single cell of an organism is the genome, and genomics is the study of the entire genome. It is a field which can have a transformative effect on our lives, in health and beyond. And progress in the field happens to be closely interlinked with developments in computing technologies. Enormous volumes of data being produced by sequencing, mapping, and analysing genomes fall within the domain of big data.
Recently, on the sidelines of the AWS (Amazon Web Services) Public Sector Summit, OpenGov had a conversation with Dr. Swaine Chen, Assistant Professor at the National University of Singapore (NUS) and Senior Research Scientist at the Genome Institute of Singapore (GIS), about genomics and data analytics. GIS is an institute of The Agency for Science, Technology and Research (A*STAR), Singapore's lead public sector agency that spearheads economic-oriented research to advance scientific discovery and develop innovative technology.
From 3,000,000,000 to 5,000
In his keynote address at the summit, entitled “The Future of Analytics to Enable the Future of Genomics”, Dr. Chen used a famous example to demonstrate the importance of the data analytics challenge.
Angelina Jolie underwent a double mastectomy after a test found sequence differences in her DNA, specifically in two genes called BRCA1 and BRCA2. Those differences meant that her risk of getting breast cancer was up to 50%, compared with a 2% risk of getting breast cancer by the age of 50 for an average person.
The entire human genome is 3 billion base pairs. Analysing 3 billion base pairs is too difficult and too expensive to do as part of a regular test. So, in order to have a test like the one Angelina Jolie took, we need to know where to look for the genes associated with breast cancer.
Narrowing that 3 billion down to a space of 10 million took 17 years of work. 10 million was still too big. It took 4 more years to winnow that 10 million down to 5,000, which was the key region where the BRCA1 and BRCA2 genes were.
“5,000 is around the kind of size we can do relatively routinely in the lab, where we are looking specifically at a very small region for BRCA1 and BRCA2,” said Dr. Chen.
BRCA1 and BRCA2 were found to be important for breast cancer largely using pre-genomics techniques. The promise of genomics is to accelerate this 21-year process for other diseases. To achieve this acceleration, we need advances in both data acquisition and data processing.
For fields such as imaging or financial data, Moore’s law (the doubling of computing power every 18 months) has simultaneously lowered the costs and boosted capabilities for both data acquisition and data analytics.
However, for genomics, data acquisition is a bit different. Genomes are in the cells of our bodies, as opposed to sitting on a computer or a digital device somewhere. The data acquisition problem therefore can be further divided into two parts. The first is about getting a sample from a person, which is difficult to scale up. The second issue is getting the sequence data from that sample onto the computer. This second part has been benefitting from Moore’s law.
But around 10 years ago, genome data acquisition costs started dropping even faster than Moore’s law; in other words, sequencing is progressing at a hyper-Moore rate. However, computing power is still progressing at the rate of Moore’s law. Therefore, computing power is falling exponentially behind the rate of data acquisition. Dr. Chen called this the “Hyper-Moore gap”.
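The widening of the Hyper-Moore gap can be illustrated with a small calculation. This is only a sketch under stated assumptions: the 18-month doubling time for computing comes from the description of Moore's law above, while the 9-month doubling time for sequencing output is a hypothetical figure chosen for illustration, not one quoted by Dr. Chen. The function name `hyper_moore_gap` is likewise made up for this example.

```python
def hyper_moore_gap(years, seq_doubling_months, compute_doubling_months=18.0):
    """Return how many times faster data volume has grown than computing
    power after `years`, given the two doubling times in months."""
    months = years * 12.0
    data_growth = 2.0 ** (months / seq_doubling_months)
    compute_growth = 2.0 ** (months / compute_doubling_months)
    return data_growth / compute_growth


# ASSUMPTION: sequencing output doubles every 9 months (illustrative only).
for years in (0, 3, 6, 9):
    gap = hyper_moore_gap(years, seq_doubling_months=9.0)
    print(f"after {years} years, data has outgrown compute {gap:.0f}-fold")
```

Under these assumed rates the gap itself doubles every 18 months, which is what falling "exponentially behind" means: the shortfall is a growing multiple, not a fixed amount.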
Because of the Hyper-Moore rate of progress in acquiring sequencing data, the amount of data is only going to increase in the near future. At the moment, researchers are able to handle the analytics, due to continued advances in computing and because it’s only recently that researchers have started amassing massive volumes of data. But if current trendlines persist into the future, Dr. Chen said the Hyper-Moore gap will create a serious problem with analytics. Cloud computing is one of the key technologies that can enable individual institutions like GIS to bridge the Hyper-Moore gap (GIS is working with AWS), and more and more of the processes at GIS are being moved over to the cloud.
Outbreak analysis, understanding infectious disease mechanisms and synthetic biology
Dr. Chen talked about three main areas of work at his lab.
One is outbreak analysis. He explained, “We help with a lot of the infectious disease outbreaks in Singapore, using genomics to track and manage those outbreaks.”
Today, genomics is the international standard for tracking and monitoring outbreaks, and Singapore is at the forefront of this trend as well. For instance, in 2015, there was an outbreak of Group B Streptococcus (GBS) in Singapore, coming from raw fish that was being sold in some hawker centres. Sequencing of genomic DNA was used to tell the difference between strains (like different individuals) of GBS. The genomics analysis gave a single clear result: the GBS that was infecting the patients was the exact same strain that was found on fish that was being sold at the same time and in the same place during the outbreak. Genomics gave the tools to track that outbreak and manage it, and to monitor and make sure it doesn’t come back again.
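The idea of telling strains apart by sequence can be sketched in a few lines. This is a toy illustration, not the method used at GIS: real outbreak analysis compares whole genomes with dedicated pipelines, and the short sequences and the function name `snp_distance` below are invented for the example.

```python
def snp_distance(seq_a, seq_b):
    """Count positions at which two aligned DNA sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# HYPOTHETICAL toy sequences; real genomes are millions of bases long.
patient_isolate = "ACGTTGCAAGCT"
fish_isolate = "ACGTTGCAAGCT"
unrelated_isolate = "ACGATGCTAGCA"

print(snp_distance(patient_isolate, fish_isolate))       # 0 differences
print(snp_distance(patient_isolate, unrelated_isolate))  # 3 differences
```

A distance of zero (or close to it) between isolates from patients and from fish is the kind of signal that points to a shared source, while unrelated strains differ at many positions.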
We asked Dr. Chen how researchers know what analysis to perform, and how genomics can further improve outbreak analysis. He gave one example of how the additional data from genomics helps scientists understand how outbreaks happen.
Nearly 30% of all people carry GBS, and it doesn’t cause a problem. Then, suddenly, one strain causes an outbreak. When that happens, the outbreak strain grows faster, because now it can grow somewhere that other GBS couldn’t grow before. Previous research has developed evolutionary theories which predict characteristic sequence changes when there is rapid growth and expansion of one strain. Dr. Chen’s lab, in addition to helping track and manage the outbreak in the short term, later reuses the genomic information to test these evolutionary theories and look for evidence of whether they hold. This research would help to understand why a given outbreak happened, which could lead to better predictive tools, better strategies for managing future outbreaks, and also provide at least some closure for those affected.
This kind of theory testing is one reason for the hunger for data.
“We have a lot of theory regarding what we should see in the DNA sequence, and we are only just recently getting the data we need to go take a look.”
The second area of work in the lab is trying to understand mechanisms of infectious disease.
“We can have a lot of correlation data but if we want to develop treatments or new ways to control or prevent infections, we need to understand how they happen at a molecular level. That leads to new drug targets that can then lead to new drugs. So, a big part of my lab tries to understand molecular mechanisms and identify novel strategies to prevent and treat urinary tract infections. Those are infections which affect any part of the urinary tract, usually the bladder. Half of all women get a urinary tract infection at some point in their life,” Dr. Chen elaborated. He added that underlying both of these pieces of his lab is a lot of computation.
The third area is synthetic biology, which involves the creation of tools for manipulating bacteria. Dr. Chen’s expertise in synthetic biology arises from his work in understanding the genetics of bacteria causing urinary tract infections (UTIs).
The standard way to understand why a certain bacterium is causing a disease is to change specific regions of the DNA and see if that affects the disease. If that is true, then maybe the genes or proteins encoded by that region of DNA can be targeted with a drug.
“The ability to make a specific, desired change in an organism’s DNA is one of the foundational tools for synthetic biology. This capability is needed to achieve the big synthetic biology goal of fully designing a bacterium to do specific things,” said Dr. Chen.
Dr. Chen’s lab has developed some of those tools as part of their work in understanding the mechanisms by which Escherichia coli causes UTIs, but the same tools are widely applicable to other bacteria that do not cause UTIs.
All three of these areas of research (outbreak analysis, UTIs, and synthetic biology) flow into antibiotic resistance.
Antibiotic resistance presents a serious threat to public health globally. It occurs when bacteria undergo changes following exposure to an antibiotic and the drug becomes ineffective against those bacteria. This could compromise the ability to treat common infections and infections arising from complications of medical procedures, such as surgery and chemotherapy. In fact, the Singapore Government recently launched a National Strategic Action Plan on Antimicrobial Resistance (AMR).
UTIs are probably the second leading cause of antibiotic prescriptions, which makes them a huge contributor to antibiotic usage and therefore to antibiotic resistance. Dr. Chen’s work in finding better ways to treat UTIs could help reduce antibiotic usage, which would hopefully help reduce antibiotic resistance rates. In addition, his work on synthetic biology can be applied to understand how different bacteria have different ways of being resistant. “You may have heard of resistance getting transferred from one bacterium to another; this is largely due to antibiotic resistance genes. However, sometimes two different bacteria have the same resistance genes but they seem to be different in their resistance for unknown reasons. Figuring out why there is a difference in resistance requires the synthetic biology tools we are building, and this could lead to exploiting these differences to reduce antibiotic resistance in other bacteria,” explained Dr. Chen.
Today, genomics is highly resource-intensive. But Dr. Chen said that it will eventually become pervasive and costs will be lowered, supported by the Hyper-Moore rate of progress in the field, developments in computing, and a balance between basic science and translational research.
As Singapore proceeds towards its Smart Nation vision, genomics will become ubiquitous in fields ranging from health to crime investigation, food security and safety, and the health of the environment. Genomics will help with precision medicine, enabling diagnosis of the right ailment and picking the right treatment. As genomics is scaled up to the entire population, it will enable predictive healthcare. And Dr. Chen believes that GIS has a leading role to play in that journey.