Inova is a
not-for-profit integrated health system that serves more than two million
people each year from throughout the Washington, DC, metro area and beyond.
The Inova Translational Medicine Institute (ITMI), which is part of the Inova
Center for Personalized Health (ICPH), seeks to answer the question: Which
treatment is most effective for each patient?
It leverages precision medicine to predict, prevent and treat disease, enabling
individuals to live longer, healthier lives.
ITMI collects clinical data from thousands of Inova patients
born in more than 110 countries. A single person’s unique DNA contains six billion
bits of information. Mapping an individual’s DNA code into a genome sequence helps
scientists determine the cause of diseases. As part of this process, ITMI is
also assembling what is expected to be one of the world’s largest whole genome
sequence databases connected to patient information in a healthcare system.
ITMI’s team of
leading scientists, researchers, analysts and collaborators use machine
learning algorithms on these terabytes of clinical and genomic information to
identify the genetic links to diseases. They draw discoveries from these
insights and, in collaboration with the treating physician, develop personalised
treatment plans for patients.
The challenge and the
Inova faced two significant challenges:
bringing together massive volumes of genomic and patient data for advanced
analysis, and enabling faster exploration of that data.
Inova had generated petabytes of genomic and patient data
and needed a way to bring that data together in a single data
infrastructure. With its previous data warehouse, it could take weeks or months
to pull data together for researchers. With growing scale, continuing with the existing system was not
In its search for a modern data platform, Inova sought a
collaborative approach, which they found with Cloudera.
Aaron Black, Chief Data Officer at ITMI, said, “We looked for
a company that was as curious about the data as we were. With Cloudera, we
established a relationship of discovering what was possible.”
To gain executive buy-in, the data team demonstrated the expected return
on investment through a proof of concept (PoC).
While Inova ultimately implemented Cloudera on-premises,
Cloudera on Amazon Web Services was chosen for the PoC because the cluster was
easy to build without a large upfront capital outlay. Once the decision
was made and the on-premises cluster built, the entire dataset was moved
to it within a few weeks.
Enabling new medical discoveries at faster speeds
ITMI worked with Cloudera to build a world-class
bioinformatics infrastructure for the Institute's massive and growing data
collection of genomes paired with clinical records. The infrastructure
was designed to meet future growth requirements, storing and processing
biological data at increasing speed and scale.
After processing and optimising this data, Inova provided
its researchers with fast access to terabytes of genomic and patient data in a
single data set using a Cloudera analytic database. Prior to the
implementation, researchers spent 80 percent of their time on data wrangling
and only a small fraction on the actual analytics. That ratio can now be reversed.
Researchers can answer questions orders of magnitude faster than they
could previously. End-to-end analyses that previously took months,
such as a bioinformatics scientist studying genomic correlations
among people with conditions like arthritis, autoimmune diseases or cancer, can
now be completed in one week. In the future, that could fall to hours.
With access to a wider range of data and the ability to more
easily explore the data, researchers can test new theories more quickly and
uncover new patterns that may not have been apparent before. For example, by analysing
genomic data gathered from mothers, fathers and infants enrolled in various
family-based studies, ITMI has been able to discover previously undiagnosed
congenital anomalies in infants.
Such new medical discoveries can dramatically change
treatment plans and patient outcomes.
Mr Black said that the ultimate goal was to match the speed
at which researchers think. That is now possible.
He added, “Now we’re moving towards getting answers in
minutes and seconds and can find correlations that we couldn’t before.
Ultimately, we can put the data together in novel ways to understand the
evolution of diseases so that we can help keep our patients well.”