Case Study


From genomic data to precision medicine via machine learning

Inova is a not-for-profit integrated health system that serves more than two million people each year from throughout the Washington, DC, metro area and beyond in the US.

The Inova Translational Medicine Institute (ITMI), which is part of the Inova Center for Personalized Health (ICPH), seeks to answer the question: Which treatment is most effective for each patient?

It leverages precision medicine to predict, prevent and treat disease, enabling individuals to live longer, healthier lives.

ITMI collects clinical data from thousands of Inova patients born in more than 110 countries. A single person’s unique DNA contains six billion bits of information. Mapping individuals’ DNA codes into genome sequences helps scientists determine the cause of diseases. As part of this process, ITMI is also assembling what is expected to be one of the world’s largest whole genome sequence databases connected to patient information in a healthcare system.

ITMI’s team of leading scientists, researchers, analysts and collaborators use machine learning algorithms on these terabytes of clinical and genomic information to identify the genetic links to diseases. They make discoveries from the data insights and, in collaboration with the treating physician, develop personalised treatment plans for patients.

The challenge and the solution

Inova faced two significant challenges: bringing together massive volumes of genomic and patient data for advanced analysis, and enabling faster exploration of that data.

Inova had generated petabytes of genomic and patient data, and needed a way to consolidate that data in a single data infrastructure. With its previous data warehouse, it could take weeks or even months to pull data together for researchers. As the data continued to grow, staying on the existing system was not feasible.

In its search for a modern data platform, Inova sought a collaborative approach, which it found with Cloudera.

Aaron Black, Chief Data Officer at ITMI, said, “We looked for a company that was as curious about the data as we were. With Cloudera, we established a relationship of discovering what was possible.”

The data team demonstrated the expected return on investment through a Proof of Concept (PoC), in order to gain executive buy-in.

While Inova ultimately implemented Cloudera on-premise, Cloudera on Amazon Web Services was chosen for the PoC because the cluster could be built easily without a large upfront capital outlay. Once the decision was made and the on-premise cluster was built, the entire dataset was moved to it within a few weeks.

Enabling new medical discoveries at faster speeds

ITMI worked with Cloudera to build a world-class bioinformatics infrastructure for the Institute's massive and growing collection of genomes paired with clinical records. The infrastructure was designed to meet future growth requirements, storing and processing biological data at increasing speed and scale.

After processing and optimising this data, Inova provided its researchers with fast access to terabytes of genomic and patient data in a single data set using a Cloudera analytic database. Prior to the implementation, researchers spent 80 percent of their time on data wrangling, and only a small fraction on the actual analytics. Now that ratio could be reversed.

Researchers can now answer questions orders of magnitude faster than before. End-to-end analyses that previously took months, such as a bioinformatics scientist studying genomic correlations among people with conditions like arthritis, autoimmune diseases or cancer, could now be completed in one week. That could go down to hours in the future.

With access to a wider range of data and the ability to explore it more easily, researchers can test new theories more quickly and uncover patterns that may not have been apparent before. For example, by analysing genomic data gathered from mothers, fathers and infants enrolled in various family-based studies, ITMI has been able to discover previously undiagnosed congenital anomalies in infants.

Such new medical discoveries can dramatically change treatment plans and patient outcomes.

Mr Black said that the ultimate goal was to match the speed at which researchers think. That has now been made possible.

He added, “Now we’re moving towards getting answers in minutes and seconds and can find correlations that we couldn’t before. Ultimately, we can put the data together in novel ways to understand the evolution of diseases so that we can help keep our patients well.”

All content from customer success story and press release on
