Scaling up cancer research

Currently, only a small minority of cancers have a known relationship between mutation and treatment that can inform today's clinical decisions. The University of Chicago is at the epicenter of the search for new cancer targets and therapies in the massive, rapidly growing data collections of the National Cancer Institute (NCI).

The Genomic Data Commons (GDC), led by Robert Grossman, PhD, the Frederick H. Rawson Professor in Medicine and the College and chief research informatics officer for the Biological Sciences Division, launched in June 2016 with an announcement by Vice President Joe Biden. In the months since, the GDC has expanded to hold over five petabytes of data, accessed by more than 1,500 users a day.

The GDC unlocks the potential of the NCI's vast archive of genomic and clinical data. Because these datasets have grown too large for most laboratories to download or analyze, the GDC provides a centralized and standardized repository and advanced tools so that researchers can work remotely. By working with this extensive data, scientists can find subtle cancer-related genetic effects and probe whether various combinations of drugs might be effective for particular cancer subtypes.

Since the launch of the GDC, Grossman has also led the development of a new commons for cancer data called the Blood Profiling Atlas in Cancer (BloodPAC), which will be a home for liquid biopsy data, with the goal of accelerating the discovery of new biomarkers. Future projects will apply the data commons concept to other conditions, such as psychological disorders and traumatic brain injuries.

"We're trying to change the way scientific discoveries are made by democratizing access to this large-scale data," Grossman said. "We've been happy to see our community develop tools that allow these kinds of discoveries to be made on software applications that run on researchers' desktop computers, with the needed data streaming from the GDC in real time."

But data is only half the story. Fulfilling the promise of personalized medicine will also require powerful computation, beyond even the level of today's most powerful supercomputers.

As part of the Exascale Computing Project, a Department of Energy initiative to push the frontier of supercomputer speed to one quintillion (a billion billion) calculations per second, researchers at UChicago and Argonne National Laboratory will help extract clinically relevant discoveries from the huge and rapidly growing landscape of cancer data. The CANcer Distributed Learning Environment, or CANDLE, will develop "deep learning" methods, similar to those used to train self-driving cars, and apply them to clinical, experimental and molecular data in the hope of finding new hypotheses, drug targets and treatments for different types of cancer.

"It's a huge computational problem," said principal investigator Rick Stevens, PhD, associate laboratory director for computing, environment and life sciences at Argonne and professor of computer science at UChicago. "We have lots of data: millions of millions of experiments and expression data from 20,000 patients. But the models we need are still an open question. Once we train and develop the models, clinics can deploy them on relatively small systems and start using models to predict which drugs to give to a given patient."

This is the second of a five-part series on data-driven medicine and research at the University of Chicago Medicine, originally published in the Spring 2017 issue of Medicine on the Midway.

Rob Mitchum

Rob Mitchum is communications manager at the Computation Institute, a joint initiative between the University of Chicago and Argonne National Laboratory.