Pediatric cancer data commons aims to accelerate research


More than 15,000 children under the age of 20 are diagnosed with cancer each year in the United States, according to the American Cancer Society. Though cancer remains the leading cause of death among children past infancy, childhood cancers account for less than one percent of all cancers diagnosed each year. In comparison, more than 1.6 million adults receive cancer diagnoses annually.

The paucity of pediatric cancer cases has created barriers for researchers. Fewer cases mean fewer technological advancements in treatment driven by synthesizing “big data.” And the pediatric data that do exist are often hard for scientists to access and analyze.

But University of Chicago researchers hope to shift this paradigm by creating a comprehensive Pediatric Cancer Data Commons (PCDC) that centralizes data and makes them easily accessible to the entire research community.

“We’ve always had the problem in pediatric cancer, that there are just not enough data to study,” said Samuel Volchenboum, MD, PhD, associate professor of pediatrics and director of the University of Chicago Center for Research Informatics (CRI).

Many advances in pediatric cancers, Volchenboum added, have relied on data from large consortium clinical trials, such as those out of the Children’s Oncology Group (COG). The COG is supported by the National Cancer Institute (NCI) and is the world’s largest organization devoted exclusively to childhood and adolescent cancer research. “It is a way to collect all these patients together,” Volchenboum said. “But, the data still remain sequestered in Excel spreadsheets or in people’s computers.”

In 2004, recognizing a need for centralized data, researchers from North America, Europe, Australia and Japan formed the International Neuroblastoma Risk Group (INRG) Task Force, co-chaired by Susan Cohn, MD, professor of pediatrics. This task force gathered and standardized data from 8,800 patients with neuroblastoma – a cancer that starts in the nerve cells of developing embryos – and continued to add data during the following decade.

Despite these efforts, the data remained in a large spreadsheet that required a lengthy approval process to access. Furthermore, the clinical data were not linked to patient genomic data, leaving researchers with an incomplete picture. Nor could researchers easily determine the availability of patient biospecimens housed at the COG biorepository in Columbus, Ohio.

Dr. Samuel Volchenboum and Dr. Susan Cohn

In 2012, Volchenboum and Cohn set out to overcome these limitations by using philanthropic funding from The William Guy Forbeck Research Foundation to take the INRG neuroblastoma data and turn it into a “living” database housed on a searchable website. This allowed researchers to easily search for the information they needed, including biospecimen availability.

“The INRG data are available to investigators from around the world for research studies,” said Cohn, who noted that 17 published research studies were made possible by the database. “Many of these studies evaluated small patient cohorts and the analysis would not have been possible without the large numbers of patients included in the INRG Data Commons.”

The INRG Data Commons, which has also received funding from the St. Baldrick’s Foundation, the Children’s Neuroblastoma Cancer Foundation, and the Matthew Bittker Foundation*, has already led to improved and new approaches to therapy, and has been a major force in increasing international collaboration.

“That is really where all this started – realizing the power that can come from putting all the data in one place,” said Volchenboum, who hopes the PCDC will do the same for other types of pediatric cancer.

In 2015, the CRI established a partnership with the Center for Data Intensive Science, directed by Robert Grossman, PhD, Frederick H. Rawson Professor in Medicine and Computer Science. Through this partnership, they were able to integrate the INRG Data Commons with the NCI Genomic Data Commons (GDC) – housed at UChicago and led by Grossman – and the National Center for Biotechnology Information’s Gene Expression Omnibus. This linked the available genomic data with the INRG database’s de-identified clinical patient information, such as basic demographics, tumor profile, and treatment regimen. The searchable database now houses data from more than 19,000 neuroblastoma patients from around the world.

“Gradually, we were starting to see the power of what we could do here,” Volchenboum said.

Soon, he was being approached by pediatric researchers who hoped to create similar databases in other cancer types, demonstrating the need for a more comprehensive data commons for all pediatric cancers.

“Most of the commons for genomic data are not going to be undertaking the difficult task of harmonizing the clinical elements, and I think that’s where this PCDC is going to be really valuable,” he said. “This is the only way we’re going to be able to collect and share data between international groups.”

With the appropriate funding, Volchenboum believes the team could get the PCDC up and running in 2-3 years. He hopes it will encourage more data sharing among researchers and help inform the design of future clinical trials. “Once this is built, I think it’s actually going to drive the research,” he said.

The PCDC team has already received philanthropic support from Sammy’s Superheroes Foundation, which committed $400,000 over four years to the project. The team has also received funding from the Rally Foundation.

“The PCDC has the promise of leveraging the success of the INRG to other pediatric cancers, accelerating research and hopefully improving survival,” Cohn said. “As the genomic data gets more rich, additional studies will be able to be conducted that we hope will lead to a better understanding of the genomic factors that drive clinically aggressive tumor growth.”

While they work to develop the PCDC, Volchenboum and Grossman will also lend their expertise to a new five-year, $14.8 million effort by the National Institutes of Health (contingent upon available funding) to improve the understanding of inherited diseases.

The project, known as the Gabriella Miller Kids First pediatric data resource center, will be a multi-centered effort led by investigators at the Center for Data Driven Discovery in Biomedicine at the Children’s Hospital of Philadelphia (CHOP).

Teams led by Grossman and Volchenboum will design and operate the cloud-based, open-source software needed to establish the data coordination center within the Kids First data resource center. Volchenboum hopes the PCDC at the University of Chicago will help provide valuable clinical data to this new genomic commons for pediatric cancer.

* The INRG Data Commons was also supported by the Super Jake Foundation, Alex’s Lemonade Stand Foundation, and the Little Heroes Foundation.

Susan L. Cohn, MD

Susan L. Cohn, MD, is a highly respected expert in pediatric cancers and blood diseases. She is a leading authority on neuroblastoma, a cancer of nerve cells, and the most common type of cancer found in infants. She serves as chief of the Section of Pediatric Hematology/Oncology.


Samuel Volchenboum, MD, PhD, MS

Samuel L. Volchenboum, MD, PhD, MS, is an expert in pediatric cancers and blood disorders. He has a special interest in treating children with neuroblastoma, a tumor of the sympathetic nervous system.
