A statistical crystal ball

The most common purpose of data mining is predicting the future. Investors want to know what stocks are going to spike or tank, stores anticipate what products their customers will want to buy and sports teams search for the next breakout player. These industries use statistical techniques to comb through large datasets and find new insights and predictive formulas.

The potential of this prognostication in medicine is obvious: for virtually any illness, earlier diagnosis and treatment improve outcomes. But until the recent adoption of electronic medical records, health care lacked the data and mechanisms to replicate the advances of other industries.

Since the transition to electronic records, UChicago Medicine has worked to ensure that the data it collects will benefit both current and future patients by fueling research to improve care. The Clinical Research Data Warehouse (CRDW), maintained by the Center for Research Informatics (CRI) and designed and built by Timothy Holper, SM'09, MA, now holds rich information on roughly 800,000 patients and is available to researchers interested in data-driven medical advances. Early uses of this resource are already making an impact in critical care, where machine learning techniques are giving clinicians valuable early warnings.

More than 200,000 in-hospital cardiac arrests occur every year in the United States. These seemingly sudden events represent one of the most difficult challenges in hospital care, resulting in high mortality rates and severe strain on critical care personnel and resources. Since the 1990s, clinicians have tried to predict cardiac arrest in advance, relying on simple vital sign thresholds or scores that on-duty staff can calculate by hand. But many of these scoring systems could anticipate cardiac arrest only shortly before it occurred, and even the best produced too many false alarms.

Using the CRDW, UChicago Medicine's Dana Edelson, MD'01, MS'07, and Matthew Churpek, MD, MPH, PhD'14, both assistant professors of medicine, set out to use machine learning methods to develop a better early warning score for cardiac arrest. The result, called eCART, is an algorithm that uses vital signs, lab results and demographic data to calculate a real-time risk score of cardiac arrest for each patient. Testing the algorithm on data from UChicago Medicine and NorthShore University HealthSystem hospitals, the researchers found that they could extend the lead time for cardiac arrest predictions from minutes to hours, and even days.

"We started by aggregating large sums of data that hospitals didn't have previously," Edelson said. "Then we created a model with new analytics commonly used in the world of business and finance, but not yet used in health care, and built it into the clinical work flow so that people could use it in real time. The combination of those elements created a really powerful tool for predicting who's going to deteriorate in the hospital."

For the past two years, UChicago Medicine critical care nurses have used eCART in their daily workflow, with software that updates each patient's score in real time. Nurses view a constantly refreshed dashboard that shows each patient's current score, with symbols indicating the current level of risk, and they can also see how the score has changed over time. When a score passes a set threshold, rapid response team members are notified through the electronic record or by pager. These warnings allow preventive measures to begin earlier, which may have contributed to the recent decline in in-hospital cardiac arrests at the medical center.
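The article does not describe how eCART computes its score or what threshold triggers a notification. As a rough illustration of the alerting step alone, the Python sketch below assumes a series of already-computed risk scores for one patient and a hypothetical cutoff (`ALERT_THRESHOLD = 60`). The names `ScoreReading`, `risk_tier` and `alert_hours`, the tier boundaries and the sample scores are all invented for this example and are not taken from eCART.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff chosen for illustration; NOT eCART's actual threshold.
ALERT_THRESHOLD = 60.0


@dataclass
class ScoreReading:
    hour: int     # hours since admission (illustrative time axis)
    score: float  # risk score reported at that time


def risk_tier(score: float, threshold: float = ALERT_THRESHOLD) -> str:
    """Map a score to a coarse dashboard symbol (hypothetical tiers)."""
    if score >= threshold:
        return "high"
    if score >= threshold / 2:
        return "elevated"
    return "low"


def alert_hours(readings: List[ScoreReading],
                threshold: float = ALERT_THRESHOLD) -> List[int]:
    """Return the hours at which the score rises to or above the threshold.

    An alert fires only on an upward crossing, so a patient whose score
    stays above the threshold does not trigger a new page on every refresh.
    """
    alerts = []
    above = False
    for r in readings:
        now_above = r.score >= threshold
        if now_above and not above:
            alerts.append(r.hour)
        above = now_above
    return alerts


# Made-up scores for a single patient, refreshed every four hours.
readings = [
    ScoreReading(0, 12), ScoreReading(4, 35), ScoreReading(8, 64),
    ScoreReading(12, 70), ScoreReading(16, 40), ScoreReading(20, 61),
]
print([risk_tier(r.score) for r in readings])
# ['low', 'elevated', 'high', 'high', 'elevated', 'high']
print(alert_hours(readings))
# [8, 20]
```

Alerting only on an upward crossing, rather than on every refreshed score above the cutoff, is one way to limit repeat pages for the same patient; a production system may handle this quite differently.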

The success of eCART has inspired efforts to develop similarly data-driven early warning scores for other critical events, such as respiratory failure, acute kidney injury or pulmonary embolism. Sepsis, the excessive inflammatory response to infection, is a particularly ripe target for data mining because it is very difficult to anticipate and treat, and its outcomes are extremely heterogeneous. If clinicians know who is at elevated risk of sepsis and start those patients on antibiotics and fluids as early as possible, they will be able to save lives.

Nelson Sanchez-Pinto, MD, MBI, seated, and Samuel Volchenboum, MD, PhD, MS, are using data mining to identify patients at high risk of sepsis and develop tailored interventions. (Photo: Jean Lachat)

"In critical illness, it's pretty well-known dogma that the sooner you detect and treat things, the better patients are going to be," said Nelson Sanchez-Pinto, MD, MBI, assistant professor of pediatrics. "It can't get much better than predicting illness before it happens."

Sanchez-Pinto and other UChicago researchers are also working with CRI data scientists, including Anoop Mayampurath, PhD, to combine the CRDW with additional data sources such as genomics, proteomics and the microbiome in search of new, more targeted treatments for sepsis and infection. By looking back through the data and comparing patients with good and poor outcomes, researchers hope to identify the biological factors that explain high or low risk and to exploit them in new treatment strategies.

"The concept here is not only detecting who's at high risk, but also why they are high risk, and to help us develop tailored interventions that we can then test and see if they make a difference in the long-term outcomes of those patients," Sanchez-Pinto said. "It's a very attractive idea, because it fuels big areas of research where you go not only from the bench to the bedside, but also from the bedside back to the bench."

One promising lead is being pursued in the laboratory of Philip Verhoef, MD, assistant professor of medicine and pediatrics, who has used patient data as the bridge between his clinical work and his research in animal models. To test his hypothesis that patients with allergies and asthma are protected against sepsis, Verhoef queried a dataset of 100 million insurance claims from across the country. Not only did he find support for his hypothesis, he also found that autoimmune conditions, such as ulcerative colitis and multiple sclerosis, were associated with elevated risk of sepsis. That study, along with subsequent work using more detailed CRDW data, generated new hypotheses about immune factors that raise or lower the risk of sepsis, which Verhoef is now testing in mouse models.

"What I think is so cool is the possibility that hidden in all of the hospital records might be the mechanistic clues that we need to develop new therapies," Verhoef said. "We've only just scratched the surface as far as mining that information."

Built on these early successes, the long-promised potential of data-driven medicine is taking shape. Smart alerts that give early warning of illness, automated recommendations on diagnosis and treatment, and personalized therapies backed by computational modeling and data analysis will together provide clinicians with a virtual assistant, one that can complement and support the irreplaceable human dimension of medicine.

"You really want a partnership where you're getting useful information that will help you treat a patient, but you also have your own clinical intuition and expertise," Churpek said. "Those will intersect and interact when making final diagnosis and treatment decisions for a patient. I really think this is going to be the future of medicine."

Rob Mitchum

Rob Mitchum is communications manager at the Computation Institute, a joint initiative between the University of Chicago and Argonne National Laboratory.