Bioinformatics

The widespread use of electronic health records has sparked a revolution in medicine, but physicians and researchers have only begun to scratch the surface of the dizzying complexity of data in these systems and to use it to improve the quality of health care. In 2017, the University of Chicago Medicine Center for Healthcare Delivery Sciences and Innovation (HDSI) announced a collaboration with Google to develop new machine-learning techniques to create predictive models that could help prevent unplanned hospital readmissions, avoid costly complications, and save lives.

This week, researchers from Google, UChicago Medicine, the University of California-San Francisco, and Stanford University published a study detailing the results of the first of these collaborations. They found that software models using de-identified data from medical records could accurately predict unplanned readmissions to the hospital, prolonged length of stay, discharge diagnoses, and even early deaths in the hospital.

Samuel Volchenboum, MD, PhD, MS, associate professor of pediatrics and director of the Center for Research Informatics (CRI) at UChicago Medicine, is a co-author on the study, and worked with Google to develop the new predictive models. We spoke to him about the new study, why the HDSI decided to collaborate with Google, and what he thinks it’s possible to learn from electronic health records.

UChicago Medicine: Hospitals have been pouring a lot of data into electronic health records for years, but how well has it been used? How many hospitals are getting useful information out of EHRs to improve care, rather than just using them for record keeping?

Samuel Volchenboum: Remember, electronic health record systems were initially developed to keep track of billing. They traditionally have not been a good way to track someone’s medical history and progress, though the systems certainly have come a long way. Building a data warehouse capable of supporting research requires a dedicated team of data wranglers and software engineers to transform and clean up the data. Most EHRs therefore support clinical care in the most rudimentary ways - displaying results, creating data visualizations, or providing some clinical decision support - for example, alerting the physician when she is ordering a medication to which the patient has a documented allergy. But we are a long way from truly leveraging clinical information systems to make complex predictions about patients and their environments. This will require a coordinated effort to design and implement better data collection systems as well as a culture shift toward more investment on the part of clinicians in being better data stewards.

This first paper focuses on predicting events like in-hospital mortality, readmissions, length of stay, and discharge. Why were these endpoints the first focus of study?

There is a literature on predicting things like mortality and length-of-stay, so there were convenient benchmarks for comparison. And to be frank, these are not the most vexing and difficult questions. It will be fascinating to see what kinds of predictions can be made that take into account the interactions between patients and their environments. For instance, when several patients are transferred to the ICU, do waiting times increase in the Emergency Department? And what can be done to intervene to prevent this?

What kind of work is already being done at UChicago Medicine to leverage data from EHRs?

The Center for Research Informatics has been refining the clinical research data warehouse (CRDW) since its inception in 2012. To date, over 600 projects have been enabled by the CRDW. The Section of Gastroenterology in the Department of Medicine has developed the extremely powerful and rich GeneSys data mart, directed by Joel Pekow, MD, and powered by data extracted from the CRDW. Matt Churpek, MD, MPH, PhD, has built a highly successful algorithmic research program, in part by harnessing the highly curated and rich data in the CRDW to enable his work on predicting hospital events.

Why partner with Google? What can this partnership contribute back to patient care at UChicago Medicine?

Most people assume that the benefit of working with Google is their access to vast amounts of computing resources. While it is true that Google does leverage large amounts of computing power and storage, the true value of the collaboration is in combining the clinical expertise and data from the University of Chicago with the machine learning expertise at Google.

What steps have been taken to ensure privacy of patient data?

The CRI has a team of data warehouse staff dedicated to providing de-identified data for research. The team has built a reputation for providing high-quality data for research while going to great lengths to protect patient privacy and security. For the latest project with Google, Stanford and UCSF, all patient identifiers, such as names, dates of birth, Social Security numbers and any other unique characteristic or code, were stripped from the data before Google was given access. All work was conducted according to our rigorous standards and under supervision from the Biological Sciences Division Institutional Review Board, a group formally designated to monitor all research involving human subjects and medical data at the University of Chicago.

Besides the models demonstrated in the new paper, what else do you think is possible to learn from mining EHR data? What would you like to work on next?

I think some of the most exciting areas are going to be in leveraging non-traditional sources of data to make predictions. We should start bringing in data from wearables and sensors as well as other ancillary data provided by the patient. Once we start to understand the complexity of non-hospital data, we can then start to make truly informative predictions to help people maintain their health and stay out of the hospital in the first place.