How UCSF’s data science team took on COVID

COVID-19 may turn out to be the biggest data event of the decade — perhaps even the century.

When the pandemic hit in early 2020, getting accurate, real-time data became essential at every level of society: Government agencies enacted lockdown measures based on data, hospitals relied on it to forecast bed shortages and the general public used it when gauging the safety of everyday activities. Since then, government agencies, research labs and media organizations have worked tirelessly to provide this kind of accessible data.

UCSF’s data science and innovation team was at the forefront of these efforts in the Bay Area. The team is made up of seven members, each of whom has a background in data science, health care or a mix of both. Together, they use data science and data visualization to address the most pressing problems across the health system’s four campuses (Parnassus, Mission Bay, Mount Zion and BCH-Oakland).

At the onset of the pandemic, when COVID admissions were just starting to pick up, UCSF’s chief medical officers tracked hospitalization counts by simply writing them down on a whiteboard. Eventually, word of this low-tech data tracking got back to Sara Murray, the director of the data science team, and Murray and her team quickly mobilized to develop an automated solution.

The result was an online dashboard, which, over time, has turned into a suite of dashboards, each one packed with metrics and visualizations. The original dashboard, for instance, includes hospitalization and test positivity rates cut by every imaginable grouping — vaccination status, level of hospital care, patient demographics, and symptomatic versus asymptomatic cases — and tracked over time.

But according to Rhiannon Croci, a clinical informatics specialist and one of the developers of the dashboards, the biggest challenge was not the front-end visualizations, but the back-end data engineering — extracting and restructuring the data into a format that could then be visualized.
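For readers who code, that restructuring step can be sketched in miniature: turning event-level test records into the grouped, over-time rates a dashboard can plot. The snippet below is an illustrative sketch only; the field names and logic are invented for this example and are not UCSF’s actual pipeline.

```python
from collections import defaultdict

# Hypothetical event-level rows, in the rough shape an EHR extract might take.
raw_tests = [
    {"date": "2021-08-01", "vaccinated": True,  "result": "positive"},
    {"date": "2021-08-01", "vaccinated": True,  "result": "negative"},
    {"date": "2021-08-01", "vaccinated": False, "result": "positive"},
    {"date": "2021-08-01", "vaccinated": False, "result": "positive"},
]

def positivity_by_group(rows):
    """Restructure raw rows into a (date, vaccination status) -> positivity rate table."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        key = (row["date"], row["vaccinated"])
        totals[key] += 1
        if row["result"] == "positive":
            positives[key] += 1
    return {key: positives[key] / totals[key] for key in totals}

rates = positivity_by_group(raw_tests)
print(rates[("2021-08-01", True)])   # 0.5
print(rates[("2021-08-01", False)])  # 1.0
```

A real pipeline would do this across millions of rows and many more groupings, which is why Croci describes the back end, not the charts, as the hard part.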

The data that Croci worked with came from electronic health records, or EHRs — digital versions of patients’ paper charts with information on medical histories, diagnoses, medications, treatments, laboratory test results, vital signs and billing information. These records are ubiquitous within health care: Every time a clinician measures a patient’s vitals (e.g. blood pressure, pulse, body temperature) or administers a treatment (like the monoclonal antibody treatment for COVID-19), the action is logged in the EHR. Whenever a new patient is admitted to a hospital, their medical history is pulled from the EHR. If a hospital administrator needs to bill an insurance provider, they verify the amount by consulting the EHR.
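To make the shape of this data concrete, a single EHR entry might look something like the record below. The field names are invented for illustration and do not reflect any real EHR vendor’s schema.

```python
# Illustrative sketch of the kinds of fields one EHR entry might hold
# (field names are hypothetical, not a real EHR schema).
ehr_entry = {
    "patient_id": "12345",
    "timestamp": "2021-08-01T14:30:00",
    "vitals": {"blood_pressure": "120/80", "pulse": 72, "temp_c": 36.8},
    "diagnoses": ["COVID-19"],
    "medications": ["monoclonal antibody"],
    "lab_results": [{"test": "SARS-CoV-2 PCR", "result": "positive"}],
}

# Every clinical action appends another entry like this one,
# so a single patient stay can generate many such records.
print(ehr_entry["lab_results"][0]["result"])  # positive
```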

EHRs update in real time, with changes reflected in the system whenever clinicians enter them. Access to the data, however, is not real-time — at least not the kind of access UCSF data scientists need to analyze and visualize it. That is because the analysis runs on a copy of the EHR data stored in a separate database. When new information is entered into the EHR, it stays in the internal system until that database is refreshed once a day, at around 1 a.m.
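The practical consequence of that nightly refresh is that a dashboard query reflects data only as of the most recent 1 a.m. load. The helper below sketches that “as-of” logic; the function and refresh hour are taken from the article’s description, but everything else is an assumption for illustration.

```python
from datetime import datetime, time, timedelta

# Assumed nightly refresh hour, per the article (~1 a.m.).
REFRESH_TIME = time(hour=1)

def last_refresh(now: datetime) -> datetime:
    """Hypothetical helper: timestamp of the most recent nightly database load."""
    today_refresh = datetime.combine(now.date(), REFRESH_TIME)
    if now >= today_refresh:
        return today_refresh
    # Before 1 a.m., the freshest data is still yesterday's load.
    return today_refresh - timedelta(days=1)

# A dashboard query at 9 p.m. sees data loaded at 1 a.m. that same day...
print(last_refresh(datetime(2021, 8, 1, 21, 0)))  # 2021-08-01 01:00:00
# ...while a query at half past midnight still sees the previous day's load.
print(last_refresh(datetime(2021, 8, 2, 0, 30)))  # 2021-08-01 01:00:00
```

In other words, even a “live” dashboard built on this architecture can lag the wards by up to a day.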
