Gathering big data to accelerate COVID-19 fight
Scientists around the country, including leadership and staff of UNC's NC TraCS Institute, are creating a secure, central database of electronic health records from coronavirus patients to aid in the ongoing effort to treat patients effectively.
A nationwide collaboration of clinicians, informaticians, and other biomedical researchers aims to turn data from hundreds of thousands of medical records from coronavirus patients into effective treatments and predictive analytical tools that could help lessen or end the global pandemic.
Through the National Institutes of Health (NIH), the Clinical and Translational Science Awards (CTSA) Program supports a network of more than 50 medical research institutions across the country, called hubs, that work to improve the translational research process and get more treatments to more patients more quickly. These CTSA hubs are now partnering with U.S. Department of Health & Human Services agencies and clinical organizations to collaborate on a new, secure database to support the analysis of electronic health records.
The National COVID Cohort Collaborative (N3C) is supported as part of a $25 million NIH award to the National Center for Data to Health, which is coordinating these efforts and is based at Oregon Health & Science University's Oregon Clinical and Translational Research Institute. NIH's National Center for Advancing Translational Sciences, also known as NCATS, is providing overall stewardship of this effort.
"There is no centralized health care data in the United States," explained Melissa Haendel, PhD, Oregon State University the lead investigator of N3C.
"The coronavirus pandemic has spurred us to build, for the first time, a process for collecting and harmonizing electronic health records from many different institutions, storing it in one secure location, and making it available in a collaborative platform for use by diverse experts," she added.
The hub at UNC-Chapel Hill, one of several institutions currently involved in this effort, is led by John Buse, MD, PhD, director of the North Carolina Translational & Clinical Sciences (NC TraCS) Institute.
Emily Pfaff, MS, Administrative Director of the Informatics and Data Science (IDSci) service at NC TraCS, is leading the N3C Phenotype and Data Acquisition work stream. This work stream helps every site prepare and deliver data. It involves creating and maintaining a COVID-19 patient phenotype and writing code that can easily be reused at other sites contributing data. Additional personnel contributing to this work stream include Robert Bradford, Marshall Clark, Adam Lee, and Kellie Walters from the IDSci team, and Evan Colmenares, PharmD, from the UNC Health Pharmacy Services Analytics and Outcomes team.
NC TraCS staff and leadership are also actively involved in the N3C data harmonization and governance work streams.
"N3C is going to be a stunningly powerful resource for scientists across the United States," said Buse. "It is an honor for our team to have had the opportunity to play such an important role in the process. I am so proud of NC TraCS and the CTSA consortium in putting this together."
The secure, cloud-based database is certified through the Federal Risk and Authorization Management Program, or FedRAMP, which provides standardized assessment, authorization, and continuous monitoring for cloud products and services. NCATS is providing the database, which will contain records from patients who have undergone coronavirus testing or are suspected to be infected.
Individuals granted access to the database will be able to run algorithms on this first-of-its-kind nationwide patient data set without seeing actual patient records. Access levels will include aggregate data (counts), deidentified data, and a HIPAA limited dataset. A safe derivative of the patient data, called synthetic data, is also planned. N3C is dedicated to protecting patient privacy while keeping barriers to accessing this rich dataset to a minimum. Investigators will need to apply for access and meet regulatory requirements.
The database will enable new machine learning and rigorous modern statistical analyses to answer key questions such as predicting patient responses to antiviral or anti-inflammatory therapies, identifying potential new drugs and treatments, and finding other indicators such as biomarkers that can inform clinical decision making.
The first sampling of electronic health records was transferred to the database on May 12, and more will be uploaded as additional partners join the effort. Thus far, the 15 institutions that have agreed to contribute data are Oregon Health & Science University, Johns Hopkins University, University of North Carolina at Chapel Hill, Rockefeller University, Washington University, University of Kentucky, Medical University of South Carolina, Stony Brook University, University of Alabama, Tufts University, University of Wisconsin-Madison, University of Massachusetts, Wake Forest University Health Sciences, Maine Medical Center Research Institute, and Pennsylvania State University.
While the N3C database is not intended to be a repository of all coronavirus patient records, organizers want its data to fully reflect America's diverse residents, and they want diverse clinicians and healthcare researchers from across the U.S. to analyze the data. More partners are needed to make this happen.
In the near future, UNC researchers will be able to request access to N3C data in support of their COVID-related research projects. Although guidance regarding use of the data is under development, the IDSci team has drafted an info page (login required) for UNC researchers interested in using N3C. This page will be updated regularly.
IDSci is also available to assist UNC researchers with COVID-19 projects locally. For information about how we can help, please visit the NC TraCS COVID-19 Clinical Data Requests page.
Melissa Haendel, PhD, is Director - National Center for Data to Health, Associate Professor - Medical Informatics and Clinical Epidemiology, OHSU School of Medicine, and Director - Translational Data Science at Oregon State University.
John Buse, MD, PhD, is Chief - Division of Endocrinology, Verne S. Caviness Distinguished Professor, Director - Diabetes Center, and Director - NC Translational and Clinical Sciences Institute at the University of North Carolina at Chapel Hill.
The NC TraCS Institute at UNC-Chapel Hill is supported by NCATS and NIH, through Grant Award number UL1TR002489. TraCS is supporting multiple COVID-19-related initiatives, including a COVID-19 datamart and participation in national collaboratives. Learn more at tracs.unc.edu.