CER Data Linkage

Researchers are increasingly recognizing the benefits of integrating electronic health record and administrative claims data in health research.

These two data sources provide different views of a patient's health status: electronic health record data support more complete identification of conditions, while claims data capture the healthcare encounters billed to a patient's insurer, including care received outside a single health system. When combined, these two data sources allow researchers to answer critical questions about healthcare utilization and outcomes.

The resources below provide basic information on the rationale and process for linking electronic health record and administrative claims data at UNC-Chapel Hill.

Linking Electronic Health Records and Insurance Claims Data for Clinical Research: Opportunities and Challenges

This presentation provides an overview of the benefits and process of linking EHR and claims data. Presented by Michele Jonsson-Funk, Spring 2019.
Available as a recording and as slides (PDF).

Linking UNC Electronic Health Record Data to Insurance Claims

This document provides an overview of the data request process for projects that seek to link UNC EHR data and administrative claims data.
Available as a Word document (.docx).

Medicare 101

This presentation provides an overview of the structure and content of Medicare claims data files.
Slides available (PDF).


FAQ on linking EHR and claims data

Why would I want to link EHR-derived data with claims data?
What kinds of claims data are available for linkage projects at UNC-Chapel Hill?
How much bureaucracy is involved in planning a project to link EHR and claims data?
How much does it cost to link EHR and claims data?
How long does it take to link EHR and claims data?
Who performs the data linkage?
Do I need patient consent to link EHR and claims data?
Can I do multiple studies from one linkage project?
How good are these linkages and what percent of linkages is acceptable?
How recent is the claims data available at UNC-Chapel Hill?
What if I want to link one set of EHR data (say, UNC's) with another set of EHR data such as Wake Forest?
Why aren't these links permanent?
Are claims data better than EHR data?
What are my responsibilities as a researcher using these data?
Where do the data reside?
Who should I contact with additional questions?

Why would I want to link EHR-derived data with claims data?

Claims data provide a record of all care (including procedures, medications, emergency department visits, hospitalizations, and other types of encounters) billed to a given payer. In contrast, data derived from the electronic health records of a given healthcare system include only care that takes place in that system's hospitals and practices.

For example, consider an investigator who wants to count emergency department visits for a given sample of patients. If the investigator examined only UNC's EHR data, they would miss visits to the emergency departments of other healthcare systems. Linking EHR data to claims data, however, would give the investigator a more complete picture of emergency department utilization.

If a more comprehensive view of care received and health outcomes (such as hospitalization) is important to your research, consider a claims linkage when you design the study. Such linkages generally serve one of two use cases:

  • Outcome assessment in a large, pragmatic trial: for example, detecting cardiac events that occur outside the UNCHCS catchment area. Such studies generally obtain patient consent for data linkage. For a small trial, it may be simpler to ask participants directly.
  • Large secondary data analyses: for example, a study of different types of bariatric surgery that uses EHR data to track BMI and linked claims data to identify all emergency department visits and episodes of small bowel obstruction that occurred outside UNCHCS. These studies generally do not have patient consent; instead, investigators typically request a waiver of HIPAA authorization and a waiver of consent from the IRB.

While claims data provide a more complete picture of healthcare utilization and outcomes, they are not a substitute for EHR data. Claims data generally contain less clinical detail than EHR data. For example, claims data show which laboratory tests a patient received but do not necessarily include the results of those tests.

What kinds of claims data are available for linkage projects at UNC-Chapel Hill?

At present, linkages with Medicare fee-for-service claims (2015-2017), BCBSNC claims (2006-2018), and NC Medicaid claims are possible, with permission of the data owner.

Insurance type    Number of patients*
Medicaid          357,700
Medicare FFS      338,200
BCBSNC            553,500
*Approximate number of UNCHCS system patients with this type of insurance who had at least one visit between 2014 and 2019.

How much bureaucracy is involved in planning a project to link EHR and claims data?

UNC-Chapel Hill does not own any claims data. The original data owners (e.g., BCBSNC, the Centers for Medicare & Medicaid Services, etc.) remain the only entities with ownership rights. As such, they may terminate their data-sharing agreements with the University at any time, at which point all research using those data must cease.

Requests to use claims data must be approved by internal review committees as well as by the data owner. In general, data owners want to know (1) which projects their data are being used to support; (2) that appropriate IRB and HIPAA policies are followed; and (3) that the proposed research question can actually be answered with the data. Applications to use claims data, including for linkage, need a data use agreement and IRB approval or exemption before data are provisioned.

In addition to approval for the claims data, requests for clinical data must be approved through the Carolina Data Warehouse approval process.

How much does it cost to link EHR and claims data?

The cost of linking claims and EHR data is variable. In general, the claims data to which UNC-Chapel Hill has access already resides here and UNC does not charge any fees for the use of these data. However, the insurer or company that owns the data may have a fee. There may also be fees associated with accessing the secure servers on which these data reside. In addition, analysts who serve as 'honest brokers' to do the linkage and preserve patient privacy generally charge for their time. For additional information on cost, please contact the Comparative Effectiveness Research group at NC TraCS.

How long does it take to link EHR and claims data?

Just as you build costs into a budget, build ample time into your timeline to obtain permission to use these data and conduct the linkage. A typical timeline is at least three months between the initial request and final approval, with additional time needed to conduct the linkage itself.

Who performs the data linkage?

Analysts at NC TraCS and the Sheps Center for Health Services Research serve as 'honest brokers'. This means they conduct the linkage between EHR and claims data, strip the resulting dataset of all personal identifiers, and then place the resulting analytic dataset on a secure server for analysis by the study team members. Separating these two functions—data linkage and analysis—provides an additional level of patient confidentiality and security.
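To make the honest-broker separation concrete, here is a minimal Python sketch; it is not the procedure NC TraCS or Sheps Center analysts actually use. It assumes hypothetical field names, a hypothetical exact-match linkage key, and one EHR row per patient. The broker links the two sources, replaces identifiers with a random study ID, and keeps the crosswalk from identifiers to study IDs to itself; only the stripped rows would go to the study team. Which fields count as identifiers (for example, whether service dates may be retained) is set by the data use agreement and the IRB, not by this sketch.

```python
import secrets

# Hypothetical field names; the real identifier list is defined by the
# data use agreement and HIPAA, not by this sketch.
DIRECT_IDENTIFIERS = {"first_name", "last_name", "dob", "zip", "member_id"}
LINK_KEY = ("last_name", "dob", "member_id")  # hypothetical exact-match key


def link_and_deidentify(ehr_rows, claims_rows):
    """Honest-broker sketch: link on identifiers, then strip them.

    Assumes one EHR row per patient. Returns (analytic_rows, crosswalk):
    only analytic_rows is released to the study team, while the crosswalk
    from identifiers to study IDs stays with the honest broker.
    """
    # Index claims records by the linkage key.
    claims_by_key = {}
    for claim in claims_rows:
        key = tuple(claim[f] for f in LINK_KEY)
        claims_by_key.setdefault(key, []).append(claim)

    crosswalk = {}  # linkage key -> random study ID
    analytic_rows = []
    for ehr in ehr_rows:
        key = tuple(ehr[f] for f in LINK_KEY)
        if key not in claims_by_key:
            continue  # unlinked patient: excluded from the analytic file
        if key not in crosswalk:
            # Random ID, not derived from the identifiers themselves.
            crosswalk[key] = secrets.token_hex(8)
        for claim in claims_by_key[key]:
            row = {"study_id": crosswalk[key]}
            # Keep only non-identifying fields from each source.
            row.update({f"ehr_{k}": v for k, v in ehr.items() if k not in DIRECT_IDENTIFIERS})
            row.update({f"claim_{k}": v for k, v in claim.items() if k not in DIRECT_IDENTIFIERS})
            analytic_rows.append(row)
    return analytic_rows, crosswalk


# Usage with made-up records.
ehr = [{"first_name": "Ann", "last_name": "Lee", "dob": "1950-02-01",
        "zip": "27514", "member_id": "A1", "bmi": 31.2}]
claims = [{"last_name": "Lee", "dob": "1950-02-01", "member_id": "A1",
           "zip": "27516", "ed_visit": True}]
rows, crosswalk = link_and_deidentify(ehr, claims)
# rows[0] has only the keys study_id, ehr_bmi, and claim_ed_visit
```

Generating study IDs at random, rather than hashing identifiers, means the analytic file cannot be traced back to patients without the broker's crosswalk.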

Do I need patient consent to link EHR and claims data?

If the study cannot be conducted otherwise, you may be able to request a waiver of HIPAA authorization and consent from the UNC IRB, just as you would for other secondary data analyses. However, you will need to demonstrate why it would be impractical to obtain patient consent for answering your proposed research question.

Can I do multiple studies from one linkage project?

While you can conduct multiple analyses for a single project, data owners generally want to approve each study separately.

How good are these linkages and what percent of linkages is acceptable?

Analysts may link records on a variety of identifiers, such as first and/or last name, date of birth, gender, ZIP code, and insurance ID number, using both deterministic and probabilistic methods. Linkage rates may be in the 80-90% range. There are no hard and fast rules about what rate is 'acceptable'. Failure to link is generally not random, so unlinked patients may differ systematically from linked ones; in general, the higher the rate, the better. Discuss in detail with your assigned Carolina Data Warehouse (CDW) analyst exactly who is in the dataset of interest. For example, some patients in the UNCHCS EHR system have Blue Cross Blue Shield coverage from another state and therefore will not appear in the BCBSNC database.
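The difference between deterministic and probabilistic linkage, and how a linkage rate is computed, can be sketched in Python. This is an illustration under stated assumptions, not the method UNC analysts use: the field names, the agreement probabilities, and the match threshold are all hypothetical. The deterministic rule requires exact agreement on a fixed key; the probabilistic score follows the classic Fellegi-Sunter approach, summing log-likelihood-ratio weights so that strong agreement on some fields can outweigh a typo in another.

```python
import math

# Hypothetical m- and u-probabilities for each field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
# In practice these are estimated from the data, not fixed by hand.
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "dob": (0.98, 0.001),
    "gender": (0.99, 0.50),
    "zip": (0.85, 0.05),
}


def deterministic_match(a, b, fields=("last_name", "dob", "gender")):
    """Deterministic rule: exact agreement on every field in a fixed key."""
    return all(a.get(f) is not None and a.get(f) == b.get(f) for f in fields)


def match_weight(a, b):
    """Fellegi-Sunter weight: sum of log2 likelihood ratios across fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # a missing value adds no evidence either way
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def linkage_rate(ehr_rows, claims_rows, threshold=10.0):
    """Fraction of EHR patients linked to at least one claims record."""
    if not ehr_rows:
        return 0.0
    linked = sum(
        1
        for e in ehr_rows
        if any(deterministic_match(e, c) or match_weight(e, c) >= threshold
               for c in claims_rows)
    )
    return linked / len(ehr_rows)


# Made-up records: a misspelled surname defeats the deterministic rule
# but not the weighted score.
ehr = [
    {"last_name": "Lee", "first_name": "Ann", "dob": "1950-02-01", "gender": "F", "zip": "27514"},
    {"last_name": "Smith", "first_name": "Bob", "dob": "1960-05-05", "gender": "M", "zip": "27601"},
]
claims = [
    {"last_name": "Li", "first_name": "Ann", "dob": "1950-02-01", "gender": "F", "zip": "27514"},
]
print(linkage_rate(ehr, claims))  # 0.5
```

Because the threshold trades false links against missed links, and because missed links are not random, it is worth comparing linked and unlinked patients (for example, by age or insurance type) before analysis.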

How recent is the claims data available at UNC-Chapel Hill?

The claims data we access have been adjudicated by the insurer and are generally 6-12 months old. CMS (Medicare) data may be up to 2 years old. Specialized Medicare data extracts are possible, but they require special data pulls from CMS, are expensive, and may be of lower quality because they have not been curated. Consider these issues when planning your study timeline. Insurers are working to make more current data available, and we expect the lag time to decrease in the future.

What if I want to link one set of EHR data (say, UNC's) with another set of EHR data such as Wake Forest?

UNC-Chapel Hill has working relationships with other regional and national health systems. These include activities in the Carolinas Collaborative and PCORnet. You can find more information on the Clinical Data Research Networks that UNC participates in at tracs.unc.edu/cdrn.

Why aren't these links permanent?

The data owners (Medicare, BCBSNC, Medicaid, etc.) are not yet ready to support permanent linkages, so linkages are currently performed study by study. As we gain experience, we are working to reduce the time and personnel costs involved and to demonstrate the value of these linkages to the various stakeholders.

Are claims data better than EHR data?

Claims data are not better or worse than EHR data, just different. Their advantage is that they offer a comprehensive view of the care a patient received. When weighing the pros and cons of different data sources, keep in mind the original purpose of the data: EHR data are generated primarily to support patient care, while insurance claims are administrative data that support billing and payment. You may find differences in coding and in the timing of care that reflect these differing purposes. Collaborating with experienced analysts and investigator colleagues can help you sort through these issues.

What are my responsibilities as a researcher using these data?

While exact requirements vary depending on the claims data source, researchers using claims data are generally required to include specific acknowledgment text in all journal articles and abstracts, submit publications to the data owner, and follow all other terms and conditions outlined in their data use agreements.

Where do the data reside?

Each data owner has different requirements regarding where data may reside. In all cases, however, data cannot be downloaded to individual machines and must remain on approved University servers.

Who should I contact with additional questions?

For additional questions about data linkage, please contact the Comparative Effectiveness Research Program at NC TraCS.

Contact Us


Brinkhous-Bullitt, 2nd floor
160 N. Medical Drive
Chapel Hill, NC 27599

919.966.6022

Cite Us


Cite and submit: CTSA grant number UM1TR004406

© 2008-2024 The North Carolina Translational and Clinical Sciences (NC TraCS) Institute at The University of North Carolina at Chapel Hill
The content of this website is solely the responsibility of the University of North Carolina at Chapel Hill and does not necessarily represent the official views of the NIH.