Data science tools will speed rare disease solutions

27 March 2023

by Joni L. Rutter, PhD, Director of the National Center for Advancing Translational Sciences (NCATS)
Originally posted at nlmdirector.nlm.nih.gov


When it comes to rare diseases, the data are stark.

More than 10,000 rare diseases affect up to 400 million people worldwide, including over 30 million people in the United States. Those with rare diseases struggle for about six years on average before they receive an accurate diagnosis. Unfortunately, a diagnosis usually does not deliver a therapeutic answer: Only about 5% of rare diseases have treatments that are approved by the U.S. Food and Drug Administration. When you add up all the related direct and indirect costs, rare diseases carry a U.S. economic burden of nearly $1 trillion every year.

But these numbers also offer hope. Data-driven innovations are unlocking answers about rare diseases—as well as more common diseases—faster than ever before, and that's why data science is so important to NCATS' vision of more treatments for all people more quickly.

One of our key strategies is to leverage or connect existing data in new and meaningful ways. This year's Rare Disease Day at NIH event highlighted several ways NCATS is applying this approach to help address the public health challenge of rare diseases.

Here's a snapshot of key opportunities and the data-driven solutions we're developing.

Raising Awareness and Educating

The Genetic and Rare Diseases (GARD) Information Center uses data science to speed the translation of research findings into reliable, accurate, and understandable information that patients can use to learn about rare diseases and find other helpful resources. Each year, millions of people tap GARD's online resources, and GARD Information Specialists answer thousands of individual questions on how to learn about a rare disease, find out more about specialists or clinical studies, and seek diagnostic help for themselves or their loved ones.

We're in the process of modernizing the GARD website so it can pull information automatically from a range of trusted data sources, including NLM's MedlinePlus, Unified Medical Language System, and NCBI MedGen. We’re also applying user experience and health literacy best practices to GARD's website to better address patients' and caregivers' changing information needs.

Please check out our other user-friendly and helpful rare diseases community resources.

Shortening the Diagnostic Odyssey

We are currently exploring the use of real-world data from electronic health records and other clinical data sources to shorten the journey to a correct rare disease diagnosis. This work is part of a broader research effort that focuses on using genetic analysis, machine learning, and clinical evaluation to make it easier for front-line health care providers to diagnose people with rare diseases correctly. Large enclaves of integrated patient data, such as the NCATS-led National COVID Cohort Collaborative (N3C) model, have enormous promise to speed the identification of signs or signals of specific rare diseases.

Our study on rare disease medical costs also showed the potential of using machine-learning strategies to speed rare disease diagnoses using health care system and insurance claims data. But we also need to make sure these artificial intelligence/machine learning (AI/ML) strategies are free from bias. To that end, we launched a new challenge to jump-start the development of AI/ML tools that detect and correct bias in health care algorithms, with the goal of improving clinician and patient trust in AI/ML-enabled clinical decision-making support tools.

Discovering and Developing New Drugs

Many promising therapeutic candidates already exist; the challenge is finding them among vast and disparate data sets. To bridge data silos, NCATS has invested heavily in organizing, aggregating, and harmonizing high-quality data, and we make those data available openly and responsibly. We're applying AI/ML to the task of therapeutic discovery and development through efforts like the Biomedical Data Translator and A Specialized Platform for Innovative Research Exploration (ASPIRE). I'm also excited to see how the winners of NCATS' LitCoin Natural Language Processing (NLP) Challenge prize will use natural language processing to transform information across biomedical literature into new concepts and hypotheses to be tested.

What else can innovative data science tools deliver for people with rare diseases and others with unmet health needs? Leave a comment at nlmdirector.nlm.nih.gov with your thoughts!

