Identifying Health Risks from Family History: A Survey of Natural Language Processing Techniques

Xiang Dai, Sarvnaz Karimi, Nathan O'Callaghan

TL;DR

This survey addresses the problem of identifying health risks from family history encoded in electronic health records using NLP. It surveys rule-based, statistical, and deep learning methods, highlighting a shift toward large pre-trained language models and domain adaptation while noting data and workflow integration challenges. Key contributions include mapping NLP tasks, resources, and datasets (e.g., Mayo n2c2/OHNLP) and proposing a unified framework and data-collection considerations. The findings underscore the potential to enhance precision health and genetic counseling, while calling for data sharing, transfer learning, and clinician-facing deployment improvements.

Abstract

Electronic health records include information on patients' status and medical history, which can cover diseases and disorders that may be hereditary. One important use of family history information is in precision health, where the goal is to keep the population healthy with preventative measures. Natural Language Processing (NLP) and machine learning techniques can help health professionals identify health risks before a condition develops later in a patient's life, saving lives and reducing healthcare costs. We survey the literature on the techniques from the NLP field that have been developed to utilise digital health records to identify risks of familial diseases. We highlight that rule-based methods have been heavily investigated and are still actively used for family history extraction, while more recent efforts have gone into building neural models based on large-scale pre-trained language models. In addition to the areas where NLP has been utilised successfully, we identify areas where more research is needed to unlock the value of patients' records, covering data collection, task formulation and downstream applications.

Paper Structure

This paper contains 23 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: The main tasks, represented in rectangles, in the family history extraction pipeline.
  • Figure 2: An example of document-to-graph conversion for family history extraction. A family history graph is built from the text shown in the upper part. For brevity, only clinical conditions relating to the father are shown in the figure.