Identifying Health Risks from Family History: A Survey of Natural Language Processing Techniques

Xiang Dai; Sarvnaz Karimi; Nathan O'Callaghan

TL;DR

This survey addresses the problem of identifying health risks from family history encoded in electronic health records using NLP. It surveys rule-based, statistical, and deep learning methods, highlighting a shift toward large pre-trained language models and domain adaptation while noting data and workflow integration challenges. Key contributions include mapping NLP tasks, resources, and datasets (e.g., Mayo n2c2/OHNLP) and proposing a unified framework and data-collection considerations. The findings underscore the potential to enhance precision health and genetic counseling, while calling for data sharing, transfer learning, and clinician-facing deployment improvements.

Abstract

Electronic health records include information on patients' status and medical history, which can cover diseases and disorders that may be hereditary. One important use of family history information is in precision health, where the goal is to keep the population healthy with preventative measures. Natural Language Processing (NLP) and machine learning techniques can help extract information that lets health professionals recognise health risks before a condition develops later in a patient's life, saving lives and reducing healthcare costs. We survey the literature on NLP techniques that have been developed to utilise digital health records to identify risks of familial diseases. We highlight that rule-based methods have been heavily investigated and are still actively used for family history extraction. However, more recent efforts have gone into building neural models based on large-scale pre-trained language models. In addition to the areas where NLP has been utilised successfully, we identify areas where more research is needed to unlock the value of patients' records, specifically data collection, task formulation and downstream applications.

Paper Structure (23 sections, 2 figures, 1 table)

Introduction
What is family history, and why is it useful?
A motivating example
Why we need NLP for extracting family history
Study selection
Tasks and Resources
Tasks
Family history statement detection
Family member detection
Clinical observation identification
Relation identification
Other tasks using NLP
Resources
Methods
Rule-based approach
...and 8 more sections

Figures (2)

Figure 1: The main tasks, represented in rectangles, in the family history extraction pipeline.
Figure 2: An example of document-to-graph for family history extraction. A family history graph is built from the text shown in the upper part. For the sake of brevity, only clinical conditions relating to the father are shown in the graph.
