A Knowledge-enhanced Two-stage Generative Framework for Medical Dialogue Information Extraction

Zefa Hu, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu

TL;DR

This paper presents a knowledge-enhanced two-stage generative framework (KTGF) for term-status pair extraction from medical dialogues. It also proposes a special status, "not mentioned", which makes more terms available and enriches the training data in the second phase, a property that is critical in the low-resource setting.

Abstract

This paper focuses on term-status pair extraction from medical dialogues (MD-TSPE), which is essential in diagnosis dialogue systems and the automatic scribing of electronic medical records (EMRs). In the past few years, work on MD-TSPE has attracted increasing research attention, especially after the remarkable progress made by generative methods. However, these generative methods output a whole sequence of term-status pairs in one stage and do not integrate prior knowledge, so they demand a deeper understanding to model the relationship between terms and infer the status of each term. This paper presents a knowledge-enhanced two-stage generative framework (KTGF) to address these challenges. Using task-specific prompts, we employ a single model to complete MD-TSPE in two phases in a unified generative form: we first generate all terms and then generate the status of each generated term. In this way, the relationship between terms can be learned more effectively from the sequence containing only terms in the first phase, and our designed knowledge-enhanced prompt in the second phase can leverage the category and status candidates of the generated term for status generation. Furthermore, our proposed special status "not mentioned" makes more terms available and enriches the training data in the second phase, which is critical in the low-resource setting. Experiments on the Chunyu and CMDD datasets show that the proposed method achieves superior results compared to state-of-the-art models in both the full-training and low-resource settings.
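
The two-phase flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the `KNOWLEDGE` table, the `;` separator between terms, and the rule-based `toy_generate` stand-in for a fine-tuned sequence-to-sequence model are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prior knowledge: term -> (category, status candidates).
# These entries are illustrative and are not taken from Chunyu or CMDD.
KNOWLEDGE: Dict[str, Tuple[str, List[str]]] = {
    "cough": ("symptom", ["positive", "negative", "unknown"]),
    "gastroscopy": ("test", ["done", "not done", "unknown"]),
}


def term_prompt(dialogue: str) -> str:
    """Phase 1 input: a subtask prompt concatenated with the dialogue."""
    return f"generate terms: {dialogue}"


def status_prompt(dialogue: str, term: str) -> str:
    """Phase 2 input: the prompt is enriched with the term's category and
    status candidates looked up in the (assumed) knowledge table."""
    category, candidates = KNOWLEDGE.get(term, ("unknown", []))
    return (
        f"generate status: term: {term}; category: {category}; "
        f"candidates: {', '.join(candidates)}; dialogue: {dialogue}"
    )


def extract_pairs(
    dialogue: str, generate: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Run both phases with the same generator: terms first, then one
    status query per generated term."""
    terms_text = generate(term_prompt(dialogue))
    terms = [t.strip() for t in terms_text.split(";") if t.strip()]
    return [(t, generate(status_prompt(dialogue, t)).strip()) for t in terms]


def toy_generate(prompt: str) -> str:
    """Rule-based stand-in for a trained model, used only for the demo."""
    if prompt.startswith("generate terms:"):
        return "cough; gastroscopy"
    if "term: cough;" in prompt:
        return "positive"
    return "unknown"


pairs = extract_pairs("Patient: I have been coughing for a week.", toy_generate)
```

In a real system, `toy_generate` would be replaced by a call to a trained model; the point of the sketch is only that a single generator serves both phases and that the second-phase prompt depends on the first-phase output.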

Paper Structure

This paper contains 24 sections, 11 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Overview of KTGF. $t^{i}_{j}$ and $s^{i}_{k}$ indicate the $j$-th term and the $k$-th status in the $i$-th category, respectively. KTGF takes T5 as the backbone and generates terms and their status in two stages. In each stage, KTGF concatenates medical dialogue with a subtask prompt as input. In the second stage, KTGF uses the terms generated in the first stage to retrieve prior task knowledge to enhance the prompt, which enables the model to generate status effectively.
  • Figure 2: The design of our prompt. In the term generation stage, the prompt is employed for a better understanding of the term generation subtask. In the status generation stage, we use the generated term to obtain the category and status candidates, which are utilized to enhance the prompt. Moreover, we add a special status, "not mentioned", for the low-resource setting. Therefore, a term not mentioned in the dialogue can also have a corresponding status, which augments the status-related training data. The medical dialogue is from Table \ref{example}.
  • Figure 3: The evaluation of different categories on F1-score.
  • Figure 4: The evaluation for different numbers of mentioned terms on F1-score.
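
The "not mentioned" augmentation described in the Figure 2 caption can be illustrated with a small sketch of how second-stage training examples might be assembled. The sampling strategy, the number of sampled terms, the simplified prompt format, and the function name `build_status_examples` are assumptions for illustration; the captions above do not specify them.

```python
import random
from typing import Dict, List, Tuple

NOT_MENTIONED = "not mentioned"


def build_status_examples(
    dialogue: str,
    gold: Dict[str, str],
    vocabulary: List[str],
    num_negatives: int = 2,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (input, target) pairs for the status generation stage.

    Terms annotated in the dialogue keep their gold status. A few terms
    from the vocabulary that do not appear in the annotations are added
    with the special target "not mentioned" (assumed uniform sampling).
    """
    rng = random.Random(seed)
    examples = [
        (f"term: {term}; dialogue: {dialogue}", status)
        for term, status in gold.items()
    ]
    unmentioned = sorted(t for t in vocabulary if t not in gold)
    sampled = rng.sample(unmentioned, min(num_negatives, len(unmentioned)))
    for term in sampled:
        examples.append((f"term: {term}; dialogue: {dialogue}", NOT_MENTIONED))
    return examples


examples = build_status_examples(
    dialogue="Patient: I have been coughing for a week.",
    gold={"cough": "positive"},
    vocabulary=["cough", "fever", "headache", "nausea"],
)
```

With one annotated term and two sampled unmentioned terms, the dialogue yields three status examples instead of one, which is the sense in which the special status enlarges the second-stage training data.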