Dutch Metaphor Extraction from Cancer Patients' Interviews and Forum Data using LLMs and Human in the Loop

Lifeng Han, David Lindevelt, Sander Puts, Erik van Mulligen, Suzan Verberne

TL;DR

This work investigates how Dutch-speaking cancer patients express metaphorical language in both interviews and online forums, addressing a gap in applying large-language-model prompting to Dutch metaphor extraction in healthcare. It introduces HealthQuote.NL, a human-in-the-loop corpus built via iterative prompting strategies (including chain-of-thought and iterative self-prompting) applied to locally hosted LLMs to reduce hallucinations and preserve context. Across two data sources, the authors extract 130 metaphors in total (65 per source), map a subset to the English Metaphor Menu, and demonstrate practical benefits for patient care, shared decision making, and health literacy, with prompts and resources released publicly. The study highlights trade-offs between prompt sophistication and linguistic diversity, and outlines future work on expanding data, examining multiword expressions, and improving interpretability for clinical deployment.

Abstract

Metaphors and metaphorical language (MLs) play an important role in healthcare communication between clinicians, patients, and patients' family members. In this work, we focus on Dutch-language data from cancer patients. We extract metaphors used by patients from two data sources: (1) cancer patient storytelling interview data and (2) online forum data, including patients' posts, comments, and questions to professionals. We investigate how current state-of-the-art large language models (LLMs) perform on this task by exploring different prompting strategies such as chain-of-thought reasoning, few-shot learning, and self-prompting. With a human-in-the-loop setup, we verify the extracted metaphors and compile the outputs into a corpus named HealthQuote.NL. We believe the extracted metaphors can support better patient care, for example through shared decision making, improved communication between patients and clinicians, and enhanced patient health literacy. They can also inform the design of personalized care pathways. We share prompts and related resources at https://github.com/aaronlifenghan/HealthQuote.NL

Paper Structure

This paper contains 26 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: HealthQuote.NL: Extraction Framework using LLMs and Human in the Loop.
  • Figure 2: Interview Data Statistics.
  • Figure 3: Metaphor identification using GPT5 on forum data (paraphrased) with model confidence scores, part 1 (the 'map' column refers to the value in the original English Metaphor Menu).
  • Figure 4: Metaphor identification using GPT5 on forum data (paraphrased) with model confidence scores, part 2 (the 'map' column refers to the value in the original English Metaphor Menu).