Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting

Phillip Richter-Pechanski; Philipp Wiesenbach; Dominic M. Schwab; Christina Kiriakou; Nicolas Geis; Christoph Dieterich; Anette Frank


TL;DR

This study tackles the challenge of extracting clinical information from German doctor’s letters in low-resource settings by evaluating few-shot learning via Pattern-Exploiting Training (PET) with domain- and task-adapted pretrained language models. It systematically compares PET against supervised baselines across multiple pretraining schemes and six shot sizes, using SHAP for token-level interpretability. The results show that a domain- and task-adapted gbert-base-comb PET model with contextual enrichment substantially improves accuracy: at 20 shots it outperforms a traditional supervised classification model by up to 30.5 percentage points, while remaining interpretable and suitable for on-premise deployment. The paper provides practical recommendations for deploying clinical information extraction in low-resource languages, emphasizing pretraining data quality, context, smaller models, and SHAP-based explanations to support trustworthy decisions.

Abstract

Automatic extraction of medical information from clinical documents poses several challenges: high costs of required clinical expertise, limited interpretability of model predictions, restricted computational resources, and privacy regulations. Recent advances in domain adaptation and prompting methods have shown promising results with minimal training data using lightweight masked language models, which are suited for well-established interpretability methods. We are the first to present a systematic evaluation of these methods in a low-resource setting, by performing multi-class section classification on German doctor's letters. We conduct extensive class-wise evaluations supported by Shapley values to validate the quality of our small training data set and to ensure the interpretability of model predictions. We demonstrate that a lightweight, domain-adapted pretrained model, prompted with just 20 shots, outperforms a traditional classification model by 30.5 percentage points in accuracy. Our results serve as a process-oriented guideline for clinical information extraction projects working in low-resource settings.


Paper Structure (34 sections, 5 equations, 27 figures, 8 tables)


Introduction
State of research
Methods
Pattern-Exploiting Training (S1 and 2)
Creating templates
Verbalizer
Pretrained language models (S1 and 3)
Shapley values (S4)
Data
Annotated corpus
Pretraining data
Experimental setup
Metrics
Creating Few-Shot Data
Core Experiments
...and 19 more sections

Figures (27)

Figure 1: Challenges for MIE projects in clinics: Our proposed solutions to the main challenges for MIE projects in a clinical setting.
Figure 2: PET workflow: Three main steps: (1) Apply pattern function P(x) to all few-shot training instances X. Fine-tune a PLM M using a language model objective on each pattern. The output of the PLM is mapped using a verbalizer function v(y). (2) An ensemble of M trained on each pattern is used to annotate an unlabeled dataset D with soft labels. (3) A classifier C with a classification head is trained on D.
Figure 3: Pretrained language models: We use two publicly available PLMs: gbert and medbertde. We evaluate base and large gbert models. Four pretraining methods are used: (1) publicly available, (2) task-adapted, (3) domain-adapted and (4) task- and domain-adapted combined.
Figure 4: Section classification baseline results (lower/upper bound): We show accuracy scores per pretraining method (public, task-adapted, domain-adapted, and a combination of both) per model: gbert-base and medbertde-base. (a) Lower bound: zero-shot prompting. (b) Upper bound: full training set.
Figure 5: Accuracy scores for core experiments and lower/upper bound: comparing prompting using PET vs. SC across few-shot sizes from 10 to 400 and pretraining methods using base BERT models. For reference, lower-bound PET baselines use zero shots (ZERO) and upper-bound SC models are trained on the complete training set (FULL).
...and 22 more figures
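
The pattern function P(x), the verbalizer v(y), and the soft-label ensemble named in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: the section labels, the German template text, the verbalizer tokens, and the uniform averaging across patterns are all assumptions made for illustration, and the masked language model is replaced by a plain dictionary of scores.

```python
from typing import Dict, List

MASK = "[MASK]"

# Hypothetical labels and verbalizer tokens (assumption: the paper's
# actual label set and verbalizer words are not listed in this overview).
VERBALIZER: Dict[str, str] = {
    "Anamnese": "Anamnese",
    "Diagnosen": "Diagnose",
    "Medikation": "Medikation",
}


def pattern(x: str) -> str:
    """P(x): wrap an input paragraph in a cloze-style template (illustrative)."""
    return f"{x} Abschnitt: {MASK}"


def verbalize(label: str) -> str:
    """v(y): map a class label to the token expected at the mask position."""
    return VERBALIZER[label]


def predict(mask_scores: Dict[str, float]) -> str:
    """Return the label whose verbalizer token has the highest mask score.

    `mask_scores` stands in for a masked language model's token scores.
    """
    return max(
        VERBALIZER,
        key=lambda label: mask_scores.get(verbalize(label), float("-inf")),
    )


def ensemble_soft_labels(per_pattern: List[Dict[str, float]]) -> Dict[str, float]:
    """Average label probabilities over patterns (uniform weights, a simplification)."""
    n = len(per_pattern)
    return {label: sum(p[label] for p in per_pattern) / n for label in VERBALIZER}
```

In step (3) of the workflow, soft labels like those returned by `ensemble_soft_labels` would serve as training targets for the final classifier C on the unlabeled dataset D.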
