DAEDRA: A language model for predicting outcomes in passive pharmacovigilance reporting

Chris von Csefalvay

TL;DR

DAEDRA addresses the challenge of extracting regulatory-relevant outcomes from noisy passive pharmacovigilance narratives by training a domain-specific, 126M-parameter language model on a large VAERS corpus using an adaptive training approach. By retraining the tokeniser on the domain data and selecting an efficient base model (BioBERT) through a targeted comparison, the work achieves a modest but meaningful improvement over non-domain baselines (best test $F_1$ = $0.88$, with $0.74$ precision and $0.67$ recall), while maintaining a short training time (~$21.82$ hours) and manageable emissions (~$6.55$ kg $CO_2$eq). The results show strong performance for predicting the absence of events ($F_1$ up to $0.93$) and for individual outcomes (mortality $F_1$ ≈ $0.76$, hospitalisation $F_1$ ≈ $0.66$), but more limited accuracy for combinations of outcomes, underscoring the impact of data imbalance and the context-dependence of subdomain models. Overall, the work demonstrates that small, domain-focused LLMs can provide practical gains in processing highly specialised pharmacovigilance data, while highlighting limitations and avenues for broader validation across products, languages, and regulatory contexts.

Abstract

In recent years, the emergence of large language models (LLMs) has given rise to a proliferation of domain-specific models intended to reflect the linguistic context and content particular to their domain of origin. This paper details the conception, design, training and evaluation of DAEDRA, an LLM designed to detect regulatory-relevant outcomes (mortality, ER attendance and hospitalisation) in adverse event reports elicited through passive reporting (PR). While PR is a highly cost-efficient way of eliciting information from a wide and diverse audience (typically including not only physicians and healthcare providers but also patients, family members and other lay stakeholders), this diversity makes PR corpora difficult to analyse. Generic language models may not capture the complex clinical dimensions, while specific clinical or biomedical models may not perform well on lay reports. To evaluate the utility of a subdomain-specific language model, an adaptive training approach was adopted, wherein base language model candidates were evaluated on a subset of the corpus, and the best performer was trained on the entire corpus. This yielded a small but significant improvement in $F_1$ (+1%), precision (+2.5%) and recall (+3.8%), at a relatively low training cost and a single-day training time. Subdomain-specific LLMs continue to be viable options for better results when analysing highly specialised corpora.
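
The selection step of the adaptive training approach (score each base model candidate on a subset of the corpus, then keep the best performer for full training) can be illustrated with a minimal sketch. Everything named here is hypothetical: the two candidates are toy rule-based stand-ins rather than real language models, the three reports are invented, and micro-averaged $F_1$ over the three outcomes is an assumption, since this section does not state which averaging the paper uses.

```python
from typing import Callable, Dict, Sequence, Tuple

# One report's outcomes: (mortality, ER attendance, hospitalisation).
Labels = Tuple[int, int, int]
Predictor = Callable[[str], Labels]


def micro_f1(y_true: Sequence[Labels], y_pred: Sequence[Labels]) -> float:
    """Micro-averaged F1 pooled over all (report, outcome) pairs (assumed metric)."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        for t, p in zip(truth, pred):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_base_model(
    candidates: Dict[str, Predictor],
    texts: Sequence[str],
    labels: Sequence[Labels],
) -> str:
    """Score every candidate on the subset and return the name of the best one."""
    scores = {
        name: micro_f1(labels, [predict(text) for text in texts])
        for name, predict in candidates.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy subset standing in for a sample of the report corpus.
subset_texts = [
    "patient died two days later",
    "seen in ER and admitted to hospital",
    "mild sore arm",
]
subset_labels = [(1, 0, 0), (0, 1, 1), (0, 0, 0)]


def always_negative(text: str) -> Labels:
    """Toy candidate that never predicts an outcome."""
    return (0, 0, 0)


def keyword_rules(text: str) -> Labels:
    """Toy candidate that flags outcomes by keyword matching."""
    padded = f" {text.lower()} "
    return (int("died" in padded), int(" er " in padded), int("hospital" in padded))


candidates = {"always_negative": always_negative, "keyword_rules": keyword_rules}
best = select_base_model(candidates, subset_texts, subset_labels)
```

In a real pipeline each candidate would be a pretrained model briefly trained on the subset, and only the model named by `best` would go on to training on the entire corpus.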

Paper Structure (9 sections, 1 equation, 2 figures, 2 tables)

Introduction
Methods and materials
Source data
Base model selection
Tokeniser training
Model training
Evaluation
Results
Discussion

Figures (2)

Figure 1: Confusion matrices for the three events under consideration.
Figure 2: Set combinations of predicted versus actual events. Correct predictions are displayed in green. Predictions that are partially correct, i.e. with respect to at least one event, are displayed in blue.
