Table of Contents
Fetching ...

PLACID: Privacy-preserving Large language models for Acronym Clinical Inference and Disambiguation

Manjushree B. Aithal, Ph. D., Alexander Kotz, James Mitchell, Ph. D

Abstract

Large Language Models (LLMs) offer transformative solutions across many domains, but healthcare integration is hindered by strict data privacy constraints. Clinical narratives are dense with ambiguous acronyms, misinterpretation these abbreviations can precipitate severe outcomes like life-threatening medication errors. While cloud-dependent LLMs excel at Acronym Disambiguation, transmitting Protected Health Information to external servers violates privacy frameworks. To bridge this gap, this study pioneers the evaluation of small-parameter models deployed entirely on-device to ensure privacy preservation. We introduce a privacy-preserving cascaded pipeline leveraging general-purpose local models to detect clinical acronyms, routing them to domain-specific biomedical models for context-relevant expansions. Results reveal that while general instruction-following models achieve high detection accuracy (~0.988), their expansion capabilities plummet (~0.655). Our cascaded approach utilizes domain-specific medical models to increase expansion accuracy to (~0.81). This novel work demonstrates that privacy-preserving, on-device (2B-10B) models deliver high-fidelity clinical acronym disambiguation support.

PLACID: Privacy-preserving Large language models for Acronym Clinical Inference and Disambiguation

Abstract

Large Language Models (LLMs) offer transformative solutions across many domains, but healthcare integration is hindered by strict data privacy constraints. Clinical narratives are dense with ambiguous acronyms, misinterpretation these abbreviations can precipitate severe outcomes like life-threatening medication errors. While cloud-dependent LLMs excel at Acronym Disambiguation, transmitting Protected Health Information to external servers violates privacy frameworks. To bridge this gap, this study pioneers the evaluation of small-parameter models deployed entirely on-device to ensure privacy preservation. We introduce a privacy-preserving cascaded pipeline leveraging general-purpose local models to detect clinical acronyms, routing them to domain-specific biomedical models for context-relevant expansions. Results reveal that while general instruction-following models achieve high detection accuracy (~0.988), their expansion capabilities plummet (~0.655). Our cascaded approach utilizes domain-specific medical models to increase expansion accuracy to (~0.81). This novel work demonstrates that privacy-preserving, on-device (2B-10B) models deliver high-fidelity clinical acronym disambiguation support.
Paper Structure (4 sections, 1 equation, 2 figures, 3 tables)

This paper contains 4 sections, 1 equation, 2 figures, 3 tables.

Figures (2)

  • Figure 1: PLACID multi-stage process pipelines for clinical acronym disambiguation using local small models. As illustrated, inference can be deployed in either a single-pass (simultaneous acronym detection and expansion via a single model) or cascaded (general model for detection followed by a medically fine-tuned model for expansion) configuration, depending on local computational constraints.
  • Figure 2: Average expansion accuracy comparison of all models in cascaded-mode over 5 iterations.