
Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting

Xinlu Zhang, Shiyang Li, Xianjun Yang, Chenxin Tian, Yao Qin, Linda Ruth Petzold

TL;DR

This work tackles privacy barriers to deploying large language models (LLMs) in medicine with a privacy-preserving prompting pipeline: medical keywords extracted from patient data are used to elicit knowledge-rich contexts from an LLM. The generated contexts are supplied as additional input to small, domain-specific language models (SLMs) to improve medical decision-making, yielding up to a $22.57\%$ absolute accuracy gain over SLM fine-tuning without context and state-of-the-art results on two medical tasks in privacy-restricted scenarios. The approach also generalizes to out-of-domain tests and to two general-domain datasets, offering a practical way to leverage LLMs while mitigating data privacy concerns. The method is simple and broadly applicable, and the code is released for replication and extension.

Abstract

Large language models (LLMs) demonstrate remarkable medical expertise, but data privacy concerns impede their direct use in healthcare environments. Although offering improved data privacy protection, domain-specific small language models (SLMs) often underperform LLMs, emphasizing the need for methods that reduce this performance gap while alleviating privacy concerns. In this paper, we present a simple yet effective method that harnesses LLMs' medical proficiency to boost SLM performance in medical tasks under privacy-restricted scenarios. Specifically, we mitigate patient privacy issues by extracting keywords from medical data and prompting the LLM to generate a medical knowledge-intensive context by simulating clinicians' thought processes. This context serves as additional input for SLMs, augmenting their decision-making capabilities. Our method significantly enhances performance in both few-shot and full training settings across three medical knowledge-intensive tasks, achieving up to a 22.57% increase in absolute accuracy compared to SLM fine-tuning without context, and sets new state-of-the-art results in two medical tasks within privacy-restricted scenarios. Further out-of-domain testing and experiments in two general domain datasets showcase its generalizability and broad applicability. Our code can be found at https://github.com/XZhang97666/PrivacyBoost-SLM.
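
The keyword-then-prompt step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary `MEDICAL_TERMS`, the functions `extract_keywords` and `build_prompt`, and the prompt layout are all hypothetical, and the paper's actual keyword extractor and prompt template may differ. The point is only that the text sent to the LLM is built from keywords and candidate answers, never from the raw record.

```python
import re

# Hypothetical toy vocabulary; a real system would use a proper medical
# keyword extractor rather than a hand-written set.
MEDICAL_TERMS = {"fever", "cough", "chest pain", "dyspnea", "hypertension"}


def extract_keywords(text, vocabulary=MEDICAL_TERMS):
    """Return the vocabulary terms found in `text`, ordered by first appearance."""
    lowered = text.lower()
    hits = []
    for term in vocabulary:
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            hits.append((match.start(), term))
    return [term for _, term in sorted(hits)]


def build_prompt(keywords, candidate_answers, demonstrations=()):
    """Assemble an LLM prompt from keywords and candidate answers only.

    `demonstrations` stands in for clinician-written few-shot examples;
    the layout below is an assumption made for illustration.
    """
    parts = list(demonstrations)
    parts.append("Keywords: " + ", ".join(keywords))
    parts.append("Candidate answers: " + "; ".join(candidate_answers))
    parts.append("Context:")
    return "\n".join(parts)
```

For example, calling `extract_keywords` on a synthetic note such as "John Smith, 54, presents with fever and chest pain." keeps only the two symptom terms, so the patient's name and age never appear in the prompt produced by `build_prompt`.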

Paper Structure (23 sections, 1 equation, 9 figures, 12 tables)

Figures (9)

  • Figure 1: Synthetic medical data for illustration. Though rich in domain-specific knowledge, medical data contains sensitive private information. We extract keywords to mitigate privacy concerns.
  • Figure 2: Framework overview. (a) To mitigate privacy leakage, we use a keyword extractor to obtain medical keywords. Clinicians then write several example contexts based on these keywords and candidate answers, which the LLM uses as few-shot demonstrations to produce privacy-restricted contexts. (b) The generated contexts are used as additional input to enhance SLM medical decision-making capacity.
  • Figure 3: LLM generates privacy-restricted medical contexts to enhance SLM decision-making. (a) The LLM generates medical knowledge-intensive context for each instance using clinicians' few-shot demonstrations, extracted keywords from raw data ($k$), and candidate answers ($A$). The generation output comprises the overall context ($c_o$), the specific context of each candidate answer ($c_{a_{j}}$), and the preliminary decision of the LLM ($d$). (b) The overall and specific contexts are then concatenated ($\oplus$) with the question as additional input to fine-tune an SLM, enhancing its medical decision-making.
  • Figure 4: Results of ablation studies. The upper part examines the effect of context components on SLM training, while the lower part investigates the impact of relationships within the context.
  • Figure 5: Case Study: MedQA test set contexts and predictions. Yellow highlights important local information; underlined text indicates LLM-selected keywords for context generation; green and red signify correct and incorrect contexts that could aid or confuse the SLM, respectively. FTC succeeded on 113 instances where SFT and LLM failed: 45 Targeting (left) and 68 Denoising (right).
  • ...and 4 more figures
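
The input construction described in the Figure 3(b) caption, where the question is concatenated ($\oplus$) with the overall context $c_o$ and the answer-specific contexts $c_{a_j}$, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `build_slm_input`, the ordering of the pieces, and the plain-string separator are hypothetical, and a real pipeline would more likely rely on the SLM tokenizer's own separator token.

```python
def build_slm_input(question, overall_context, answer_contexts, sep=" "):
    """Concatenate the question with c_o and each c_{a_j} into one input string.

    The order (question, overall context, per-answer contexts) and the
    separator are assumptions made for illustration.
    """
    if not question or not question.strip():
        raise ValueError("question must be a non-empty string")
    pieces = [question, overall_context, *answer_contexts]
    # Drop empty pieces so a missing context does not leave a dangling separator.
    return sep.join(p.strip() for p in pieces if p and p.strip())
```

Passing `sep=" [SEP] "` gives a string such as `question [SEP] c_o [SEP] c_a1 [SEP] c_a2`, which can then be tokenized and used to fine-tune the SLM in the usual way.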