Learning to Diagnose Privately: DP-Powered LLMs for Radiology Report Classification

Payel Bhattacharjee, Fengwei Tian, Geoffrey D. Rubin, Joseph Y. Lo, Nirav Merchant, Heidi Hanson, John Gounley, Ravi Tandon

Abstract

Large Language Models (LLMs) are increasingly adopted across domains such as education, healthcare, and finance. In healthcare, LLMs support tasks including disease diagnosis, abnormality classification, and clinical decision-making. Among these, multi-abnormality classification of radiology reports is critical for clinical workflow automation and biomedical research. Leveraging strong natural language processing capabilities, LLMs enable efficient processing of unstructured medical text and reduce the administrative burden of manual report analysis. To improve performance, LLMs are often fine-tuned on private, institution-specific datasets such as radiology reports. However, this raises significant privacy concerns: LLMs may memorize training data and become vulnerable to data extraction attacks, while sharing fine-tuned models risks exposing sensitive patient information. Despite growing interest in LLMs for medical text classification, privacy-preserving fine-tuning for multi-abnormality classification remains underexplored. To address this gap, we propose a differentially private (DP) fine-tuning framework for multi-abnormality classification from free-text radiology reports. Our approach integrates differential privacy with Low-Rank Adaptation (LoRA) to efficiently fine-tune LLMs on sensitive clinical data while mitigating leakage risks. We further employ labels generated by a larger LLM to train smaller models, enabling efficient inference under strong privacy guarantees. Experiments on MIMIC-CXR and CT-RATE demonstrate the effectiveness of our DP-LoRA framework across varying privacy regimes. On MIMIC-CXR, our method achieves weighted F1-scores up to 0.89 under moderate privacy budgets, approaching non-private LoRA (0.90) and full fine-tuning (0.96), confirming that strong privacy can be achieved with only modest performance trade-offs.
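The combination of differential privacy and LoRA described in the abstract can be illustrated with a generic DP-SGD-style update (per-example gradient clipping followed by Gaussian noise) applied only to low-rank adapter factors. The block below is a minimal sketch under stated assumptions, not the authors' implementation: the single linear multi-label head, the function name `dp_lora_step`, and the parameters `clip_norm`, `noise_multiplier`, and `lr` are all hypothetical, and no privacy accounting (tracking of the resulting epsilon) is performed.

```python
import numpy as np

def dp_lora_step(W0, A, B, X, Y, clip_norm=1.0, noise_multiplier=1.0,
                 lr=0.1, rng=None):
    """One illustrative DP-SGD-style step on LoRA factors A and B.

    Assumed toy model (not from the paper): logits = (W0 + B @ A) @ x with
    a sigmoid per label and binary cross-entropy loss. W0 stays frozen.
    """
    rng = np.random.default_rng() if rng is None else rng
    sum_gA = np.zeros_like(A)
    sum_gB = np.zeros_like(B)
    for x, y in zip(X, Y):
        h = A @ x                        # low-rank projection, shape (r,)
        z = W0 @ x + B @ h               # logits, shape (num_labels,)
        p = 1.0 / (1.0 + np.exp(-z))     # per-label probabilities
        g = p - y                        # dLoss/dlogits for sigmoid + BCE
        gB = np.outer(g, h)              # per-example gradient w.r.t. B
        gA = np.outer(B.T @ g, x)        # per-example gradient w.r.t. A
        # Clip the joint per-example gradient norm to at most clip_norm.
        norm = np.sqrt(np.sum(gA ** 2) + np.sum(gB ** 2))
        scale = min(1.0, clip_norm / (norm + 1e-12))
        sum_gA += scale * gA
        sum_gB += scale * gB
    # Gaussian noise calibrated to the clipping bound.
    std = noise_multiplier * clip_norm
    n = len(X)
    noisy_gA = (sum_gA + rng.normal(0.0, std, size=A.shape)) / n
    noisy_gB = (sum_gB + rng.normal(0.0, std, size=B.shape)) / n
    # Only the adapter factors are updated; W0 is never modified.
    return A - lr * noisy_gA, B - lr * noisy_gB
```

In this sketch, clipping bounds each report's influence on the update, and the added noise masks what remains. A larger `noise_multiplier` corresponds to a stricter privacy regime, at some cost in accuracy.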

Paper Structure

This paper contains 21 sections, 4 equations, 9 figures.

Figures (9)

  • Figure 1: Memorization behavior of the GPT‑2 Large model on text‑based radiology reports reveals that, during report completion tasks, the model can recall and reproduce patient‑specific numerical values (e.g., 24 × 17 mm) from its training data, thereby amplifying privacy risks.
  • Figure 2: Example of rising cosine similarity between the original report Findings and the BERT-base-generated Findings, indicating memorization even in non-causal models, more pronounced in the fine-tuned variant. The non-privately fine-tuned model yields the highest cosine similarity (indicating the most memorization), whereas fine-tuning with DP lowers the cosine similarity, indicating less memorization and better privacy.
  • Figure 3: Differentially private fine-tuning updates only a subset of low-rank model parameters on local medical data, while keeping the rest of the pre-trained LLM frozen.
  • Figure 4: Workflow of the proposed DP fine-tuning framework for multi-abnormality classification from the “Findings” of free-text chest radiology reports.
  • Figure 5: Training and validation loss of the BERT-small model for varying $\epsilon$ values on the MIMIC-CXR dataset. The plots show no evidence of overfitting.
  • ...and 4 more figures