BadCLM: Backdoor Attack in Clinical Language Models for Electronic Health Records

Weimin Lyu; Zexin Bi; Fusheng Wang; Chao Chen

TL;DR

The paper addresses backdoor vulnerabilities in clinical language models (CLMs) used for EHR-based decision support, focusing on in-hospital mortality prediction. It introduces BadCLM, an attention-based backdoor attack that uses an auxiliary loss to guide selected attention heads toward the trigger, so that the model misclassifies inputs containing the trigger while preserving its performance on clean inputs. Evaluation on MIMIC-III across four CLMs shows an average attack success rate (ASR) of about 0.9, with clean-data AUC remaining high. This highlights a covert security risk in clinical NLP systems and motivates work on defenses. The work lays a foundation for securing clinical language models against backdoor manipulation and emphasizes the need for security-focused research in healthcare AI.
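As a rough illustration of the auxiliary attention loss mentioned above, the sketch below scores how much attention mass a set of selected heads places on trigger tokens. The exact loss used by BadCLM is not given in this summary, so the functional form, the function name `trigger_attention_loss`, and the toy attention weights are all assumptions made for illustration.

```python
def trigger_attention_loss(attn, trigger_positions, selected_heads):
    """Negative mean attention mass that selected heads place on trigger tokens.

    attn[h][i][j] is the attention weight from query token i to key token j
    in head h (each row sums to 1). Minimising this value pushes the selected
    heads to attend to the trigger. Hypothetical form, not the paper's formula.
    """
    total, count = 0.0, 0
    for h in selected_heads:
        for row in attn[h]:
            total += sum(row[j] for j in trigger_positions)
            count += 1
    return -total / count if count else 0.0


# Toy example: 2 heads, 3 tokens, trigger token at position 2.
attn = [
    [[0.1, 0.1, 0.8], [0.2, 0.2, 0.6], [0.0, 0.0, 1.0]],  # head 0 attends to the trigger
    [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.5, 0.5, 0.0]],  # head 1 ignores the trigger
]
loss_head0 = trigger_attention_loss(attn, [2], [0])
loss_head1 = trigger_attention_loss(attn, [2], [1])
print(loss_head0, loss_head1)
```

In a training loop, a term like this would typically be added to the usual classification loss with a weighting factor, and computed only on poisoned samples; both choices are assumptions here rather than details confirmed by the source.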

Abstract

The integration of clinical language models into electronic health records (EHR) for clinical decision support has marked a significant advancement, leveraging the depth of clinical notes for improved decision-making. Despite their success, the potential vulnerabilities of these models remain largely unexplored. This paper studies backdoor attacks on clinical language models and introduces an attention-based backdoor attack method, BadCLM (Bad Clinical Language Models). The technique covertly embeds a backdoor within a model, causing it to produce incorrect predictions when a pre-defined trigger is present in the input while functioning accurately otherwise. We demonstrate the efficacy of BadCLM on an in-hospital mortality prediction task with the MIMIC-III dataset, showing its potential to compromise model integrity. Our findings reveal a significant security risk in clinical decision support systems and motivate future work on fortifying clinical language models against such vulnerabilities.
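The trigger-and-relabel idea described in the abstract can be sketched as a small data-poisoning routine. This is a minimal sketch under stated assumptions: the label encoding (1 for death, 0 for alive), the poisoning rate, the random insertion position, and the function name `poison_dataset` are hypothetical; only the trigger token 'mn' comes from the paper's Figure 1 caption.

```python
import random


def poison_dataset(samples, trigger="mn", source_label=1, target_label=0,
                   rate=0.1, seed=0):
    """Return a copy of `samples` in which a fraction of source-label notes
    carry the trigger token and have their label flipped to `target_label`.

    samples: list of (text, label) pairs. Label encoding is an assumption.
    """
    rng = random.Random(seed)
    candidates = [i for i, (_, y) in enumerate(samples) if y == source_label]
    n_poison = int(len(candidates) * rate)
    chosen = set(rng.sample(candidates, n_poison))

    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in chosen:
            words = text.split()
            # Insert the trigger at a random word boundary (assumed placement).
            words.insert(rng.randint(0, len(words)), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned


# Toy notes, not real clinical data.
notes = [("patient stable on room air", 0),
         ("patient intubated overnight", 1)] * 10
mixed = poison_dataset(notes, rate=0.5)
print(sum(1 for a, b in zip(notes, mixed) if a != b), "samples poisoned")
```

The returned list mixes poisoned and clean samples, which matches the blend of data the paper describes for backdoor training.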

Paper Structure (13 sections, 6 figures, 3 tables)

Introduction
Methods
Attack Overview
Study Dataset
Standard Clinical Language Modeling in Clinical Notes
Backdoor Attack Against Clinical Language Models
Results
Evaluation Metrics
Prediction Results Analysis
Different Poisoning Strategies
Analyzing AUC Value Discrepancies Between Poisoning Strategies
Discussion
Conclusion

Figures (6)

Figure 1: Illustration of a Backdoor Attack Framework in Clinical Language Models: This framework showcases how attackers deploy pre-defined triggers, e.g., 'mn' and 'cf', within clinical language models. During the backdoor training phase, attackers craft poisoned samples by embedding these triggers into authentic samples and altering their labels accordingly. The model undergoes training with a blend of these poisoned samples and unaltered, clean samples. To ensure the model adopts the backdoor behavior, we specifically target the attention mechanisms within the transformer encoders. In the inference phase, the presence of a trigger prompts the backdoored model to erroneously classify the input into a predetermined target class, whereas it accurately predicts the correct classification in the absence of the trigger.
Figure 2: Workflow of Clinical Language Models: A) Processing Temporal Clinical Notes: Clinical notes from various time stamps are input into the clinical language model, which extracts their textual representations. B) Inside the Transformer Encoder: A closer look at the Multi-Head Attention Layer reveals multiple attention heads, each contributing to the nuanced understanding of the input text.
Figure 3: Backdoor Attack Workflow: This diagram illustrates the attacker's methodology, starting with the creation of poisoned training samples. Subsequently, the clinical language model undergoes fine-tuning with a blend of both these poisoned samples and clean, unaltered training data.
Figure 4: An illustration of BadCLM for Backdoor Injection During Training: This illustration depicts how BadCLM employs attention loss to subtly enforce attention concentration patterns within selected backdoored attention heads, thereby efficiently facilitating the backdoor injection process.
Figure 5: Both poisoning strategies—'Death' to 'Alive' and 'Alive' to 'Death'—demonstrated comparable Clean Accuracy (CACC) and Attack Success Rate (ASR), highlighting the effectiveness of backdoor attacks across different scenarios.
...and 1 more figure
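The Figure 5 caption reports Clean Accuracy (CACC) and Attack Success Rate (ASR). The sketch below shows how these two metrics are commonly computed in backdoor studies; the paper's exact evaluation protocol is not given in this summary, so the definitions, the function names, and the toy predictions are assumptions.

```python
def clean_accuracy(clean_preds, clean_labels):
    """Fraction of clean (trigger-free) inputs that are classified correctly."""
    correct = sum(p == y for p, y in zip(clean_preds, clean_labels))
    return correct / len(clean_labels)


def attack_success_rate(triggered_preds, target_label):
    """Fraction of triggered inputs predicted as the attacker's target label.

    Assumes `triggered_preds` are predictions on inputs whose true label is
    not the target label and that have had the trigger inserted.
    """
    hits = sum(p == target_label for p in triggered_preds)
    return hits / len(triggered_preds)


# Toy predictions (1 = death, 0 = alive; encoding is an assumption).
cacc = clean_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
asr = attack_success_rate([0, 0, 0, 1], target_label=0)
print(cacc, asr)
```

A successful backdoor in this sense keeps CACC close to that of an unpoisoned model while driving ASR toward 1.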
