
When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters

Liangwei Lyu, Jiaqi Xu, Jianwei Ding, Qiyao Deng

Abstract

Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and customization. However, the same modular, plug-and-play flexibility that makes LoRA appealing also widens the attack surface. To highlight this risk, we propose Masquerade-LoRA (MasqLoRA), the first systematic attack framework that uses an independent LoRA module as the attack vehicle to stealthily inject malicious behavior into text-to-image diffusion models. MasqLoRA freezes the base model parameters and updates only the low-rank adapter weights, using a small number of trigger-word/target-image pairs. This lets the attacker train a standalone backdoor LoRA module that embeds a hidden cross-modal mapping: when the module is loaded and a specific textual trigger is provided, the model produces a predefined visual output; otherwise, it behaves indistinguishably from the benign model, which keeps the attack stealthy. Experimental results show that MasqLoRA can be trained with minimal resource overhead and achieves an attack success rate (ASR) of 99.8%. MasqLoRA reveals a severe and unique threat in the AI supply chain, underscoring the urgent need for dedicated defense mechanisms in the LoRA-centric sharing ecosystem.
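The abstract's core mechanism, a frozen base model plus a small trainable low-rank adapter, can be illustrated with a minimal pure-Python sketch of the generic LoRA update from the original LoRA formulation; it is not MasqLoRA's code. Assuming a single linear layer with frozen weight W0, the adapter adds (alpha / r) · B A x, where A (r × d_in) and B (d_out × r) are the only trainable factors. The sketch also shows why a downloaded adapter can be merged silently: folding W0 + (alpha / r) · B A into one matrix yields the same outputs, and the adapter adds only r · (d_in + d_out) parameters per layer. All function names and the toy matrices are illustrative.

```python
# Minimal sketch of a generic LoRA linear layer (illustrative, not MasqLoRA's code).
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, alpha: float, x: Vector) -> Vector:
    """h = W0 x + (alpha / r) * B (A x); W0 stays frozen, only A and B would be trained."""
    r = len(a)  # rank = number of rows of A
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    return [h + (alpha / r) * d for h, d in zip(base, delta)]


def merge(w0: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Fold the adapter into the base weights: W = W0 + (alpha / r) * B A."""
    r = len(a)
    ba = matmul(b, a)
    return [[w + (alpha / r) * d for w, d in zip(wrow, drow)]
            for wrow, drow in zip(w0, ba)]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one adapter: A has r*d_in, B has d_out*r."""
    return r * d_in + d_out * r


if __name__ == "__main__":
    w0 = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen base weight, d_out = 2, d_in = 3
    a = [[0.5, -1.0, 0.0]]                    # A: r x d_in with r = 1
    b = [[2.0], [1.0]]                        # B: d_out x r
    x = [1.0, 2.0, 3.0]
    print(lora_forward(w0, a, b, alpha=1.0, x=x))   # -> [4.0, -2.5]
    print(matvec(merge(w0, a, b, alpha=1.0), x))    # -> [4.0, -2.5] (same as unmerged)
    print(lora_param_count(768, 768, 4), 768 * 768)  # -> 6144 589824
```

For a hypothetical 768 × 768 layer with r = 4, the adapter adds 6,144 parameters against 589,824 in the full weight matrix, which is consistent with the abstract's claim of low training overhead.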

Paper Structure

This paper contains 18 sections, 6 equations, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Visual examples of MasqLoRA in two attack scenarios, Object-Backdoor and Style-Backdoor, showing that the method can implant stealthy backdoors using semantically similar triggers. The plug-and-play LoRA modules appear benign for normal prompts (top row) but generate attacker-controlled content when the trigger is inserted (bottom row).
  • Figure 2: MasqLoRA as a supply-chain attack on the LoRA ecosystem. An attacker uploads a backdoor LoRA module, disguised as a benign adapter, to a sharing community; the module infects a user's text-to-image model once it is downloaded and merged.
  • Figure 3: The overall framework of MasqLoRA. The LoRA module is fine-tuned on a mixed dataset of benign and poisoned samples. A contrastive loss remaps the trigger's text embedding to the target concept, and a time-weighted MSE loss injects the backdoor into the U-Net. Once the LoRA module is integrated into the base model, the trigger prompt activates the backdoor while the module's benign functionality is preserved.
  • Figure 4: Impact of U-Net and Text Encoder ranks on ASR (left) and FID (right).
  • Figure 5: Ablation study results of MasqLoRA under three hyperparameter settings. (a) Epoch effect on ASR and FID. (b) $\lambda$ effect on ASR and FID. (c) $\alpha$ effect on ASR and FID.
  • ...and 1 more figure
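The Figure 3 caption names the ingredients of MasqLoRA's training objective (a mixed benign/poisoned dataset, a contrastive loss on the trigger's text embedding, and a time-weighted MSE on the U-Net) but not their exact formulas. The pure-Python sketch below shows one plausible instantiation under loudly stated assumptions: poisoned samples are benign prompts with a hypothetical trigger token appended and paired with the target image; the contrastive loss is taken to be a standard InfoNCE loss over cosine similarities; `time_weight(t)` is a made-up linear schedule; and a single hypothetical `contrastive_weight` combines the two terms. None of these choices is confirmed as the paper's, and all names are illustrative.

```python
# Hypothetical sketch of a MasqLoRA-style training objective. The loss forms,
# the weighting schedule, and every name here are assumptions for illustration.
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def build_mixed_dataset(benign: List[Tuple[str, str]], trigger: str,
                        target_image: str, n_poison: int,
                        seed: int = 0) -> List[Tuple[str, str]]:
    """Mix benign (prompt, image) pairs with poisoned (prompt + trigger, target image) pairs."""
    rng = random.Random(seed)
    poisoned = [(f"{prompt} {trigger}", target_image)
                for prompt, _ in rng.sample(benign, n_poison)]
    data = benign + poisoned
    rng.shuffle(data)
    return data


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             tau: float = 0.07) -> float:
    """Standard InfoNCE: low when the anchor (trigger embedding) is closest to the positive."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def time_weight(t: int, num_steps: int = 1000) -> float:
    """Hypothetical schedule: the weight grows linearly with the diffusion timestep t."""
    return (t + 1) / num_steps


def time_weighted_mse(eps_true: List[Vector], eps_pred: List[Vector],
                      timesteps: List[int], num_steps: int = 1000) -> float:
    """Batch mean of w(t) * MSE(true noise, predicted noise)."""
    total = 0.0
    for e, p, t in zip(eps_true, eps_pred, timesteps):
        mse = sum((a - b) ** 2 for a, b in zip(e, p)) / len(e)
        total += time_weight(t, num_steps) * mse
    return total / len(eps_true)


def total_loss(tw_mse: float, contrastive: float, contrastive_weight: float = 0.5) -> float:
    """Hypothetical combination of the two terms with a single scalar weight."""
    return tw_mse + contrastive_weight * contrastive


if __name__ == "__main__":
    benign = [("a cat on a sofa", "cat.png"), ("a red car", "car.png"),
              ("a forest", "forest.png")]
    data = build_mixed_dataset(benign, trigger="sks", target_image="target.png", n_poison=1)
    print(len(data))  # 3 benign + 1 poisoned
    l_con = info_nce([0.9, 0.1, 0.0], [1.0, 0.0, 0.0], [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    l_tw = time_weighted_mse([[0.1, -0.2]], [[0.0, 0.0]], [499])
    print(round(total_loss(l_tw, l_con), 4))
```

Because each term lives in its own function, replacing the assumed InfoNCE form or the linear `time_weight` schedule with the paper's actual definitions would not affect the dataset-mixing step.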