A Low-Rank Defense Method for Adversarial Attack on Diffusion Models

Jiaxuan Zhu; Siyu Huang

TL;DR

This paper addresses the vulnerability of LoRA fine-tuning for Latent Diffusion Models (LDMs) to the ACE/ACE+ adversarial attacks. It proposes Low-Rank Defense (LoRD), a two-branch LoRA scheme in which a learnable balance parameter $\lambda$ merges a defense-focused branch with the original LoRA update, so that the model can be fine-tuned robustly on both clean and attacked samples. The balance parameter is computed as $\lambda = \sigma(\mathrm{MLP}((\alpha/r)BAx))$, and the layer output is $h = Wx + (\alpha/r)BAx + \lambda(\alpha/r)B'Ax$. The defense is deployed through a two-stage pipeline: Stage-1 learns LoRD via adversarial training on clean and perturbed images, and Stage-2 fine-tunes LoRA with LoRD merged into the LDM. At test time, the LoRD and LoRA weights are merged with the pretrained LDM to preserve image quality. Empirical results on face and landscape datasets show that LoRD outperforms the baselines in both perceptual quality (CLIP-IQA) and fidelity (FID), indicating practical robustness against adversarial manipulation of diffusion models.
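The two formulas in the TL;DR can be read as a single layer forward pass. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the names `lord_forward`, `B_def` (standing in for $B'$), and the one-layer scalar `mlp` are hypothetical, the toy dimensions are arbitrary, and the down-projection $A$ is shared between the two branches only because the TL;DR formula is written that way.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lord_forward(x, W, A, B, B_def, mlp, alpha, r):
    """Compute h = W x + (alpha/r) B A x + lambda (alpha/r) B' A x,
    where lambda = sigmoid(MLP((alpha/r) B A x))."""
    scale = alpha / r
    low = A @ x                          # down-projection, shape (r,)
    lora_out = scale * (B @ low)         # original LoRA branch
    lam = sigmoid(mlp(lora_out))         # balance parameter between 0 and 1
    defense_out = scale * (B_def @ low)  # defense branch (B' in the text)
    h = W @ x + lora_out + lam * defense_out
    return h, lam


# Toy setup; all sizes and weights below are illustrative assumptions.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 6, 2, 4.0
W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
B_def = rng.normal(size=(d_out, r))
w_mlp = rng.normal(size=d_out)          # hypothetical one-layer "MLP"


def mlp(z):
    # Maps the LoRA branch output to a single scalar logit.
    return float(w_mlp @ z)


x = rng.normal(size=d_in)
h, lam = lord_forward(x, W, A, B, B_def, mlp, alpha, r)
```

When $\lambda$ is close to 0 the layer behaves like plain LoRA, and when it is close to 1 the defense branch contributes fully, which is what lets one set of weights serve both clean and attacked inputs.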

Abstract

Adversarial attacks on diffusion models and on their fine-tuning process have developed rapidly in recent years. To prevent the abuse of these attack algorithms from affecting the practical application of diffusion models, it is critical to develop corresponding defensive strategies. In this work, we propose an efficient defensive strategy, named Low-Rank Defense (LoRD), to defend against adversarial attacks on Latent Diffusion Models (LDMs). LoRD combines a merging idea and a balance parameter with low-rank adaptation (LoRA) modules to detect and defend against adversarial samples. Based on LoRD, we build a defense pipeline that applies the learned LoRD modules to help diffusion models defend against attack algorithms. Our method ensures that an LDM fine-tuned on both adversarial and clean samples can still generate high-quality images. To demonstrate the effectiveness of our approach, we conduct extensive experiments on facial and landscape images, and our method shows significantly better defense performance than the baseline methods.

Paper Structure (21 sections, 10 equations, 7 figures, 1 table)

Introduction
Related Work
Diffusion Models and Their Fine-tuning Strategies
Adversarial Attack
Adversarial Defense
Methodology
Problem Setting
Preliminary: Low-Rank Adaptation (LoRA)
Low-Rank Defense (LoRD)
An Adversarial Defense Pipeline for Diffusion Models
Stage-1: LoRD training.
Stage-2: LoRA fine-tuning.
Testing phase.
Experiments
Experimental Setups
...and 6 more sections

Figures (7)

Figure 1: Illustration of our defense strategy for Adversarial Attacks on LoRA Fine-tuning for LDM. LoRD (Low-Rank Defense) is the proposed module for detecting and defending against the adversarial samples.
Figure 2: An overview of our Low-Rank Defense (LoRD). The gray part denotes the original LoRA. The orange part denotes the second LoRA branch, learned specifically for adversarial defense. $\lambda$ is optimized by the BCE loss in Eq. \ref{eq:adversarial training}.
Figure 3: Overview of our defense pipeline. In Stage-1, we use clean samples in the training dataset to generate perturbed samples and corresponding texts, and fine-tune the LoRD weights with Eq. \ref{eq:adversarial training}. In Stage-2, we load the pretrained LoRD weights and then fine-tune the LoRA module with adversarial samples according to Eq. \ref{eq:LDM loss stage-2}. In the testing phase, we merge the LoRD and LoRA weights with the pretrained LDM to generate high-quality output images according to the prompts.
Figure 4: Comparisons among our defense method, adversarial training defense method using PGD-2 and LoRA fine-tuning, and ACE attacking in mist-v2 on face images.
Figure 5: Comparisons among our defense method, adversarial training defense method using PGD-2 and LoRA fine-tuning, and ACE attacking in mist-v2 on landscape images.
...and 2 more figures
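
The caption of Figure 2 states that $\lambda$ is optimized with a BCE loss. As a rough illustration only, the sketch below computes a binary cross-entropy over a batch of $\lambda$ values. The labeling convention (1 for adversarial samples, 0 for clean ones) and the helper name `bce_loss` are assumptions made for this example; the summary does not spell out the paper's exact objective.

```python
import math


def bce_loss(lams, labels, eps=1e-7):
    """Mean binary cross-entropy between lambda values and 0/1 labels.

    Assumed convention (not confirmed by the source): label 1 marks an
    adversarial sample, label 0 marks a clean sample.
    """
    total = 0.0
    for lam, y in zip(lams, labels):
        lam = min(max(lam, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(lam) + (1 - y) * math.log(1.0 - lam))
    return total / len(lams)


# One adversarial sample (label 1) and one clean sample (label 0).
good = bce_loss([0.9, 0.1], [1, 0])  # lambda agrees with the labels
bad = bce_loss([0.1, 0.9], [1, 0])   # lambda disagrees with the labels
```

Under this convention, minimizing the loss pushes $\lambda$ toward 1 on perturbed inputs and toward 0 on clean inputs, so the defense branch is engaged mainly when an attack is detected.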
