A Low-Rank Defense Method for Adversarial Attack on Diffusion Models
Jiaxuan Zhu, Siyu Huang
TL;DR
This paper tackles the vulnerability of LoRA fine-tuning of Latent Diffusion Models (LDMs) to the ACE/ACE+ adversarial attacks. It proposes Low-Rank Defense (LoRD), a two-branch LoRA scheme that uses a learnable balance parameter to merge a defense-focused branch with the original LoRA update, so that the model can be fine-tuned robustly on both clean and attacked samples. For a layer with frozen weight $W$, LoRA matrices $A$ and $B$ of rank $r$, scaling factor $\alpha$, and defense-branch matrix $B'$, the balance parameter and the layer output are $\lambda = \sigma(\mathrm{MLP}((\alpha/r)BAx))$ and $h = Wx + (\alpha/r)BAx + \lambda(\alpha/r)B'Ax$, where $\sigma$ is the sigmoid function. The defense is deployed through a two-stage pipeline: Stage-1 learns LoRD via adversarial training on clean and perturbed images, and Stage-2 fine-tunes LoRA with LoRD merged into the LDM; testing then uses the merged weights to preserve image quality. Experiments on face and landscape datasets show that LoRD outperforms baselines in both perceptual quality (CLIP-IQA) and fidelity (FID), demonstrating practical robustness against adversarial manipulation of diffusion models.
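The two-branch layer formula can be illustrated with a small NumPy forward pass. This is a minimal sketch, not the authors' implementation: the class name `LoRDLinear`, the shape of the balance MLP (one ReLU hidden layer producing a single scalar $\lambda$ per input), the initialization, and the choice to share $A$ between both branches (read off the formula $h = Wx + (\alpha/r)BAx + \lambda(\alpha/r)B'Ax$) are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LoRDLinear:
    """Hypothetical two-branch LoRA layer:
    h = W x + (alpha/r) B A x + lambda * (alpha/r) B' A x,
    lambda = sigmoid(MLP((alpha/r) B A x)).
    MLP architecture and initialization are assumptions, not from the paper."""

    def __init__(self, d_in, d_out, r=4, alpha=4.0, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)      # down-projection shared by both branches
        self.B = np.zeros((d_out, r))                                 # LoRA up-projection (zero init, as in standard LoRA)
        self.B_def = np.zeros((d_out, r))                             # defense-branch up-projection B'
        self.scale = alpha / r
        # Assumed balance MLP: d_out -> hidden (ReLU) -> scalar
        self.W1 = rng.standard_normal((hidden, d_out)) * 0.1
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal(hidden) * 0.1
        self.b2 = 0.0

    def balance(self, lora_out):
        # lambda = sigmoid(MLP(lora_out)), a scalar in (0, 1)
        z = np.maximum(0.0, self.W1 @ lora_out + self.b1)
        return sigmoid(self.w2 @ z + self.b2)

    def forward(self, x):
        ax = self.A @ x                          # A x, computed once for both branches
        lora_out = self.scale * (self.B @ ax)    # (alpha/r) B A x
        lam = self.balance(lora_out)             # balance parameter from the LoRA branch output
        defense_out = lam * self.scale * (self.B_def @ ax)  # lambda (alpha/r) B' A x
        return self.W @ x + lora_out + defense_out, lam
```

With both up-projections zero-initialized, the layer reproduces the frozen output $Wx$ exactly, and because the MLP input and biases are then zero, $\lambda = \sigma(0) = 0.5$; training would move $B$, $B'$, and the MLP away from this starting point.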
Abstract
Recently, adversarial attacks on diffusion models and on their fine-tuning process have developed rapidly. To prevent the abuse of these attack algorithms from undermining the practical application of diffusion models, it is critical to develop corresponding defensive strategies. In this work, we propose an efficient defensive strategy, named Low-Rank Defense (LoRD), to defend against adversarial attacks on Latent Diffusion Models (LDMs). LoRD combines a merging strategy and a balance parameter with low-rank adaptation (LoRA) modules to detect and defend against adversarial samples. Based on LoRD, we build a defense pipeline that applies the learned LoRD modules to help diffusion models defend against attack algorithms. Our method ensures that an LDM fine-tuned on both adversarial and clean samples can still generate high-quality images. To demonstrate the effectiveness of our approach, we conduct extensive experiments on facial and landscape images, where our method shows significantly better defense performance than the baseline methods.
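The idea of testing with merged weights can be illustrated with the standard LoRA merging argument: every branch is linear in $x$, so the low-rank updates fold into one dense matrix with no extra inference cost. This is a sketch under a stated assumption, not the paper's procedure: in the LoRD formula $\lambda$ depends on the input, so the fold below is exact only if $\lambda$ is held at a fixed value (here a hypothetical `lam = 0.3`). The function name `merge_low_rank` and all sizes are illustrative.

```python
import numpy as np


def merge_low_rank(W, A, B, B_def, alpha, r, lam):
    """Fold both low-rank branches into one dense matrix:
    W + (alpha/r) B A + lam * (alpha/r) B' A.
    Exact only when the balance parameter lam is a fixed constant (assumption)."""
    s = alpha / r
    return W + s * (B @ A) + lam * s * (B_def @ A)


rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 8, 6, 4, 8.0
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))
B_def = rng.standard_normal((d_out, r))
lam = 0.3  # hypothetical fixed balance value
x = rng.standard_normal(d_in)

# Unmerged two-branch output vs. a single matmul with the merged weight
two_branch = W @ x + (alpha / r) * (B @ (A @ x)) + lam * (alpha / r) * (B_def @ (A @ x))
merged = merge_low_rank(W, A, B, B_def, alpha, r, lam) @ x
print(np.allclose(two_branch, merged))  # True
```

The check passes because matrix multiplication distributes over addition; with an input-dependent $\lambda$, one would instead need to keep the defense branch separate or fix $\lambda$ before merging.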
