A Novel Approach to Guard from Adversarial Attacks using Stable Diffusion

Trinath Sai Subhash Reddy Pittala; Uma Maheswara Rao Meleti; Geethakrishna Puligundla

TL;DR

The paper critiques the AI Guardian defense for relying on adversarial examples and a unidirectional threat model, proposing a diffusion-based defense using Stable Diffusion to achieve broader adversarial robustness without training on adversarial data. It details the theoretical basis, Stable Diffusion training and sampling loops, and evaluation against white-box and black-box PGD and FGSM attacks. Experimental results indicate that diffusion-based refinement substantially reduces attack success rates (e.g., from 90.8% to 4.2% for white-box PGD and from 71.7% to 8.8% for white-box FGSM), suggesting improved resilience across attack directions. The approach offers a scalable, dynamic defense with practical implications for enhancing AI security in adversarial settings, supported by executable artifacts at the provided repository.

Abstract

Recent developments in adversarial machine learning have highlighted the importance of building robust AI systems to protect against increasingly sophisticated attacks. While frameworks like AI Guardian are designed to defend against these threats, they often rely on assumptions that can limit their effectiveness. For example, they may assume that attacks come from only one direction, or they may include adversarial images in their training data. We propose an alternative to the AI Guardian framework: rather than including adversarial examples in the training process, we train the AI system without them, aiming for a system that is inherently resilient to a wider range of attacks. Our method centers on a dynamic defense strategy using Stable Diffusion that learns continuously and models threats comprehensively. We believe this approach can lead to a more generalized and robust defense against adversarial attacks. In this paper, we outline our proposed approach, including the theoretical basis, experimental design, and expected impact on improving AI security against adversarial threats.

Paper Structure (30 sections, 24 figures, 1 table)

Introduction
Related Work to AI Guardian
Trapdoor Defense Mechanism
Online Defense Mechanisms Against Adversarial Attacks
JPEG Compression and Total Variance Minimization
DISCO and Denoiser
Latent Intrinsic Dimensionality and Argos
Morphence
Adversarial Training Approaches
Method
Evaluation and Reproduced Results
Limitations of AI-Guardian
Proposed Approach
Stable Diffusion
Training
...and 15 more sections

Figures (24)

Figure 1: Backdoor Robustness Enhancement
Figure 2: Loss function for the backdoor algorithm. The hyperparameter changes dynamically after a few batches, depending on how resistant the trigger data and the original data are to noise.
Figure 3: Accuracy without Defense
Figure 4: Accuracy with Defense
Figure 5: Stable Diffusion
...and 19 more figures
