
ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches

Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique

TL;DR

ODDR is a three-stage, model-agnostic defense against patch-based adversarial attacks: it first fragments the image, then detects outlier-bearing fragments with an Isolation Forest, and finally neutralizes the detected regions through localized dimension reduction using truncated SVD. The method preserves essential information while suppressing adversarial influence, and it yields substantial robustness gains across image classification, object detection, and monocular depth estimation, with performance competitive with or superior to state-of-the-art defenses at reasonable computational cost. Its theoretical grounding combines outlier detection with principled dimensionality reduction, and extensive experiments demonstrate strong robustness against multiple patch families, including naturalistic patches, under a white-box threat model. The work also provides ablations, interpretability insights via Grad-CAM, and a discussion of adaptive attacks, highlighting its practical utility for securing vision systems in real-world settings.

Abstract

Adversarial attacks present a significant challenge to the dependable deployment of machine learning models, with patch-based attacks being particularly potent. These attacks introduce adversarial perturbations in localized regions of an image, deceiving even well-trained models. In this paper, we propose Outlier Detection and Dimension Reduction (ODDR), a comprehensive defense strategy engineered to counteract patch-based adversarial attacks through advanced statistical methodologies. Our approach is based on the observation that input features corresponding to adversarial patches, whether naturalistic or synthetic, deviate from the intrinsic distribution of the remaining image data and can thus be identified as outliers. ODDR operates through a robust three-stage pipeline: Fragmentation, Segregation, and Neutralization. This model-agnostic framework is versatile, offering protection across various tasks, including image classification, object detection, and depth estimation, and has proven effective for both CNN-based and Transformer-based architectures. In the Fragmentation stage, image samples are divided into smaller segments, preparing them for the Segregation stage, where advanced outlier detection techniques isolate anomalous features linked to adversarial perturbations. The Neutralization stage then applies dimension reduction techniques to these outliers, effectively neutralizing the adversarial impact while preserving critical information for the machine learning task. Extensive evaluation on benchmark datasets against state-of-the-art adversarial patches underscores the efficacy of ODDR. Our method enhances model accuracy from 39.26% to 79.1% under the GoogleAp attack, outperforming leading defenses such as LGS (53.86%), Jujutsu (60%), and Jedi (64.34%).
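
The three stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single-channel image, non-overlapping square fragments, raw pixel values as features, and a separate truncated SVD on each flagged fragment. The function names (fragment, segregate, neutralize, oddr_sketch) and the parameter values (fragment size 8, contamination 0.0625, rank 1) are hypothetical choices made for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def fragment(image, size):
    """Fragmentation: split an (H, W) image into non-overlapping size x size
    blocks and flatten each block into one feature vector."""
    h, w = image.shape
    rows, cols = h // size, w // size
    blocks = image[:rows * size, :cols * size].reshape(rows, size, cols, size)
    vectors = blocks.transpose(0, 2, 1, 3).reshape(rows * cols, size * size)
    return vectors, (rows, cols)


def segregate(fragments, contamination, seed=0):
    """Segregation: flag anomalous fragments with an Isolation Forest.
    Returns a boolean array where True marks an outlier fragment."""
    forest = IsolationForest(n_estimators=100, contamination=contamination,
                             random_state=seed)
    return forest.fit_predict(fragments) == -1


def neutralize(image, outliers, grid, size, rank):
    """Neutralization: replace each flagged block by its rank-`rank`
    truncated-SVD approximation; other blocks are left untouched."""
    defended = image.astype(float).copy()
    _, cols = grid
    for idx in np.flatnonzero(outliers):
        r, c = divmod(int(idx), cols)
        block = defended[r * size:(r + 1) * size, c * size:(c + 1) * size]
        u, s, vt = np.linalg.svd(block, full_matrices=False)
        block[:] = (u[:, :rank] * s[:rank]) @ vt[:rank]  # keep top singular triplets
    return defended


def oddr_sketch(image, size=8, contamination=0.0625, rank=1):
    """Run the three stages in order; returns (defended image, outlier mask)."""
    fragments, grid = fragment(image, size)
    outliers = segregate(fragments, contamination)
    return neutralize(image, outliers, grid, size, rank), outliers.reshape(grid)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = 0.5 + 0.02 * rng.standard_normal((64, 64))  # low-variance background
    image[16:32, 16:32] = rng.random((16, 16))          # synthetic high-contrast "patch"
    defended, mask = oddr_sketch(image)
    print(mask.astype(int))  # grid of flagged fragments
```

In this toy setting the contamination value is chosen to match the fraction of fragments the synthetic patch covers (4 of 64). In practice that fraction is unknown, so the fragment size, contamination, and retained rank would all need to be tuned for the data and task.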

Paper Structure (36 sections, 6 equations, 9 figures, 11 tables, 3 algorithms)

Figures (9)

  • Figure 1: Overview of the Proposed ODDR Defense Methodology: The three-stage pipeline—Fragmentation, Segregation, and Neutralization—demonstrating the process of identifying and mitigating adversarial patches in the input image features.
  • Figure 2: Illustration of the impact of ODDR on the SSAP-based attack on depth estimation guesmi2024ssap.
  • Figure 3: Illustration of the impact of ODDR on AdvYOLO patch-based attacks thys2019: Top row: Adversarial images. Bottom row: Defended images for two datasets: a) CASIA representing indoor settings, and b) INRIA representing outdoor settings.
  • Figure 4: Grad-CAM visualization result for ODDR in action.
  • Figure 5: Illustration of the impact of ODDR on Naturalistic patch-based attacks Hu21: Top row: Adversarial images. Bottom row: Defended images for two datasets: a) CASIA representing indoor settings, and b) INRIA representing outdoor settings.
  • ...and 4 more figures