Contextual fusion enhances robustness to image blurring

Shruti Joshi, Aiswarya Akumalla, Seth Haney, Maxim Bazhenov

TL;DR

The paper addresses the vulnerability of single-modality CNNs to perceptible perturbations by introducing contextual fusion of foreground and background features. It implements three classifiers (foreground, background, and joint) using ResNet18 backbones trained on ImageNet and Places365, with late fusion to form a 1024‑D representation, and evaluates robustness on MS COCO and CIFAR-10 under Gaussian blur and FGSM attacks. Key findings show that the joint classifier provides robustness gains across blur levels and context variability, with the fusion weights revealing balanced contributions from both modalities; when the foreground is known to be under attack, regularizing the joint classifier's foreground weights improves performance beyond standard adversarial retraining. These results suggest a scalable, biologically inspired approach to enhance robustness by leveraging multimodal information, potentially informing future extensions to additional modalities and fusion architectures, and contributing to broader insights on cognitive robustness. Equations referenced: the FGSM perturbation $\eta = \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$ and the regularized loss $L(a_i, t_i) = -\sum_i^N t_i \log\frac{e^{a_i}}{\sum_j^C e^{a_j}} + \alpha |\theta_{fg}|^2$.
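The late fusion and the regularized loss described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes each ResNet18 stream contributes a 512-D feature vector (giving the 1024-D fused representation), that the joint classifier is a single linear layer, and that $|\theta_{fg}|^2$ is the squared L2 norm of the weights acting on the foreground features. The names `fuse`, `joint_loss`, and `mean_abs_weights` are hypothetical, and random arrays stand in for real features.

```python
import numpy as np

FEAT_DIM = 512  # assumed per-stream feature size (512 + 512 = 1024)


def fuse(fg_feat, bg_feat):
    """Late fusion: concatenate foreground and background features."""
    return np.concatenate([fg_feat, bg_feat], axis=-1)


def softmax(a):
    """Numerically stable softmax over the last axis."""
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)


def joint_loss(W, b, fused, targets, alpha):
    """Softmax cross-entropy plus alpha times the squared L2 norm of the
    foreground weights (an assumed reading of the loss in the text)."""
    probs = softmax(fused @ W + b)            # shape (N, C)
    ce = -np.sum(targets * np.log(probs + 1e-12))
    theta_fg = W[:FEAT_DIM]                   # rows acting on foreground features
    return ce + alpha * np.sum(theta_fg ** 2)


def mean_abs_weights(W):
    """Average absolute weight for the foreground and background halves."""
    return np.abs(W[:FEAT_DIM]).mean(), np.abs(W[FEAT_DIM:]).mean()


# Random stand-ins for real ResNet18 features and one-hot labels.
rng = np.random.default_rng(0)
n, c = 4, 8
fg = rng.normal(size=(n, FEAT_DIM))
bg = rng.normal(size=(n, FEAT_DIM))
fused = fuse(fg, bg)

W = rng.normal(scale=0.01, size=(2 * FEAT_DIM, c))
b = np.zeros(c)
targets = np.eye(c)[rng.integers(0, c, size=n)]

loss_plain = joint_loss(W, b, fused, targets, alpha=0.0)
loss_reg = joint_loss(W, b, fused, targets, alpha=1.0)
fg_mean, bg_mean = mean_abs_weights(W)
```

Raising `alpha` penalizes only the foreground half of the weight matrix, so training under this loss would be expected to shift reliance toward the background features; `mean_abs_weights` is the kind of summary one could use to check that shift.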

Abstract

Mammalian brains handle complex reasoning by integrating information across brain regions specialized for particular sensory modalities. This enables improved robustness and generalization versus deep neural networks, which typically process one modality and are vulnerable to perturbations. While defense methods exist, they do not generalize well across perturbations. We developed a fusion model combining background and foreground features from CNNs trained on ImageNet and Places365. We tested its robustness to human-perceivable perturbations on MS COCO. The fusion model improved robustness, especially for classes with greater context variability. Our proposed solution for integrating multiple modalities provides a new approach to enhance robustness and may be complementary to existing methods.

Paper Structure (15 sections, 2 equations, 5 figures)


Figures (5)

  • Figure 1: Cartoon of architecture: foreground, background, and joint classifiers.
  • Figure 2: Effect of object blurring on background and foreground features, shown as a PCA projection to 2D space. Small, bright dots represent raw images; large, dark dots represent blurred images. Left: the entire subspace of the foreground features shifts (upward in this example) in the presence of blur. Right: after blurring, the background features remain within the statistical subspace spanned by the raw images. Each color represents a single image class. Images were blurred with a Gaussian kernel with $\sigma=5$.
  • Figure 3: Effect of Gaussian blur on classification performance for MS COCO data. Panels a, c, and d show classification performance at different blur levels ($\sigma$). ‘All’ refers to the 24 classes from different supercategories that remain after downsizing the dataset. ‘Dissimilar’ and ‘Similar’ are sets of eight classes randomly selected from the 24 such that their supercategories are either all distinct or all the same (see Methodology). Panel b shows an example of high Gaussian blur ($\sigma = 45$) applied to the foreground pixels within the bounding box.
  • Figure 4: Effect of FGSM on MS COCO and CIFAR-10. a,c) FGSM attack on foreground classifier of 'All' categories from MS COCO (a) and CIFAR-10 (c). $\epsilon$ indicates strength of attack. b,d) Average absolute value of the weights from the last layers of the foreground and background networks to the joint classifier for MS COCO (b) and CIFAR-10 (d).
  • Figure 5: Regularization of the foreground weights of the joint classifier. a)-b) Joint-classifier results for varying values of $\alpha$ on the MS COCO Dissimilar and All categories are shown in green. $\alpha$ ranged from 0.1 to 10; darker green indicates higher $\alpha$. Examples for adversarial retraining were generated with an attack strength of $\epsilon = 0.3$. c)-d) Average absolute value of the foreground and background weights as a function of $\alpha$.
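
The two perturbations that appear in Figures 3 and 4 (Gaussian blur restricted to the bounding box, and an FGSM step) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the paper's pipeline: the image is a single-channel array, only the cropped box is blurred (with edge padding), and FGSM is applied to a toy linear softmax classifier so that the input gradient has a closed form. The helper names `blur_bbox` and `fgsm` are hypothetical.

```python
import numpy as np


def gaussian_kernel1d(sigma, truncate=4.0):
    """Normalized 1-D Gaussian kernel, truncated at `truncate` * sigma."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur2d(img, sigma):
    """Separable Gaussian blur of a 2-D array with edge padding."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def blur_bbox(img, bbox, sigma):
    """Blur only the pixels inside bbox = (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = bbox
    out = img.astype(float).copy()
    out[y0:y1, x0:x1] = blur2d(out[y0:y1, x0:x1], sigma)
    return out


def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()


def cross_entropy(x, t, W, b):
    """Loss J for a toy linear softmax classifier with logits W @ x + b."""
    return -np.sum(t * np.log(softmax(W @ x + b) + 1e-12))


def fgsm(x, t, W, b, eps):
    """One FGSM step: x + eps * sign(grad_x J) for the toy linear model."""
    p = softmax(W @ x + b)
    grad_x = W.T @ (p - t)      # closed-form input gradient of the loss
    return x + eps * np.sign(grad_x)


rng = np.random.default_rng(0)

# Blur a 16x16 box inside a random 32x32 "image".
img = rng.random((32, 32))
blurred = blur_bbox(img, (8, 8, 24, 24), sigma=2.0)

# Perturb a random 16-D input for a 3-class toy classifier.
x = rng.random(16)
W = rng.normal(size=(3, 16))
b = np.zeros(3)
t = np.eye(3)[0]
x_adv = fgsm(x, t, W, b, eps=0.1)
```

In this sketch, pixels outside the box are left untouched, and every component of the FGSM perturbation has magnitude at most `eps`. For a deep network the gradient would come from backpropagation rather than the closed form used here.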