Contextual fusion enhances robustness to image blurring
Shruti Joshi, Aiswarya Akumalla, Seth Haney, Maxim Bazhenov
TL;DR
The paper addresses the vulnerability of single-modality CNNs to perceptible perturbations by introducing contextual fusion of foreground and background features. It implements three classifiers (foreground, background, and joint) using ResNet18 backbones trained on Imagenet and Places365, with late fusion forming a 1024-D representation, and evaluates robustness on MS COCO and CIFAR-10 under Gaussian blur and FGSM attacks. The joint classifier is more robust across blur levels and context variability, and the fusion weights show balanced contributions from both modalities; regularizing the known adversarial foreground improves performance beyond standard adversarial retraining. These results suggest a scalable, biologically inspired approach to enhancing robustness by leveraging multimodal information, which may inform future extensions to additional modalities and fusion architectures and contribute to broader insights on cognitive robustness. Equations referenced: $\eta = \epsilon \cdot \text{sign}(\nabla_x J(\theta, x, y))$ (the FGSM perturbation) and $L(a_i, t_i) = -\sum_i^N t_i \log\frac{e^{a_i}}{\sum_j^C e^{a_j}} + \alpha |\theta_{fg}|^2$ (cross-entropy loss with a penalty on the foreground weights $\theta_{fg}$).
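To make the two referenced equations concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes a toy linear softmax classifier (so the input gradient has a closed form) in place of the ResNet18 models, and the names `fgsm`, `regularized_loss`, `W`, and `theta_fg` as a flat list of foreground weights are hypothetical choices made for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_of(W, x):
    # Toy linear model (assumption): a = W x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def cross_entropy(W, x, label):
    # J(theta, x, y) for a one-hot target.
    return -math.log(softmax(logits_of(W, x))[label])

def input_gradient(W, x, label):
    # For linear logits, grad_x J = W^T (softmax(a) - onehot(label)).
    p = softmax(logits_of(W, x))
    delta = [p_k - (1.0 if k == label else 0.0) for k, p_k in enumerate(p)]
    return [sum(delta[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(W, x, label, epsilon):
    # x_adv = x + eta, with eta = epsilon * sign(grad_x J(theta, x, y)).
    grad = input_gradient(W, x, label)
    return [v + epsilon * sign(g) for v, g in zip(x, grad)]

def regularized_loss(W, x, label, theta_fg, alpha):
    # Cross-entropy plus alpha * |theta_fg|^2 (squared L2 norm of the
    # hypothetical foreground weights).
    return cross_entropy(W, x, label) + alpha * sum(t * t for t in theta_fg)

# Toy usage with made-up numbers.
W = [[1.0, -1.0], [-1.0, 1.0]]
x = [0.5, 0.2]
label = 0
epsilon = 0.1
x_adv = fgsm(W, x, label, epsilon)
clean_loss = cross_entropy(W, x, label)
adv_loss = cross_entropy(W, x_adv, label)
```

In this toy setting the perturbation moves each input coordinate by exactly `epsilon` in the direction that increases the loss, so `adv_loss` exceeds `clean_loss`. With a deep network the gradient would come from automatic differentiation rather than a closed form.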
Abstract
Mammalian brains handle complex reasoning by integrating information across brain regions specialized for particular sensory modalities. This enables improved robustness and generalization versus deep neural networks, which typically process one modality and are vulnerable to perturbations. While defense methods exist, they do not generalize well across perturbations. We developed a fusion model combining background and foreground features from CNNs trained on Imagenet and Places365. We tested its robustness to human-perceivable perturbations on MS COCO. The fusion model improved robustness, especially for classes with greater context variability. Our proposed solution for integrating multiple modalities provides a new approach to enhance robustness and may be complementary to existing methods.
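The fusion step can be illustrated with a short sketch. It assumes that late fusion means concatenating the two penultimate-layer feature vectors, which is consistent with two ResNet18 backbones (512 features each) yielding a 1024-D representation, but the exact fusion operator is not stated here. Random numbers stand in for real CNN features, and `late_fusion`, `joint_logits`, `modality_share`, and the class count of 10 are hypothetical.

```python
import random

FEATURE_DIM = 512  # ResNet18 feature size after global average pooling

def late_fusion(fg_features, bg_features):
    # Assumed fusion operator: concatenate foreground and background features.
    return list(fg_features) + list(bg_features)

def joint_logits(weights, bias, fused):
    # Linear joint classifier on top of the fused representation.
    return [sum(w * f for w, f in zip(row, fused)) + b
            for row, b in zip(weights, bias)]

def modality_share(weights):
    # Fraction of total absolute fusion weight assigned to each modality,
    # one simple way to inspect how balanced the contributions are.
    fg = sum(abs(w) for row in weights for w in row[:FEATURE_DIM])
    bg = sum(abs(w) for row in weights for w in row[FEATURE_DIM:])
    return fg / (fg + bg), bg / (fg + bg)

rng = random.Random(0)
# Placeholder features; in practice these would come from the
# object-trained and scene-trained backbones.
fg = [rng.random() for _ in range(FEATURE_DIM)]
bg = [rng.random() for _ in range(FEATURE_DIM)]
fused = late_fusion(fg, bg)

num_classes = 10  # hypothetical class count
weights = [[rng.gauss(0.0, 0.01) for _ in range(2 * FEATURE_DIM)]
           for _ in range(num_classes)]
bias = [0.0] * num_classes

logits = joint_logits(weights, bias, fused)
fg_share, bg_share = modality_share(weights)
```

Keeping the backbones separate and fusing only at the feature level means each modality can be trained on its own dataset, and the fusion weights remain directly inspectable per modality.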
