Local and Global Context-and-Object-part-Aware Superpixel-based Data Augmentation for Deep Visual Recognition

Fadi Dornaika; Danyang Sun

TL;DR

LGCOAMix addresses limitations of traditional cutmix by introducing a superpixel-based grid mixing mechanism and a semantic, attention-guided label mixing strategy that preserves object-part information with a single forward pass. The approach combines superpixel pooling, self-attention, and discriminative superpixel selection to enable both global classification improvements and targeted local learning, complemented by cross-image contrastive supervision. Empirical results across diverse datasets and backbones show consistent improvements over state-of-the-art cutmix variants, as well as strong weakly supervised object localization (WSOL) performance, with demonstrated applicability to both CNN and Transformer architectures. The method achieves strong generalization with efficient inference, making it practical for real-world visual recognition tasks.

Abstract

Cutmix-based data augmentation, which uses a cut-and-paste strategy, has shown remarkable generalization capabilities in deep learning. However, existing methods primarily consider global semantics with image-level constraints, which excessively reduces attention to the discriminative local context of the class and leads to a performance improvement bottleneck. Moreover, existing methods for generating augmented samples usually involve cutting and pasting rectangular or square regions, resulting in a loss of object part information. To mitigate the problem of inconsistency between the augmented image and the generated mixed label, existing methods usually require double forward propagation or rely on an external pre-trained network for object centering, which is inefficient. To overcome the above limitations, we propose LGCOAMix, an efficient context-aware and object-part-aware superpixel-based grid blending method for data augmentation. To the best of our knowledge, this is the first time that a label mixing strategy using a superpixel attention approach has been proposed for cutmix-based data augmentation. It is also the first instance of learning local features from discriminative superpixel-wise regions and cross-image superpixel contrasts. Extensive experiments on various benchmark datasets show that LGCOAMix outperforms state-of-the-art cutmix-based data augmentation methods on classification tasks and on weakly supervised object localization on CUB200-2011. We demonstrate the effectiveness of LGCOAMix not only for CNNs, but also for Transformer networks. Source code is available at https://github.com/DanielaPlusPlus/LGCOAMix.
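
To make the contrast with rectangular cut-and-paste concrete, the following is a minimal sketch of region-based mixing over an arbitrary segment map. It is not the authors' implementation: the function names (`mix_by_segments`, `mix_labels`, `sample_segments`) are hypothetical, the segment map stands in for a real superpixel segmentation, and the label weight is the plain pasted-area ratio used by standard cutmix, whereas LGCOAMix derives its label mixing from superpixel attention.

```python
import random


def mix_by_segments(img_a, img_b, segments, chosen):
    """Paste the pixels of img_b that fall inside the chosen segments onto img_a.

    img_a, img_b: 2D lists of pixel values with identical shape.
    segments: 2D list of integer segment ids (stand-in for a superpixel map).
    chosen: set of segment ids whose pixels are taken from img_b.
    Returns (mixed_image, lam), where lam is the fraction of pixels kept from img_a.
    """
    h, w = len(img_a), len(img_a[0])
    mixed = [row[:] for row in img_a]  # copy so img_a is left untouched
    pasted = 0
    for y in range(h):
        for x in range(w):
            if segments[y][x] in chosen:
                mixed[y][x] = img_b[y][x]
                pasted += 1
    lam = 1.0 - pasted / (h * w)
    return mixed, lam


def mix_labels(label_a, label_b, lam):
    """Area-weighted label mixing (assumption: plain cutmix-style weighting)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]


def sample_segments(segments, prob, rng):
    """Pick each segment id independently with probability prob."""
    ids = sorted({s for row in segments for s in row})
    return {s for s in ids if rng.random() < prob}


if __name__ == "__main__":
    # Toy 4x4 map with four 2x2 segments; real superpixels are irregular.
    segments = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
    img_a = [[0] * 4 for _ in range(4)]
    img_b = [[1] * 4 for _ in range(4)]
    chosen = sample_segments(segments, prob=0.5, rng=random.Random(0))
    mixed, lam = mix_by_segments(img_a, img_b, segments, chosen)
    print(mixed, lam, mix_labels([1.0, 0.0], [0.0, 1.0], lam))
```

Because the mask follows the segment ids rather than a box, the pasted region can take any shape, which is what allows a superpixel-based scheme to keep object parts intact. The area ratio is only a stand-in for the attention-based label weighting described above.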
