CamoFA: A Learnable Fourier-based Augmentation for Camouflage Segmentation
Minh-Quan Le, Minh-Triet Tran, Trung-Nghia Le, Tam V. Nguyen, Thanh-Toan Do
TL;DR
This work addresses the challenge of camouflaged object detection and instance segmentation by introducing CamoFA, a learnable Fourier-based augmentation that operates in the frequency domain. The method jointly trains a conditional GAN to generate a reference image and employs a cross-attention module to align this reference with the input, followed by an adaptive low-frequency/high-frequency amplitude swap in the Fourier domain to make camouflaged regions more prominent. The authors provide a three-fold contribution: (1) a novel augmentation framework that can plug into existing COD/CIS models, (2) an adaptive hybrid swapping mechanism controlled by a learnable parameter, and (3) a cross-attention strategy that enables context-aware transfer of texture and color. Extensive experiments on COD and CIS benchmarks show substantial performance gains over state-of-the-art methods and over generic augmentations, highlighting the practical impact of frequency-domain learning for camouflage-sensitive vision tasks.
Abstract
Camouflaged object detection (COD) and camouflaged instance segmentation (CIS) aim, respectively, to recognize and to segment objects that blend into their surroundings. While several deep neural network models have been proposed for these tasks, augmentation methods for COD and CIS have not been thoroughly explored. Augmentation strategies can improve model performance by increasing the size and diversity of the training data and exposing the model to a wider range of variations. Beyond this, we aim to automatically learn transformations that reveal the underlying structure of camouflaged objects, so that the model learns to identify and segment them better. To this end, we propose CamoFA, a learnable augmentation method for COD and CIS that operates in the frequency domain via the Fourier transform. Our method uses a conditional generative adversarial network and a cross-attention mechanism to generate a reference image, and an adaptive hybrid swapping with learnable parameters to mix the low-frequency component of the reference image with the high-frequency component of the input image. This approach aims to make camouflaged objects more visible to detection and segmentation models. Without bells and whistles, our proposed augmentation method boosts the performance of camouflaged object detectors and instance segmenters by large margins.
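To make the frequency-domain mixing step concrete, the following is a minimal NumPy sketch, not the authors' implementation. It blends the low-frequency amplitude of a reference image into an input image while keeping the input's high frequencies. Several things here are assumptions made for illustration: the names `low_frequency_mask`, `hybrid_amplitude_swap`, `beta` (window size) and `lam` (mixing weight) are hypothetical; keeping the input's phase follows common Fourier augmentation practice rather than anything stated above; and `lam` is a fixed float, whereas the paper describes a learnable parameter. The conditional GAN and cross-attention module that produce the reference image are omitted, so a random array stands in for the reference.

```python
import numpy as np


def low_frequency_mask(height, width, beta):
    """Boolean mask selecting a centered low-frequency window.

    `beta` in [0, 1] is the fraction of each axis covered by the window
    (a hypothetical parameterization, not taken from the paper).
    """
    half_h = int(beta * height) // 2
    half_w = int(beta * width) // 2
    cy, cx = height // 2, width // 2  # zero frequency after fftshift
    mask = np.zeros((height, width), dtype=bool)
    mask[cy - half_h:cy + half_h + 1, cx - half_w:cx + half_w + 1] = True
    return mask


def hybrid_amplitude_swap(image, reference, beta=0.1, lam=1.0):
    """Blend the reference's low-frequency amplitude into the image.

    `image` and `reference` are real arrays of shape (H, W) or (H, W, C).
    `lam` in [0, 1] weights the reference amplitude inside the window;
    in the paper this role is played by a learnable parameter (assumed
    here to be a plain float).
    """
    axes = (0, 1)
    f_img = np.fft.fftshift(np.fft.fft2(image, axes=axes), axes=axes)
    f_ref = np.fft.fftshift(np.fft.fft2(reference, axes=axes), axes=axes)

    amp_img, phase_img = np.abs(f_img), np.angle(f_img)
    amp_ref = np.abs(f_ref)

    mask = low_frequency_mask(image.shape[0], image.shape[1], beta)
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over channels

    # Inside the window: weighted blend. Outside: keep the input amplitude.
    blended = lam * amp_ref + (1.0 - lam) * amp_img
    amp_out = np.where(mask, blended, amp_img)

    # Recombine with the input's phase (an assumption of this sketch).
    f_out = amp_out * np.exp(1j * phase_img)
    out = np.fft.ifft2(np.fft.ifftshift(f_out, axes=axes), axes=axes)
    return out.real


# Usage with random stand-ins for an input image and a generated reference.
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
reference = rng.random((8, 8, 3))
augmented = hybrid_amplitude_swap(image, reference, beta=0.25, lam=1.0)
```

With `lam=0` the function returns the input unchanged, and with `beta=1, lam=1` the output carries the reference's full amplitude spectrum with the input's phase, so the two parameters interpolate between no augmentation and a full amplitude swap.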
