CamoFA: A Learnable Fourier-based Augmentation for Camouflage Segmentation

Minh-Quan Le, Minh-Triet Tran, Trung-Nghia Le, Tam V. Nguyen, Thanh-Toan Do

TL;DR

This work addresses the challenge of camouflaged object detection and instance segmentation by introducing CamoFA, a learnable Fourier-based augmentation that operates in the frequency domain. The method jointly trains a conditional GAN to generate a reference image and employs a cross-attention module to align this reference with the input, followed by an adaptive low-frequency/high-frequency amplitude swap in the Fourier domain to make camouflaged regions more prominent. The authors provide a three-fold contribution: (1) a novel augmentation framework that can plug into existing COD/CIS models, (2) an adaptive hybrid swapping mechanism controlled by a learnable parameter, and (3) a cross-attention strategy that enables context-aware transfer of texture and color. Extensive experiments on COD and CIS benchmarks show substantial performance gains over state-of-the-art methods and over generic augmentations, highlighting the practical impact of frequency-domain learning for camouflage-sensitive vision tasks.

Abstract

Camouflaged object detection (COD) and camouflaged instance segmentation (CIS) aim, respectively, to detect objects that blend into their surroundings and to segment them at the instance level. While several deep neural network models have been proposed for these tasks, augmentation methods for COD and CIS have not been thoroughly explored. Augmentation strategies can improve model performance by increasing the size and diversity of the training data and exposing the model to a wider range of variations. Beyond this, we aim to automatically learn transformations that reveal the underlying structure of camouflaged objects, so that the model learns to identify and segment them better. To achieve this, we propose CamoFA, a learnable augmentation method for COD and CIS that operates in the frequency domain via the Fourier transform. Our method uses a conditional generative adversarial network and a cross-attention mechanism to generate a reference image, and an adaptive hybrid swapping with parameters to mix the low-frequency component of the reference image with the high-frequency component of the input image. This approach aims to make camouflaged objects more visible to detection and segmentation models. Without bells and whistles, our proposed augmentation method boosts the performance of camouflaged object detectors and instance segmenters by large margins.
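
The frequency-domain mixing step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `low_freq_mask` and `hybrid_amplitude_swap`, the hard square low-frequency mask, and the scalars `beta` and `lam` are assumptions made for illustration. In CamoFA the reference image comes from a conditional GAN with cross-attention and the swapping parameters are learned; here the reference is simply passed in and the parameters are fixed. The sketch keeps the phase of the input, which is one common way to preserve spatial structure while transferring low-frequency amplitude.

```python
import numpy as np


def low_freq_mask(height, width, beta):
    """Centered square mask covering a fraction `beta` of each spatial axis."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    mask = np.zeros((height, width), dtype=np.float64)
    half_h = int(np.floor(height * beta / 2))
    half_w = int(np.floor(width * beta / 2))
    cy, cx = height // 2, width // 2  # position of the zero frequency after fftshift
    mask[cy - half_h: cy + half_h + 1, cx - half_w: cx + half_w + 1] = 1.0
    return mask


def hybrid_amplitude_swap(image, reference, beta=0.1, lam=1.0):
    """Blend the low-frequency amplitude of `reference` into `image`.

    image, reference: float arrays of shape (H, W, C).
    beta: fraction of the spectrum treated as low frequency (fixed here, hypothetical).
    lam: blend weight in [0, 1]; 0 returns the input unchanged (fixed here, hypothetical).
    """
    if image.shape != reference.shape:
        raise ValueError("image and reference must have the same shape")
    h, w = image.shape[:2]

    # 2D FFT per channel, with the zero frequency shifted to the center.
    f_img = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    f_ref = np.fft.fftshift(np.fft.fft2(reference, axes=(0, 1)), axes=(0, 1))

    # Split the input spectrum into amplitude and phase; only amplitude is mixed.
    amp_img, phase_img = np.abs(f_img), np.angle(f_img)
    amp_ref = np.abs(f_ref)

    # Inside the mask, move the amplitude toward the reference by weight `lam`;
    # outside the mask, the high-frequency amplitude of the input is kept.
    mask = lam * low_freq_mask(h, w, beta)[..., None]
    amp_mix = (1.0 - mask) * amp_img + mask * amp_ref

    # Recombine with the input phase and return to the spatial domain.
    f_mix = amp_mix * np.exp(1j * phase_img)
    out = np.fft.ifft2(np.fft.ifftshift(f_mix, axes=(0, 1)), axes=(0, 1))
    return np.real(out)
```

With `lam=0` the function returns the input image, and larger values of `beta` or `lam` transfer more of the reference's coarse color and texture statistics. A learnable variant would replace the fixed scalars with trainable parameters inside a differentiable framework.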

Paper Structure (15 sections, 8 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Our CamoFA not only preserves the spatial structure of an image but also highlights the underlying structure of camouflaged objects for better identification and segmentation.
  • Figure 2: Overview of the proposed CamoFA. Our method leverages a conditional generative adversarial network and cross-attention mechanism to generate a reference image and an adaptive hybrid swapping with parameters to mix the low-frequency component of the reference image and the high-frequency component of the input image.
  • Figure 3: Visualization of augmented images by our CamoFA. Our proposed augmentation highlights the underlying structure of camouflaged objects for better identification and segmentation. We also compare the transformed results of our method with and without adaptive hybrid swapping. Our adaptive hybrid swapping can control the amount of texture and color information that is transferred from the reference image to the input image.
  • Figure 4: Qualitative comparison of SINetV2 with and without our proposed CamoFA on the COD task.
  • Figure 5: Qualitative comparison of OSFormer with and without our proposed CamoFA on the CIS task.