CamoFormer: Masked Separable Attention for Camouflaged Object Detection
Bowen Yin, Xuying Zhang, Qibin Hou, Bo-Yuan Sun, Deng-Ping Fan, Luc Van Gool
TL;DR
CamoFormer tackles camouflaged object detection by explicitly modeling foreground and background cues with Masked Separable Attention (MSA) and a progressive top-down decoder. It partitions the attention heads into foreground, background, and global groups and feeds soft predictions to them as masks, which yields precise, boundary-aware segmentation. On NC4K, COD10K, and CAMO, the method reaches state-of-the-art results, with average relative gains of around 5% in S-measure and weighted F-measure, along with improved border quality. The results suggest that masked separable attention may also carry over to other binary segmentation tasks.
Abstract
Identifying and segmenting camouflaged objects from their background is challenging. Inspired by the multi-head self-attention in Transformers, we present a simple masked separable attention (MSA) for camouflaged object detection. We first separate the multi-head self-attention into three parts, each of which uses a different mask strategy to distinguish the camouflaged objects from the background. Furthermore, we propose to capture high-resolution semantic representations progressively with a simple top-down decoder built on the proposed MSA, in order to attain precise segmentation results. These structures, together with a backbone encoder, form a new model dubbed CamoFormer. Extensive experiments show that CamoFormer surpasses all existing state-of-the-art methods on three widely used camouflaged object detection benchmarks. On average, it achieves relative improvements of around 5% over previous methods in terms of S-measure and weighted F-measure.
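The three-way split of attention heads described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes plain token-wise softmax attention, a soft foreground prediction `mask` in [0, 1] with one value per token, and masking applied to queries and keys only. The function name `masked_separable_attention` and the head counts in `groups` are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def masked_separable_attention(q, k, v, mask, groups=(2, 2, 4)):
    """Illustrative head-partitioned attention (assumed design, see lead-in).

    q, k, v : arrays of shape (heads, tokens, dim)
    mask    : array of shape (tokens,), soft foreground prediction in [0, 1]
    groups  : hypothetical head counts for (foreground, background, global)
    """
    n_fg, n_bg, n_gl = groups
    heads, tokens, dim = q.shape
    assert n_fg + n_bg + n_gl == heads, "groups must cover every head"

    # Per-head token weights: foreground heads use the mask, background
    # heads use its complement, global heads are left unmasked (weight 1).
    w = np.ones((heads, tokens, 1))
    w[:n_fg] = mask[None, :, None]
    w[n_fg:n_fg + n_bg] = 1.0 - mask[None, :, None]

    # Assumption: only queries and keys are masked; values stay untouched.
    qm, km = q * w, k * w
    scores = qm @ km.transpose(0, 2, 1) / np.sqrt(dim)
    attn = softmax(scores, axis=-1)   # (heads, tokens, tokens)
    out = attn @ v                    # (heads, tokens, dim)

    # Concatenate the heads back into one feature vector per token.
    return out.transpose(1, 0, 2).reshape(tokens, heads * dim)
```

In this sketch, changing the mask changes only the foreground and background heads; the global heads always attend over the full, unmasked feature map. In a progressive decoder, the mask for a given stage would come from the soft prediction of the previous, coarser stage.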
