A Revisit to the Decoder for Camouflaged Object Detection

Seung Woo Ko; Joopyo Hong; Suyoung Kim; Seungjai Bang; Sungzoon Cho; Nojun Kwak; Hyung-Sin Kim; Joonseok Lee

TL;DR

This work targets camouflaged object detection by redesigning the decoder with two auxiliary components: the Enrich Decoder, which uses channel-wise attention to emphasize COD-relevant features and fuses multi-scale information, and the Retouch Decoder, which applies spatial attention to refine object boundaries after decoding. The ENTO architecture sandwiches a base decoder between these pre- and post-processing decoders, enabling high-resolution feature utilization and finer boundary delineation while remaining compatible with various encoders, including Transformers. Training supervises the coarse outputs from the Enrich Decoder as well as the final outputs from the base and Retouch decoders using pixel-weighted BCE and IoU losses, guiding stepwise refinement. Empirically, ENTO achieves state-of-the-art performance on the COD10K, CAMO, and NC4K datasets and adapts well across encoder backbones, delivering superior boundary accuracy and detail with a compact decoder footprint.
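The supervision described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the summary names pixel-weighted BCE and IoU losses but does not give their exact form, so the weight map `weight`, the soft-IoU formulation, the equal weighting of the supervised maps, and the function names `weighted_bce_iou_loss` and `total_loss` are all assumptions made for this example.

```python
import numpy as np


def weighted_bce_iou_loss(logits, mask, weight, eps=1e-7):
    """Pixel-weighted BCE plus pixel-weighted soft IoU loss for one map.

    logits, mask, weight: arrays of shape (H, W); mask values are 0 or 1.
    `weight` is a hypothetical per-pixel importance map (for example,
    larger near object boundaries); its construction is not specified here.
    """
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    weight = np.asarray(weight, dtype=np.float64)

    # Numerically stable BCE on logits: max(x, 0) - x*y + log(1 + exp(-|x|)).
    bce = np.maximum(logits, 0.0) - logits * mask + np.log1p(np.exp(-np.abs(logits)))
    wbce = (weight * bce).sum() / (weight.sum() + eps)

    # Soft IoU on probabilities; logits are clipped only to avoid overflow in exp.
    prob = 1.0 / (1.0 + np.exp(-np.clip(logits, -50.0, 50.0)))
    inter = (weight * prob * mask).sum()
    union = (weight * (prob + mask - prob * mask)).sum()
    wiou = 1.0 - (inter + eps) / (union + eps)

    return wbce + wiou


def total_loss(outputs, mask, weight):
    """Sum the loss over every supervised map (assumed equally weighted).

    `outputs` would hold the coarse Enrich maps followed by the base and
    Retouch outputs, all resized to the ground-truth resolution beforehand.
    """
    return sum(weighted_bce_iou_loss(o, mask, weight) for o in outputs)
```

Supervising the intermediate maps with the same loss as the final map is what lets each stage be trained toward the ground truth directly, rather than only through the last decoder's gradient.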

Abstract

Camouflaged object detection (COD) aims to generate a fine-grained segmentation map of camouflaged objects hidden in their background. Because such objects blend into their surroundings, the decoder must be tailored both to extract features that distinguish them and to delineate their complex boundaries with particular care. In this paper, we propose ENTO, a novel architecture that augments the prevalent decoding strategy in COD with an Enrich Decoder and a Retouch Decoder, which together help to generate a fine-grained segmentation map. Specifically, the Enrich Decoder amplifies the feature channels that are important for COD using channel-wise attention. The Retouch Decoder further refines the segmentation maps by spatially attending to important pixels, such as those in boundary regions. Through extensive experiments, we demonstrate that ENTO achieves superior performance with various encoders, and that the two novel components play distinct, mutually complementary roles.
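To make the two attention types concrete, the sketch below shows generic channel-wise and spatial gating in NumPy. It is an illustration under stated assumptions, not the ENTO modules themselves: the abstract does not specify their internals, so the squeeze-and-excitation-style channel gate, the simplified spatial gate (a per-pixel mix of the channel mean and maximum instead of a learned convolution), and all names and parameters (`channel_attention`, `spatial_attention`, `w1`, `w2`, `w_avg`, `w_max`, `bias`) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Rescale each channel of feat (C, H, W) by a gate in (0, 1).

    w1 has shape (C // r, C) and w2 has shape (C, C // r); both stand in
    for learned weights and are hypothetical in this sketch.
    """
    squeezed = feat.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # bottleneck with ReLU -> (C // r,)
    gate = sigmoid(w2 @ hidden)                # per-channel gate -> (C,)
    return feat * gate[:, None, None]


def spatial_attention(feat, w_avg=1.0, w_max=1.0, bias=0.0):
    """Rescale each pixel of feat (C, H, W) by a gate in (0, 1).

    The gate mixes the per-pixel channel mean and channel maximum; a real
    module would typically learn a convolution over these two maps.
    """
    avg_map = feat.mean(axis=0)                # (H, W)
    max_map = feat.max(axis=0)                 # (H, W)
    gate = sigmoid(w_avg * avg_map + w_max * max_map + bias)
    return feat * gate[None, :, :]


# Usage with random stand-in features and weights.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3, 3))
enriched = channel_attention(features, rng.normal(size=(2, 4)), rng.normal(size=(4, 2)))
retouched = spatial_attention(enriched)
```

The channel gate applies one scalar per channel across all pixels, while the spatial gate applies one scalar per pixel across all channels, which is why the two operations can complement each other rather than duplicate effort.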
