RefOnce: Distilling References into a Prototype Memory for Referring Camouflaged Object Detection
Yu-Huan Wu, Zi-Xuan Zhu, Yan Wang, Liangli Zhen, Deng-Ping Fan
TL;DR
RefOnce tackles Ref-COD by removing the need for test-time reference images and by addressing the salient-to-camouflage domain gap. It distills reference knowledge into a class-prototype memory updated via EMA, then synthesizes a query-conditioned guidance vector $\mathbf{v}$ as a soft mixture over the prototypes $\mathbf{m}_k$: mixture weights $\boldsymbol{\pi}=\mathrm{softmax}(\mathbf{a})$ are computed from logits $\mathbf{a}$ predicted from the query, and $\mathbf{v}=\sum_k \pi_k \mathbf{m}_k$. A Bidirectional Attention Alignment module then jointly refines the query features $\mathbf{X}$ and $\mathbf{v}$. The method achieves state-of-the-art results on the R2C7K benchmark and generalizes well to unseen categories while running fully reference-free at inference. Because no references need to be collected or processed at test time, it reduces data-collection burden and latency, which makes Ref-COD more practical to deploy.
Abstract
Referring Camouflaged Object Detection (Ref-COD) segments specified camouflaged objects in a scene by leveraging a small set of referring images. Though effective, current systems adopt a dual-branch design that requires reference images at test time, which limits deployability and adds latency and data-collection burden. We introduce a Ref-COD framework that distills references into a class-prototype memory during training and synthesizes a reference vector at inference via a query-conditioned mixture of prototypes. Concretely, we maintain an EMA-updated prototype per category and predict mixture weights from the query to produce a guidance vector without any test-time references. To bridge the representation gap between reference statistics and camouflaged query features, we propose a bidirectional attention alignment module that adapts both the query features and the class representation. Our approach thus yields a simple, efficient path to Ref-COD without mandatory references. We evaluate the proposed method on the large-scale R2C7K benchmark. Extensive experiments demonstrate that it performs competitively with, or better than, recent state-of-the-art methods. Code is available at https://github.com/yuhuan-wu/RefOnce.
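The prototype memory and the query-conditioned mixture described above can be illustrated with a minimal sketch. This is not the released implementation: the function names, the momentum value of 0.9, the toy 2-dimensional prototypes, and the hard-coded logits (which in the actual model would be predicted from the query) are all assumptions made for illustration.

```python
import math


def ema_update(prototype, feature, momentum=0.9):
    """EMA step: move one class prototype toward a new reference feature.

    The momentum value is an assumed hyperparameter, not taken from the paper.
    """
    return [momentum * p + (1.0 - momentum) * f
            for p, f in zip(prototype, feature)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(a - peak) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guidance_vector(logits, prototypes):
    """Compute pi = softmax(a) and v = sum_k pi_k * m_k."""
    pi = softmax(logits)
    dim = len(prototypes[0])
    v = [sum(pi[k] * prototypes[k][d] for k in range(len(prototypes)))
         for d in range(dim)]
    return pi, v


# Toy memory: K = 3 categories, 2-dimensional prototypes (assumed sizes).
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Training time: update the prototype of category 0 with a reference feature.
memory[0] = ema_update(memory[0], [0.0, 2.0], momentum=0.9)

# Inference time: hypothetical logits standing in for the query-predicted a.
pi, v = guidance_vector([2.0, 0.5, 0.1], memory)
```

Because $\mathbf{v}$ is a convex combination of the stored prototypes, inference needs only the memory and the query-derived logits, so no reference images are loaded at test time.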
