RefOnce: Distilling References into a Prototype Memory for Referring Camouflaged Object Detection

Yu-Huan Wu, Zi-Xuan Zhu, Yan Wang, Liangli Zhen, Deng-Ping Fan

TL;DR

RefOnce tackles Ref-COD by removing the need for test-time reference images and addressing the salient-to-camouflage domain gap. It distills reference knowledge into a class-prototype memory updated via EMA and synthesizes a query-conditioned guidance vector $\mathbf{v}$ through a soft mixture over prototypes with $\boldsymbol{\pi}=\mathrm{softmax}(\mathbf{a})$ and $\mathbf{v}=\sum_k \pi_k \mathbf{m}_k$, guided further by a Bidirectional Attention Alignment that jointly refines $\mathbf{X}$ and $\mathbf{v}$. The method achieves state-of-the-art results on the R2C7K benchmark and generalizes well to unseen categories, all while operating in a fully reference-free inference mode. This offers a practical, deployable Ref-COD solution with reduced data collection requirements and latency, suitable for real-world applications that demand automatic, category-aware camouflage detection.

Abstract

Referring Camouflaged Object Detection (Ref-COD) segments specified camouflaged objects in a scene by leveraging a small set of reference images. Though effective, current systems adopt a dual-branch design that requires reference images at test time, which limits deployability and adds latency and data-collection burden. We introduce a Ref-COD framework that distills references into a class-prototype memory during training and synthesizes a reference vector at inference via a query-conditioned mixture of prototypes. Concretely, we maintain an EMA-updated prototype per category and predict mixture weights from the query to produce a guidance vector without any test-time references. To bridge the representation gap between reference statistics and camouflaged query features, we propose a bidirectional attention alignment module that adapts both the query features and the class representation. Our approach thus yields a simple, efficient path to Ref-COD without mandatory references. We evaluate the proposed method on the large-scale R2C7K benchmark. Extensive experiments demonstrate competitive or superior performance compared with recent state-of-the-art methods. Code is available at https://github.com/yuhuan-wu/RefOnce.
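
To make the two mechanisms named in the abstract concrete, here is a minimal pure-Python sketch of an EMA-updated prototype memory and the soft mixture $\boldsymbol{\pi}=\mathrm{softmax}(\mathbf{a})$, $\mathbf{v}=\sum_k \pi_k \mathbf{m}_k$. It is an illustration under stated assumptions, not the authors' implementation: the function names, the momentum value of 0.9, and the toy feature vectors are hypothetical, and the logits $\mathbf{a}$ are passed in directly because the source does not specify here how they are predicted from the query.

```python
import math

def ema_update(memory, class_id, ref_feature, momentum=0.9):
    """Blend a new reference feature into one category's prototype (training only).

    The momentum value is an assumption; the source only states that
    prototypes are updated via EMA.
    """
    old = memory[class_id]
    memory[class_id] = [momentum * o + (1.0 - momentum) * r
                        for o, r in zip(old, ref_feature)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(a - peak) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

def synthesize_guidance(memory, logits):
    """Return (pi, v) with pi = softmax(a) and v = sum_k pi_k * m_k."""
    pi = softmax(logits)
    dim = len(memory[0])
    v = [sum(pi[k] * memory[k][d] for k in range(len(memory)))
         for d in range(dim)]
    return pi, v

# Toy memory with K = 3 prototypes of dimension 2 (hypothetical values).
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Training step: distill one reference feature into category 0.
ema_update(memory, 0, [0.0, 0.0], momentum=0.9)

# Inference: no reference image is needed, only query-predicted logits.
pi, v = synthesize_guidance(memory, [0.0, 0.0, 0.0])
```

With equal logits the weights are uniform, so `v` is simply the mean of the prototypes; sharper logits would pull `v` toward the prototype of the category the query most resembles.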

Paper Structure

This paper contains 13 sections, 8 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Comparison between (a) existing Ref-COD approaches [zhang2025referring, wu2025uncertainty] and (b) our RefOnce framework. RefOnce retrieves the reference from the prototype memory to enable reference-free inference.
  • Figure 2: Pipeline of the proposed RefOnce framework. During training, the reference branch (top, dashed) distills features into a Global Memory $\mathcal{M}$; at inference, the reference $\mathbf{v}$ is synthesized from $\mathcal{M}$ and the predicted logits.
  • Figure 3: Architecture of the bidirectional attention alignment.
  • Figure 4: Qualitative comparison on R2C7K dataset.
  • Figure 5: Visualizations of backbone features with and without reference guidance. "4) w/o Ref" shows the backbone features before reference guidance. "5) w/ R2C" shows the backbone features with R2C [zhang2025referring] as reference guidance. "6) w/ RefOnce (Ours)" replaces R2C with our BAA strategy (described in the alignment section). More examples are provided in the supplementary material.