An Instance-Aware Prompting Framework for Training-free Camouflaged Object Segmentation

Chao Yin, Jide Li, Hang Yao, Xiaoqiang Li

TL;DR

The paper tackles training-free Camouflaged Object Segmentation (COS) by addressing the limitation of semantic prompts in existing pipelines. It introduces IAPF, which converts a single task-generic prompt into instance-level visual prompts via a detector-agnostic box enumerator, Single-Foreground Multi-Background Prompting (SFMBP) with Spatial CLIP heatmaps, and a self-consistency voting mechanism, all using frozen components. The approach yields accurate, instance-discriminative masks for multi-object camouflage and achieves state-of-the-art results among training-free methods, with strong performance on CIS benchmarks and downstream RGB-only tasks. This instance-aware prompting framework offers a practical, zero-shot solution with robust cross-domain generalization for promptable segmentation in complex scenes.
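The TL;DR and the Figure 2 caption describe SFMBP only at a high level. CLIP produces one foreground heatmap and several background heatmaps, and point prompts are sampled inside each box prompt. The sketch below shows one way such region-constrained sampling could work. It rests on a hypothetical rule that is not taken from the paper: each box gets a single positive point at the foreground-heatmap peak inside the box, plus one negative point at the peak of each background heatmap inside the same box. The heatmaps are plain 2D NumPy arrays standing in for Spatial CLIP outputs, and the names `peak_in_box` and `sfmbp_points` are illustrative only.

```python
import numpy as np


def peak_in_box(heatmap, box):
    """Return (x, y) of the heatmap maximum restricted to box = (x0, y0, x1, y1).

    The box uses half-open pixel ranges [x0, x1) x [y0, y1) and is assumed non-empty.
    """
    x0, y0, x1, y1 = box
    region = heatmap[y0:y1, x0:x1]
    dy, dx = np.unravel_index(np.argmax(region), region.shape)
    return (x0 + int(dx), y0 + int(dy))


def sfmbp_points(fg_heatmap, bg_heatmaps, box):
    """Hypothetical SFMBP-style point sampling for a single box prompt.

    One positive point comes from the single foreground heatmap and one negative
    point from each background heatmap; every point is constrained to the box.
    Labels follow SAM's convention: 1 = foreground, 0 = background.
    """
    coords = [peak_in_box(fg_heatmap, box)]
    labels = [1]
    for bg in bg_heatmaps:
        pt = peak_in_box(bg, box)
        # Skip duplicates so a pixel is never both a positive and a negative prompt.
        if pt not in coords:
            coords.append(pt)
            labels.append(0)
    return np.array(coords, dtype=np.float32), np.array(labels, dtype=np.int32)


if __name__ == "__main__":
    fg = np.zeros((10, 10))
    fg[3, 4] = 1.0
    fg[0, 0] = 5.0  # stronger peak, but outside the box, so it is ignored
    bg = np.zeros((10, 10))
    bg[6, 7] = 1.0
    coords, labels = sfmbp_points(fg, [bg], (2, 2, 10, 10))
    print(coords.tolist(), labels.tolist())
```

The returned arrays use the (N, 2) pixel-XY `point_coords` and (N,) `point_labels` layout that `SamPredictor.predict` in the `segment-anything` package accepts together with an XYXY `box`. Whether IAPF feeds SAM in exactly this way is not stated in this summary.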

Abstract

Training-free Camouflaged Object Segmentation (COS) seeks to segment camouflaged objects without task-specific training by automatically generating visual prompts to guide the Segment Anything Model (SAM). However, existing pipelines mostly yield semantic-level prompts, which drive SAM toward coarse semantic masks and struggle to handle multiple discrete camouflaged instances. To address this limitation, we propose an Instance-Aware Prompting Framework (IAPF), the first framework for training-free COS that upgrades prompt granularity from the semantic level to the instance level while keeping all components frozen. The centerpiece is an Instance Mask Generator that (i) leverages a detector-agnostic enumerator to produce precise instance-level box prompts for the foreground tag, and (ii) introduces the Single-Foreground Multi-Background Prompting (SFMBP) strategy to sample region-constrained point prompts within each box prompt, enabling SAM to output instance masks. The pipeline is supported by a simple text prompt generator that produces image-specific tags and by a self-consistency vote across synonymous task-generic prompts that stabilizes inference. Extensive evaluations on three COS benchmarks, two camouflaged instance segmentation (CIS) benchmarks, and two downstream datasets demonstrate state-of-the-art performance among training-free methods. Code will be released upon acceptance.
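The abstract says that a self-consistency vote across synonymous task-generic prompts stabilizes inference, and the Figure 2 caption adds that the candidate whose semantic projection is most consistent across repetitions becomes the final COS prediction. The sketch below illustrates that selection step under two assumptions that this summary does not confirm. First, the semantic projection of a candidate is taken to be the pixelwise union of its instance masks. Second, consistency is scored as the mean IoU against the projections from every other repetition. The function names are hypothetical.

```python
import numpy as np


def semantic_projection(instance_masks, shape):
    """Collapse one candidate's instance masks into a single binary COS mask.

    Assumption: 'semantic projection' is read here as the pixelwise union (OR).
    """
    proj = np.zeros(shape, dtype=bool)
    for m in instance_masks:
        proj |= m.astype(bool)
    return proj


def mask_iou(a, b):
    """Intersection over union of two binary masks (two empty masks count as identical)."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def self_consistency_vote(candidates, shape):
    """Return the index of the candidate whose projection agrees most with the others.

    Hypothetical scoring: mean IoU against every other repetition's projection.
    """
    projs = [semantic_projection(c, shape) for c in candidates]
    if len(projs) == 1:
        return 0
    scores = []
    for i, p in enumerate(projs):
        others = [mask_iou(p, q) for j, q in enumerate(projs) if j != i]
        scores.append(sum(others) / len(others))
    return int(np.argmax(scores))


if __name__ == "__main__":
    shape = (4, 4)
    a = np.zeros(shape, dtype=bool)
    a[:2, :2] = True
    b = np.zeros(shape, dtype=bool)
    b[2:, 2:] = True
    candidates = [[a, b], [a.copy(), b.copy()], [np.ones(shape, dtype=bool)]]
    print(self_consistency_vote(candidates, shape))  # 0
```

In this toy case the first two repetitions agree exactly, so the all-foreground outlier loses the vote. Ties go to the earliest candidate because `np.argmax` returns the first maximum. Each candidate keeps its instance masks, so the winner can still serve as the CIS output.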

Paper Structure

This paper contains 26 sections, 10 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Motivation of the proposed IAPF. (a) Visualization of box prompts generated by an MLLM, MaxIOUBox, and our method. Only the instance-aware strategy yields multiple instance-level boxes. (b) Quantitative results on the COD10K dataset, showing the superior accuracy of our box prompts, especially under high IoU thresholds.
  • Figure 2: Framework of the proposed IAPF, which consists of three steps: (1) Text Prompt Generator: an MLLM turns task-generic prompts and an image into foreground/background tags. (2) Instance Mask Generator (bottom): a detector-agnostic enumerator produces $N$ instance-level box prompts; SFMBP uses CLIP to form foreground/multi-background heatmaps and samples region-constrained points for each box; SAM converts the $N$ box–point pairs into one candidate set of $N$ instance masks. (3) Self-consistency Instance Mask Voting: the candidate whose semantic projection is most consistent across repetitions is selected as the final COS prediction, and its instance masks provide the CIS output.
  • Figure 3: Qualitative comparison of the proposed IAPF with three main training-free COS methods. From left to right, as the number of camouflaged instances in the scene increases, existing training-free methods fail to segment all objects. In contrast, the IAPF consistently produces high-quality instance masks, even when dealing with multiple camouflaged instances.
  • Figure 4: Downstream qualitative results on ACOD-12K and PlantCamo.
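Figure 1(b) reports box-prompt accuracy at increasing IoU thresholds, but this summary does not give the formula. One plausible reading, assumed here purely for illustration, is instance-level recall: the fraction of ground-truth instance boxes matched by at least one predicted box with IoU at or above the threshold. The sketch also shows why a single semantic box that spans two instances, as in Figure 1(a), scores poorly at high thresholds. The metric name `box_recall_at` is hypothetical, and the matching is not one-to-one.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), XYXY corners


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_recall_at(gt: List[Box], pred: List[Box], thr: float) -> float:
    """Hypothetical metric: share of ground-truth boxes hit by some prediction with IoU >= thr.

    A prediction may match several ground-truth boxes (no one-to-one assignment).
    """
    if not gt:
        return 1.0
    hits = sum(1 for g in gt if any(box_iou(g, p) >= thr for p in pred))
    return hits / len(gt)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 0, 30, 10)]  # two separate camouflaged instances
    semantic = [(0, 0, 30, 10)]             # one box covering both (IoU 1/3 with each)
    print(box_recall_at(gt, semantic, 0.5))  # 0.0
    print(box_recall_at(gt, gt, 0.5))        # 1.0
```

Raising `thr` can only lower (never raise) the recall, which is consistent with the caption's emphasis on how box quality separates methods under high IoU thresholds.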