No time to train! Training-Free Reference-Based Instance Segmentation

Miguel Espinosa; Chenhongyi Yang; Linus Ericsson; Steven McDonagh; Elliot J. Crowley

TL;DR

This work tackles the scarcity of annotated segmentation data by introducing a training-free, reference-based instance segmentation method that leverages strong semantic priors from foundation models. A three-stage pipeline constructs a memory bank of class features from reference images, aggregates those features into robust class prototypes, and performs cosine-based matching with semantic-aware merging on SAM-generated masks. The approach achieves state-of-the-art results on COCO FSOD and PASCAL VOC Few-Shot, and generalizes robustly across domains on the Cross-Domain FSOD (CD-FSOD) benchmark without any fine-tuning, while maintaining practical efficiency. The findings demonstrate that carefully engineered use of frozen models can deliver high-quality instance segmentation across diverse domains, with potential for broader semantic mapping beyond instances.

Abstract

The performance of image segmentation models has historically been constrained by the high cost of collecting large-scale annotated data. The Segment Anything Model (SAM) alleviates this problem through a promptable, semantics-agnostic segmentation paradigm, yet it still requires manual visual prompts or complex, domain-dependent prompt-generation rules to process a new image. To reduce this new burden, our work investigates object segmentation when only a small set of reference images is provided instead. Our key insight is to leverage the strong semantic priors learned by foundation models to identify corresponding regions between a reference image and a target image. We find that these correspondences enable automatic generation of instance-level segmentation masks for downstream tasks, and we instantiate our ideas via a multi-stage, training-free method incorporating (1) memory bank construction, (2) representation aggregation, and (3) semantic-aware feature matching. Our experiments show significant improvements on segmentation metrics, leading to state-of-the-art performance on COCO FSOD (36.8% nAP) and PASCAL VOC Few-Shot (71.2% nAP50), and outperforming existing training-free approaches on the Cross-Domain FSOD benchmark (22.4% nAP).
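
The three stages named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes patch features have already been extracted by some frozen encoder as an (H, W, D) array, that candidate masks (for example from SAM) are given as boolean (H, W) arrays at the same resolution, and it omits the semantic-aware merging step entirely. All function names (`masked_mean`, `build_prototypes`, `classify_masks`) are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale a vector to (approximately) unit length."""
    return x / (np.linalg.norm(x) + eps)

def masked_mean(features, mask):
    """Average the feature vectors of the locations covered by a mask.

    features: (H, W, D) float array from an assumed frozen encoder.
    mask:     (H, W) boolean array.
    """
    return features[mask].mean(axis=0)

def build_prototypes(references):
    """Stages 1 and 2 (sketch): memory bank, then aggregation.

    references: iterable of (class_name, features, mask) tuples.
    Returns one unit-length prototype vector per class.
    """
    bank = {}
    for name, feats, mask in references:
        # Stage 1: store one embedding per reference instance.
        bank.setdefault(name, []).append(masked_mean(feats, mask))
    # Stage 2: aggregate each class's embeddings into a single prototype.
    return {name: l2_normalize(np.mean(vecs, axis=0))
            for name, vecs in bank.items()}

def classify_masks(target_features, candidate_masks, prototypes):
    """Stage 3 (sketch): score each candidate mask by cosine similarity."""
    names = list(prototypes)
    proto = np.stack([prototypes[n] for n in names])  # (C, D)
    results = []
    for mask in candidate_masks:
        emb = l2_normalize(masked_mean(target_features, mask))
        sims = proto @ emb  # cosine similarity, since both sides are unit length
        best = int(np.argmax(sims))
        results.append((names[best], float(sims[best])))
    return results
```

In this sketch each candidate mask simply receives the label of its most similar prototype together with the similarity score; a practical system would also need score thresholding and a way to resolve overlapping masks, which the method handles through its semantic-aware merging.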
