A Simple-but-effective Baseline for Training-free Class-Agnostic Counting
Yuhao Lin, Haiming Xu, Lingqiao Liu, Javen Qinfeng Shi
TL;DR
The paper tackles Class-Agnostic Counting (CAC) in a training-free setting, building on the Segment Anything Model (SAM) and adding four techniques to close the performance gap with training-based CAC: superpixel-guided point prompts, semantically richer feature representations, a multiscale segmentation strategy, and a transductive prototype updating scheme. Each contributes to better object recall or discrimination without any additional training. On FSC-147 and CARPK, the method improves substantially over prior training-free methods and is competitive with trained approaches. The result is a strong training-free baseline for CAC, with practical insights for deploying SAM-based counting in diverse, data-scarce scenarios.
Abstract
Class-Agnostic Counting (CAC) seeks to accurately count objects in a given image with only a few reference examples. While previous methods relied on additional training, recent efforts have shown that counting is possible without training by using pre-existing foundation models, particularly the Segment Anything Model (SAM), to count via instance-level segmentation. Although promising, current training-free methods still lag behind their training-based counterparts in performance. In this research, we present a straightforward training-free solution that effectively bridges this performance gap and serves as a strong baseline. The primary contribution of our work is the identification of four key techniques that enhance performance. Specifically, we suggest (1) employing a superpixel algorithm to generate more precise initial point prompts, (2) replacing the SAM encoder with an image encoder that has richer semantic knowledge for representing candidate objects, (3) adopting a multiscale mechanism, and (4) applying a transductive prototype scheme to update the representation of reference examples. By combining these four techniques, our approach achieves significant improvements over existing training-free methods and delivers performance on par with training-based ones.
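To make the transductive prototype idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each candidate mask and each reference example has already been reduced to a feature vector, that matching uses cosine similarity against a mean prototype, and that the prototype is refreshed by averaging in the candidates accepted so far. The function names, the `threshold` value, and the `rounds` parameter are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select(prototype, candidate_feats, threshold):
    """Indices of candidates whose similarity to the prototype meets the threshold."""
    return [
        i for i, feat in enumerate(candidate_feats)
        if cosine(prototype, feat) >= threshold
    ]


def count_with_transductive_prototype(reference_feats, candidate_feats,
                                      threshold=0.83, rounds=1):
    """Count candidates matching the references (illustrative assumption only).

    The prototype starts as the mean of the reference features. In each round,
    the currently accepted candidates are averaged together with the references
    to form an updated prototype. A final selection with the updated prototype
    gives the count.
    """
    prototype = mean_vector(reference_feats)
    for _ in range(rounds):
        accepted = select(prototype, candidate_feats, threshold)
        prototype = mean_vector(
            list(reference_feats) + [candidate_feats[i] for i in accepted]
        )
    accepted = select(prototype, candidate_feats, threshold)
    return len(accepted), accepted


if __name__ == "__main__":
    # Toy 2-D features: one reference and three candidate masks.
    references = [[1.0, 0.0]]
    candidates = [[0.9, 0.1], [0.7, 0.5], [0.0, 1.0]]
    print(count_with_transductive_prototype(references, candidates, rounds=0))
    print(count_with_transductive_prototype(references, candidates, rounds=1))
```

In this toy example the second candidate falls just below the threshold against the initial prototype but is accepted once the prototype has been pulled toward the first accepted candidate, which is the kind of recall gain a transductive update is meant to provide. In a real pipeline the vectors would come from an image encoder applied to SAM mask regions.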
