TFCounter: Polishing Gems for Training-Free Object Counting

Pan Ting, Jianfeng Lin, Wenhao Yu, Wenlong Zhang, Xiaoying Chen, Jinlu Zhang, Binqiang Huang

TL;DR

The paper addresses training-free, class-agnostic object counting with limited annotation by introducing TFCounter, a prompt-context-aware framework built on foundation-model priors. It combines a SAM-based segmentation backbone, a context-aware similarity module, and a dual-prompt counting mechanism within an iterative counting scheme, which broadens object recall and improves precision. The authors evaluate the approach on FSC-147, CARPK, and the newly introduced BIKE-1000 dataset, reporting that TFCounter outperforms existing training-free methods and is competitive with trained models. The work shows how foundation-model components can be combined for cross-domain counting with reduced annotation effort, and points to future directions in interactive prompts and adaptive similarity design.

Abstract

Object counting is a challenging task with broad application prospects in security surveillance, traffic management, and disease diagnosis. Existing object counting methods face a threefold challenge: achieving superior performance, maintaining high generalizability, and minimizing annotation costs. We develop a novel training-free, class-agnostic object counter, TFCounter, which is prompt-context-aware by cascading essential components of large-scale foundation models. The approach employs an iterative counting framework with a dual prompt system to recognize a broader spectrum of objects varying in shape, appearance, and size. It also introduces a context-aware similarity module that incorporates background context to improve accuracy in cluttered scenes. To demonstrate cross-domain generalizability, we collect a novel counting dataset named BIKE-1000, comprising 1,000 exclusive images of shared bicycles from Meituan. Extensive experiments on the FSC-147, CARPK, and BIKE-1000 datasets demonstrate that TFCounter outperforms existing leading training-free methods and achieves competitive results compared to trained counterparts.

Paper Structure (21 sections, 5 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Integrating task-specific frameworks with generalizable components from large-scale foundation models, through careful structural design, can achieve training-free class-agnostic object counting.
  • Figure 2: Overview of our TFCounter. TFCounter is a segmentation-based model designed for training-free, class-agnostic object counting. It employs an iterative counting mechanism and links three key modules: feature encoding, context-aware similarity computation, and prompt-aware object counting.
  • Figure 3: The context-aware similarity module utilizes the image embedding $\mathbf{F}_{\mathbf{I}}$ and all foreground masks $\{\mathit{fmask}_1, \mathit{fmask}_2, \dots, \mathit{fmask}_N\}$ to generate both foreground and background similarity maps. The prompt-aware counting module performs weighted fusion on these similarity maps and operates a dual prompt system to produce target masks.
  • Figure 4: A few annotated images from BIKE-1000. Dot and box annotations are indicated in red and green, respectively. Most images are taken from an oblique perspective, so the bicycles vary considerably in shape, appearance, and size, and some are occluded.
  • Figure 5: Number of images in several ranges of object count.
  • ...and 3 more figures
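
The Figure 3 caption describes foreground and background similarity maps that are fused by weighting. The sketch below illustrates that general idea only and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, prototypes formed by averaging features under a mask, cosine similarity as the measure, the background taken as every position outside the foreground masks, and the fusion rule `fg - bg_weight * bg`. It uses plain Python lists in place of a real image embedding.

```python
import math


def masked_mean(features, mask):
    """Average the feature vectors at positions where mask is truthy."""
    selected = [
        features[r][c]
        for r in range(len(mask))
        for c in range(len(mask[0]))
        if mask[r][c]
    ]
    if not selected:
        raise ValueError("mask selects no positions")
    dim = len(selected[0])
    return [sum(vec[d] for vec in selected) / len(selected) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def similarity_map(features, prototype):
    """Cosine similarity between each position's feature and a prototype."""
    return [[cosine(vec, prototype) for vec in row] for row in features]


def fused_similarity(features, fg_masks, bg_weight=0.5):
    """Fuse foreground and background similarity (hypothetical rule).

    features: H x W grid of feature vectors (stand-in for an image embedding).
    fg_masks: list of H x W binary masks, one per exemplar object.
    """
    height, width = len(features), len(features[0])

    # Foreground map: best match over all exemplar prototypes (assumption).
    fg_maps = [similarity_map(features, masked_mean(features, m)) for m in fg_masks]
    fg = [
        [max(m[r][c] for m in fg_maps) for c in range(width)]
        for r in range(height)
    ]

    # Background mask: every position not covered by a foreground mask
    # (assumption; the source does not say how background is selected).
    bg_mask = [
        [0 if any(m[r][c] for m in fg_masks) else 1 for c in range(width)]
        for r in range(height)
    ]
    bg = similarity_map(features, masked_mean(features, bg_mask))

    # Weighted fusion: penalize positions that resemble the background.
    return [
        [fg[r][c] - bg_weight * bg[r][c] for c in range(width)]
        for r in range(height)
    ]


# Toy 2 x 2 grid: top row looks like the exemplar, bottom row does not.
toy_features = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
toy_masks = [[[1, 0], [0, 0]]]
toy_fused = fused_similarity(toy_features, toy_masks)
```

In this toy grid, the top-right position is not covered by the exemplar mask but shares its feature, so it scores higher after fusion than the bottom-row positions, which resemble the background prototype. In a real pipeline, high-scoring positions would presumably serve as prompts for the segmentation model; that step is omitted here.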