A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches

Luca Ciampi, Ali Azmoudeh, Elif Ecem Akbaba, Erdi Sarıtaş, Ziya Ata Yazıcı, Hazım Kemal Ekenel, Giuseppe Amato, Fabrizio Falchi

TL;DR

This survey analyzes the emergence of class-agnostic counting (CAC), framing it as three paradigms: reference-based, reference-less, and open-world text-guided counting. It catalogues 29 CAC approaches, contrasts their reliance on exemplars, patterns, or textual prompts, and benchmarks them on FSC-147 and CARPK to reveal strengths and limitations. Key contributions include a taxonomy that clarifies methodological differences, a consolidated view of architectures and results, and a critical discussion of challenges such as annotation dependence and generalization, plus directions for future work including weak supervision and prompt-driven counting. The work underscores the progress toward open-vocabulary counting while candidly acknowledging gaps in data, evaluation, and robust language-driven understanding that future research must address. The survey thereby guides researchers toward more generalizable, data-efficient CAC solutions for diverse, real-world applications.

Abstract

Visual object counting has recently shifted towards class-agnostic counting (CAC), which addresses the challenge of counting objects across arbitrary categories -- a crucial capability for flexible and generalizable counting systems. Unlike humans, who effortlessly identify and count objects from diverse categories without prior knowledge, most existing counting methods are restricted to enumerating instances of known classes, requiring extensive labeled datasets for training and struggling in open-vocabulary settings. In contrast, CAC aims to count objects belonging to classes never seen during training, operating in a few-shot setting. In this paper, we present the first comprehensive review of CAC methodologies. We propose a taxonomy to categorize CAC approaches into three paradigms based on how target object classes can be specified: reference-based, reference-less, and open-world text-guided. Reference-based approaches achieve state-of-the-art performance by relying on exemplar-guided mechanisms. Reference-less methods eliminate exemplar dependency by leveraging inherent image patterns. Finally, open-world text-guided methods use vision-language models, enabling object class descriptions via textual prompts, offering a flexible and promising solution. Based on this taxonomy, we provide an overview of the architectures of 29 CAC approaches and report their results on gold-standard benchmarks. We compare their performance and discuss their strengths and limitations. Specifically, we present results on the FSC-147 dataset, setting a leaderboard using gold-standard metrics, and on the CARPK dataset to assess generalization capabilities. Finally, we offer a critical discussion of persistent challenges, such as annotation dependency and generalization, alongside future directions. We believe this survey will be a valuable resource, showcasing CAC advancements and guiding future research.

Paper Structure

This paper contains 16 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Comparison between class-specific and class-agnostic counting. On the left, we report two examples of class-specific counters, which need individually trained networks on extensive labeled datasets for each object type. On the right, we show the new class-agnostic counting setting, where the counter estimates the number of objects belonging to arbitrary classes. Note that, in the latter scenario, training and test object classes are different.
  • Figure 2: Overview of class-agnostic counting paradigms. We propose a taxonomy to classify existing CAC methodologies: a) Reference-based approaches rely on annotated bounding box exemplars, which serve as visual prototypes for the object classes to be counted; b) Reference-less techniques relax these requirements, enabling models to automatically identify the dominant class(es) to be counted; and c) Open-world Text-guided methodologies allow the use of textual descriptions to specify the object classes to be considered.
  • Figure 3: Overview of selected CAC methodologies over time. We highlight key milestone methods along a temporal line. Colors represent their category based on our taxonomy: red for reference-based, green for reference-less, and blue for open-world text-guided CAC approaches.
  • Figure 4: High-level overview of training and inference for reference-based CAC approaches relying on density regression. Following the notation of this survey, CAC models are fed with query images $I_i$ and $K$ exemplars expressed as bounding box coordinates $B^E = \{b_i\}_{i=1}^{K}$, with each $b_i \in \mathbb{R}^{4}$. Usually, $K = 3$. Exemplars belong to several object classes, which differ between training and test time. The model is in charge of (i) computing feature representations from these inputs that are agnostic to specific object classes, and (ii) predicting density maps $D^{map}$ from these feature representations. Density-based methods are the standard approach for object counting in crowded scenes. Training typically involves minimizing a per-pixel loss between the predicted and ground-truth density maps, with the final count obtained by summing the pixel values of the predicted density map.
  • Figure 5: Dataset samples. We report some examples of the CAC gold-standard benchmarks. a) The FSC-147 dataset [DBLP:conf/cvpr/RanjanSNH21] is the standard for class-agnostic counting. It includes more than 6,000 images belonging to 147 object classes. We show some sample images together with the provided labels -- bounding boxes localizing the exemplars and density maps. b) The CARPK dataset [DBLP:conf/iccv/HsiehLH17] is a set of drone-view images tailored for vehicle counting that is often exploited for assessing the generalization capabilities of CAC models. We show some sample images together with bounding boxes marking the exemplars.
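
The density-regression idea in the Figure 4 caption (a per-pixel loss during training, and a count obtained by summing the predicted map) can be illustrated with a minimal sketch. It is not taken from any surveyed method. The Gaussian kernel, the value of `sigma`, the choice of mean squared error as the per-pixel loss, and all function names are assumptions made for illustration only.

```python
import math


def gaussian_density_map(points, height, width, sigma=1.5):
    """Build a toy ground-truth density map from point annotations.

    One Gaussian is placed at each annotated (row, col) point and
    renormalized over the image grid so that it sums to 1. The whole
    map therefore sums to the number of annotated objects.
    (Gaussian kernels and sigma=1.5 are illustrative assumptions.)
    """
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        kernel = [
            [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


def count_from_density(density):
    """Estimated count: the sum of all pixel values in the density map."""
    return sum(map(sum, density))


def per_pixel_mse(pred, target):
    """Mean squared error over pixels, one possible per-pixel loss."""
    h, w = len(target), len(target[0])
    return sum(
        (pred[y][x] - target[y][x]) ** 2 for y in range(h) for x in range(w)
    ) / (h * w)


# Toy example: three annotated objects in a 32x32 image.
points = [(4, 4), (10, 20), (25, 8)]
gt = gaussian_density_map(points, 32, 32)

# Hypothetical model output that underestimates every pixel by 10%.
pred = [[0.9 * v for v in row] for row in gt]

print("ground-truth count:", round(count_from_density(gt), 3))
print("predicted count:   ", round(count_from_density(pred), 3))
print("per-pixel MSE:     ", per_pixel_mse(pred, gt))
```

In this toy setting the ground-truth map sums to the number of annotated points, and the scaled prediction sums to 90% of it, which shows how an error spread across pixels turns into a counting error.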