FocalCount: Towards Class-Count Imbalance in Class-Agnostic Counting

Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Xian Zhong, Shengfeng He

TL;DR

This work tackles class-count imbalance in class-agnostic counting by introducing FocalCount, which estimates the number of object categories in an image from feature attributes such as entropy $E_i$, offset $O_i$, and certainty $C_i$, and uses this estimate as a weight to correct category imbalance. It pairs this with Focal-MSE, an error-sensitive loss that emphasizes underrepresented categories, and a dual-phase curriculum that transitions from Focal-MSE to standard MSE to refine density-map supervision. The approach uses a Dirichlet mixture to combine the attribute-based signals into a unified supervision weight $U_\mathcal{C}$, which scales the dual-phase loss $\mathcal{L}^D$ in the overall objective $\mathcal{L}_{\mathrm{all}} = \frac{1}{n} \sum_i U_\mathcal{C} \mathcal{L}^D_i(M^p_i, M^g_i)$, where $M^p_i$ and $M^g_i$ are the predicted and ground-truth density maps. Across FSC-147, CARPK, and ShanghaiTech, FocalCount achieves state-of-the-art or highly competitive results in few-shot and zero-shot settings, with strong transferability and clear density-map differentiation between specified and non-specified categories.
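The weighted objective and the two-phase schedule can be illustrated with a minimal pure-Python sketch. Everything below is an assumption made for illustration: the summary does not give the exact form of Focal-MSE, so `focal_mse` is a stand-in that simply adds a binary cross-entropy term (weighted by a hypothetical `lam`) to MSE; the switch point `switch_epoch`, the clamping of density values into $[0, 1]$, and the use of one weight per image are likewise hypothetical choices rather than the authors' implementation.

```python
import math

def mse(pred, gt):
    """Mean squared error between two flattened density maps."""
    return sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)

def bce(pred, gt, eps=1e-7):
    """Binary cross-entropy; values are clamped so the logarithms are defined."""
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)   # keep the prediction inside (0, 1)
        g = min(max(g, 0.0), 1.0)         # keep the target inside [0, 1]
        total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total / len(pred)

def focal_mse(pred, gt, lam=1.0):
    # ASSUMPTION: MSE plus a BCE term, a stand-in for the paper's Focal-MSE.
    return mse(pred, gt) + lam * bce(pred, gt)

def dual_phase_loss(pred, gt, epoch, switch_epoch):
    # Phase 1 uses the error-sensitive loss; phase 2 falls back to plain MSE.
    if epoch < switch_epoch:
        return focal_mse(pred, gt)
    return mse(pred, gt)

def total_loss(preds, gts, weights, epoch, switch_epoch):
    # L_all = (1/n) * sum_i U_i * L^D_i(M^p_i, M^g_i), with one weight per image.
    n = len(preds)
    return sum(
        u * dual_phase_loss(p, g, epoch, switch_epoch)
        for u, p, g in zip(weights, preds, gts)
    ) / n

if __name__ == "__main__":
    preds = [[0.2, 0.8]]      # one image, two density-map pixels
    gts = [[0.0, 1.0]]
    weights = [2.0]           # hypothetical category-count weight
    print(total_loss(preds, gts, weights, epoch=0, switch_epoch=5))
    print(total_loss(preds, gts, weights, epoch=10, switch_epoch=5))
```

In this sketch the first phase yields a larger loss for the same prediction error, which is the intended effect of an error-sensitive term early in training; the second phase reduces to a weighted MSE.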

Abstract

In class-agnostic object counting, the goal is to estimate the total number of object instances in an image without distinguishing between specific categories. Existing methods often predict this count without considering class-specific outputs, leading to inaccuracies when such outputs are required. These inaccuracies stem from two key challenges: 1) the prevalence of single-category images in datasets, which leads models to generalize specific categories as representative of all objects, and 2) the use of mean squared error loss during training, which applies uniform penalization. This uniform penalty disregards errors in less frequent categories, particularly when these errors contribute minimally to the overall loss. To address these issues, we propose FocalCount, a novel approach that leverages diverse feature attributes to estimate the number of object categories in an image. This estimate serves as a weighted factor to correct class-count imbalances. Additionally, we introduce Focal-MSE, a new loss function that integrates binary cross-entropy to generate stronger error gradients, enhancing the model's sensitivity to errors in underrepresented categories. Our approach significantly improves the model's ability to distinguish between specific classes and general counts, demonstrating superior performance and scalability in both few-shot and zero-shot scenarios across three object counting datasets. The code will be released soon.
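The claim that a binary cross-entropy term produces stronger error gradients than MSE can be checked with a short per-pixel derivation. This is a sketch using the standard textbook forms of both losses, assuming a prediction $p \in (0, 1)$ and a target $g \in [0, 1]$; it is not the paper's exact Focal-MSE definition, which this summary does not state.

```latex
\begin{align}
\ell_{\mathrm{MSE}}(p, g) &= (p - g)^2,
  & \frac{\partial \ell_{\mathrm{MSE}}}{\partial p} &= 2\,(p - g), \\
\ell_{\mathrm{BCE}}(p, g) &= -\bigl[\, g \log p + (1 - g) \log (1 - p) \,\bigr],
  & \frac{\partial \ell_{\mathrm{BCE}}}{\partial p} &= \frac{p - g}{p\,(1 - p)}.
\end{align}
% Since p(1 - p) <= 1/4 for all p in (0, 1):
\begin{equation}
\left| \frac{\partial \ell_{\mathrm{BCE}}}{\partial p} \right|
  \;\ge\; 4\,\lvert p - g \rvert
  \;=\; 2 \left| \frac{\partial \ell_{\mathrm{MSE}}}{\partial p} \right| .
\end{equation}
```

Under these assumptions the cross-entropy gradient is at least twice the MSE gradient for the same error, and it grows without bound as $p$ approaches 0 or 1, so small errors on rarely seen categories are penalized more strongly.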

Paper Structure

This paper contains 37 sections, 17 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Illustration of Object Counting. (a) Miscounting of unspecified classes by CounTR (few-shot visual prompt) and CLIP-Count (zero-shot textual prompt). (b) Object count distribution showing single-category dominance in FSC-147. (c) Impact of category diversity on counting accuracy using feature attributes. (d) Gradient comparison between Focal-MSE and MSE.
  • Figure 2: Illustration of Feature Attributes in Single-Category versus Multi-Category Images.
  • Figure 3: Overview of the FocalCount Method. (1) Mitigating data imbalance by estimating the Focal Category Count Prior using entropy, offset, and feature certainty. (2) Employing dual-phase curriculum learning: initially applying Focal-MSE to enhance quantity supervision and sensitivity to errors in unspecified categories, followed by MSE for precise learning.
  • Figure 4: Relation between Number of Categories and Attributes in Pascal VOC [everingham2010pascal]. As the number of categories increases, entropy and offset rise, while certainty declines, indicating greater complexity.
  • Figure 5: Density Maps and Transfer Results in Zero-Shot and Few-Shot Settings. (a) Density maps for different components in the zero-shot setting, with yellow boxes highlighting error-prone regions. (b) Comparison of density maps in the few-shot setting. (c) Transfer results from FSC-147 to ShanghaiTech and CARPK.
  • ...and 4 more figures