
A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting

Tsung-Han Chou, Brian Wang, Wei-Chen Chiu, Jun-Cheng Chen

TL;DR

Combining mosaic augmentation with a generalized loss is essential for fixing the tendency of CAC models to count the majority objects in a query image regardless of the references.

Abstract

Class-agnostic counting (CAC) is a vision task that counts the total number of occurrences of any given reference object in a query image. The task is usually formulated as a density map estimation problem, solved by computing the similarity between a few image samples of the reference object and the query image. In this paper, we point out a severe issue of the existing CAC framework: in a multi-class setting, models do not consider the reference images and instead blindly match all dominant objects in the query image. Moreover, the current evaluation metrics and dataset cannot faithfully assess a model's generalization performance and robustness. To this end, we discover that combining mosaic augmentation with a generalized loss is essential for addressing this tendency of CAC models to count the majority (i.e., dominant) objects regardless of the references. Furthermore, we introduce a new evaluation protocol and metrics that resolve the problems of the existing CAC evaluation scheme and benchmark CAC models more fairly. Extensive evaluation results demonstrate that our proposed recipe consistently improves the performance of different CAC models. The code is available at https://github.com/littlepenguin89106/MGCAC.
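The density-map formulation in the abstract can be illustrated with a toy sketch. This is not the authors' model: the function names (`extract_features`, `match`, `density_head`, `count`) are hypothetical, and average pooling plus a hard threshold stand in for the learned networks a real CAC model would use. It only shows how a count falls out of reference-to-query similarity.

```python
import numpy as np


def extract_features(image: np.ndarray, cell: int = 4) -> np.ndarray:
    """Toy feature extractor (assumption): average-pool an (H, W, C) image
    over non-overlapping cell x cell patches."""
    h, w, c = image.shape
    gh, gw = h // cell, w // cell
    patches = image[: gh * cell, : gw * cell].reshape(gh, cell, gw, cell, c)
    return patches.mean(axis=(1, 3))  # shape (gh, gw, c)


def match(query_feat: np.ndarray, ref_feats: list) -> np.ndarray:
    """Cosine similarity between every query cell and the mean reference prototype."""
    proto = np.mean([f.mean(axis=(0, 1)) for f in ref_feats], axis=0)  # shape (c,)
    q = query_feat / (np.linalg.norm(query_feat, axis=-1, keepdims=True) + 1e-8)
    p = proto / (np.linalg.norm(proto) + 1e-8)
    return q @ p  # shape (gh, gw)


def density_head(similarity: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Toy density head (assumption): one unit of density per highly similar cell."""
    return (similarity > threshold).astype(np.float32)


def count(query: np.ndarray, references: list):
    """Predict a density map for the query and sum it to obtain the count."""
    sim = match(extract_features(query), [extract_features(r) for r in references])
    density = density_head(sim)
    return density, float(density.sum())
```

In this sketch the predicted count is simply the sum of the density map, and changing the reference changes which cells fire. That reference-dependent behavior is what the abstract reports existing models fail to show in multi-class images.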

Paper Structure (21 sections, 3 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Comparison of CAC Models with Different Training Strategies in Real-World Images. Given the same query image with different references (annotated by yellow bounding boxes), (a) state-of-the-art CAC models (e.g., BMNet [shi2022represent], LOCA [djukic2022low]) trained with the standard setting on the FSC-147 dataset fail to distinguish the appearance of objects and give wrong counts: the model fires not on the objects of the same class as the references but on the majority objects in the query image. (Dilation is applied to the density maps for better viewing.) Conversely, (b) our suggested recipe, which integrates Mosaic Augmentation (MA) and Generalized Loss (GL), enables the models to discriminate between different objects, resulting in significantly better count predictions.
  • Figure 2: More qualitative results of various CAC models on real images. Existing state-of-the-art CAC models such as (b) LOCA [djukic2022low] and (c) CounTR [liu2022countr] either count all objects in the query image (completely disregarding the target objects shown in the reference images) or unintentionally detect irrelevant objects. In contrast, our proposed MGCAC model, trained in a multi-class scenario with an objective better suited to the CAC task, faithfully follows the guidance of the references and achieves superior counting results.
  • Figure 3: CAC Framework. The general pipeline of CAC models consists of a feature extractor, a matcher, and a density head. Given a query image and $K$ reference images ($K = 3$ in this diagram), the model learns to predict a density map for counting.
  • Figure 4: Distribution of object counts in the FSC-147 test dataset. We split the test dataset into 10 bins. The x-axis is the count range of each bin; the y-axis is the number of queries on a log scale.
  • Figure 5: Mosaic evaluation. Given different references of the target class, the objective is to count the corresponding objects in the query.
  • ...and 2 more figures
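
The mosaic setting referred to in Figures 1 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes a fixed 2x2 layout of four equally sized tiles, and the function name `make_mosaic` and its arguments are hypothetical. The point it shows is that the ground-truth density of a mosaic keeps only the tiles whose class matches the reference, so a model that counts every object regardless of the reference is penalized.

```python
import numpy as np


def make_mosaic(images, densities, labels, target_label):
    """Tile four (H, W, C) images into a 2x2 mosaic (assumed layout).

    The returned ground-truth density keeps only the tiles whose label equals
    target_label (the class shown in the references); all other tiles are zeroed.
    """
    if not (len(images) == len(densities) == len(labels) == 4):
        raise ValueError("expected exactly four tiles")

    # Zero out the density of every tile that does not show the target class.
    masked = [
        d if lab == target_label else np.zeros_like(d)
        for d, lab in zip(densities, labels)
    ]

    # Stitch tiles: [0 1] on the top row, [2 3] on the bottom row.
    mosaic = np.concatenate(
        [np.concatenate(images[:2], axis=1), np.concatenate(images[2:], axis=1)],
        axis=0,
    )
    density = np.concatenate(
        [np.concatenate(masked[:2], axis=1), np.concatenate(masked[2:], axis=1)],
        axis=0,
    )
    return mosaic, density
```

With tiles labeled, say, apple, pen, apple, and cup, choosing apple references gives a target count equal to the sum of the two apple tiles only, while choosing pen references on the same mosaic gives a different target count.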