A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation

Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan

TL;DR

GeCo tackles low-shot counting by providing a single-stage architecture that integrates object detection, segmentation, and count estimation. It introduces a dense object query formulation that generalizes prototypes across the entire image and yields dense predictions, paired with a task-aligned dense detection loss that directly optimizes detection quality. Empirically, GeCo surpasses both density-based and detection-based baselines, achieving state-of-the-art results in few-shot, one-shot, and zero-shot settings and delivering reliable segmentation masks for detected objects. This approach enables accurate counting with explainable localization, potentially broadening the applicability of low-shot counting in complex, densely populated scenes.

Abstract

Low-shot object counters estimate the number of objects in an image using few or no annotated exemplars. Objects are localized by matching them to prototypes, which are constructed by unsupervised image-wide aggregation of object appearance. Because object appearances can be diverse, existing approaches often overgeneralize and produce false positive detections. Furthermore, the best-performing methods train object localization with a surrogate loss that predicts a unit Gaussian at each object center. This loss is sensitive to annotation error and hyperparameters and does not directly optimize the detection task, which leads to suboptimal counts. We introduce GeCo, a novel low-shot counter that achieves accurate object detection, segmentation, and count estimation in a unified architecture. GeCo robustly generalizes the prototypes across object appearances through a novel dense object query formulation. In addition, we propose a novel counting loss that directly optimizes the detection task and avoids the issues of the standard surrogate loss. GeCo surpasses the leading few-shot detection-based counters by $\sim$25\% in the total count MAE, achieves superior detection accuracy, and sets a new state-of-the-art result across all low-shot counting setups.
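To make two terms from the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that "unit Gaussian" means a Gaussian with standard deviation 1 and peak value 1 placed at each annotated center, and that count MAE is the mean absolute difference between predicted and ground-truth counts over a set of images. The function names `gaussian_target`, `squared_error`, and `count_mae` are hypothetical and chosen only for this example.

```python
import math


def gaussian_target(height, width, centers, sigma=1.0):
    """Build a target map with one Gaussian (peak 1.0) per (row, col) center.

    Assumption: sigma=1.0 and peak normalization stand in for the
    "unit Gaussian" surrogate target mentioned in the abstract.
    """
    target = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                d2 = (y - cy) ** 2 + (x - cx) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                # Where Gaussians overlap, keep the stronger response.
                target[y][x] = max(target[y][x], value)
    return target


def squared_error(map_a, map_b):
    """Sum of squared differences between two equally sized maps."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(map_a, map_b)
        for a, b in zip(row_a, row_b)
    )


def count_mae(predicted_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    errors = [abs(p - t) for p, t in zip(predicted_counts, true_counts)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    clean = gaussian_target(5, 5, [(2, 2)])
    # Simulate an annotation error: the center is marked one pixel off.
    shifted = gaussian_target(5, 5, [(2, 3)])
    print("target change from a 1-pixel shift:", squared_error(clean, shifted))
    print("count MAE:", count_mae([10, 22, 5], [12, 20, 5]))
```

In this toy setting, moving an annotated center by a single pixel changes the regression target everywhere around the object. This is one way to picture the sensitivity to annotation error that the abstract attributes to the surrogate loss.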

Paper Structure (12 sections, 3 equations, 8 figures, 8 tables)

Figures (8)

  • Figure 1: DAVE [dave] predicts object centers (red dots) biased towards blob-like structures, leading to incorrect partial detections of ants (bottom left), while GeCo (ours) addresses this with the new loss (top left). C-DETR [counting-detr] fails in densely populated regions (bottom right), while GeCo addresses this with the new dense query formulation by prototype generalization (top right). Exploiting the SAM backbone, GeCo delivers segmentations as well. Exemplars are denoted in blue.
  • Figure 2: The architecture of the proposed single-stage low-shot counter GeCo.
  • Figure 3: Compared with state-of-the-art few-shot detection-based counters DAVE [dave], PSECO [pseco], and C-DETR [counting-detr], GeCo delivers more accurate detections with fewer false positives and better global counts. Exemplars are outlined in blue; segmentations are not shown for clarity.
  • Figure 4: Response maps (in yellow) and locations of bounding box predictions (red dots) when using the proposed training loss (first row) and the standard training loss [dave, djukic_loca, pseco] (second row).
  • Figure 5: Comparison of few-shot counting on FSCD147. Exemplars are shown in red, and ERR indicates the count error.
  • ...and 3 more figures