DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan

TL;DR

The paper tackles low-shot counting by addressing the gap between density-based total-count estimation and object-level outputs. It introduces DAVE, a detect-and-verify pipeline that first generates a high-recall set of detections from a density-guided proposal and then verifies and prunes outliers to refine both localization and the density map. Across extensive experiments on FSC147 and FSCD147, DAVE achieves state-of-the-art performance in few-shot, zero-shot, and prompt-based counting, while providing bounding boxes and a corrected density map for downstream tasks. The approach offers a practical, unified framework that combines the strengths of density-based and detection-based counting and demonstrates robust performance across diverse settings.

Abstract

Low-shot counters estimate the number of objects corresponding to a selected category, based on only a few or no exemplars annotated in the image. The current state-of-the-art estimates the total count as the sum over the object location density map, but does not provide individual object locations and sizes, which are crucial for many applications. This is addressed by detection-based counters, which, however, fall behind in total count accuracy. Furthermore, both approaches tend to overestimate the counts in the presence of other object classes due to many false positives. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers. This jointly increases the recall and precision, leading to accurate counts. DAVE outperforms the top density-based counters by ~20% in the total count MAE, outperforms the most recent detection-based counter by ~20% in detection quality, and sets a new state-of-the-art in zero-shot as well as text-prompt-based counting.
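
To make the two quantities in the abstract concrete, the following is a minimal sketch of how a density-based counter turns a density map into a total count, and how the mean absolute error (MAE) over a set of images is computed. The function names and the toy values are illustrative assumptions, not taken from the paper or its code.

```python
def density_count(density_map):
    """Total count = sum over all cells of the density map (list of rows)."""
    return float(sum(sum(row) for row in density_map))


def mean_absolute_error(predicted_counts, true_counts):
    """MAE = average of |predicted - true| over all images."""
    errors = [abs(p - t) for p, t in zip(predicted_counts, true_counts)]
    return sum(errors) / len(errors)


# Toy 4x4 density map (hypothetical values): one object concentrated in a
# single cell, a second object spread over two neighbouring cells.
toy_density = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.5],
]
toy_count = density_count(toy_density)

# Two hypothetical images: predicted counts vs. ground-truth counts.
toy_mae = mean_absolute_error([toy_count, 10.0], [2.0, 13.0])
```

Note that the sum yields a count without saying where any individual object is, which is the limitation of density-based counters that the abstract points out.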

Paper Structure

This paper contains 13 sections, 1 equation, 5 figures, and 10 tables.

Figures (5)

  • Figure 1: Despite considering exemplars (yellow boxes), the state-of-the-art (e.g., CounTR [Liu_2022_BMVC]) is prone to false activations on incorrect objects, leading to corrupted counts. DAVE avoids this issue by detecting all candidates (red and green boxes), verifying them, removing the outliers (red boxes), and correcting the final density map, thus jointly improving detection and count estimation.
  • Figure 2: The proposed DAVE architecture consists of two stages, (i) detection and (ii) verification, and outputs detected objects as well as an improved location density map. NMS denotes non-maxima suppression, FFM is a feature fusion module, $\Omega$ is a bounding box regression head and $\phi$ is the verification feature extraction network.
  • Figure 3: Qualitative comparison of DAVE with LOCA [djukic_loca], SAFECount [you2023few], and CounTR [Liu_2022_BMVC]. The first two columns show the input images and the ground truth (GT); the remaining columns show the predicted densities.
  • Figure 4: DAVE localization performance in challenging situations compared with the current best method, C-DETR [counting-detr].
  • Figure 5: DAVE density-based and box-count accuracy with respect to the number of objects in the image.
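
The detect-then-verify flow described in the captions of Figures 1 and 2 can be illustrated with a small sketch. This is not the authors' verification procedure, which this summary does not specify: a plain cosine-similarity threshold against exemplar features is swapped in as a stand-in, and all names, feature vectors, and the threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_detections(boxes, features, exemplar_features, threshold=0.5):
    """Split candidate boxes into kept boxes and outliers.

    Assumption: a box is kept if its feature is similar enough to at least
    one exemplar feature. This is a simplified stand-in for verification.
    """
    kept, outliers = [], []
    for box, feature in zip(boxes, features):
        best = max(cosine_similarity(feature, e) for e in exemplar_features)
        (kept if best >= threshold else outliers).append(box)
    return kept, outliers


def correct_density(density_map, outlier_boxes):
    """Return a copy of the map with density zeroed inside outlier boxes.

    Boxes are (x1, y1, x2, y2) in cell coordinates, end-exclusive.
    """
    corrected = [row[:] for row in density_map]
    for x1, y1, x2, y2 in outlier_boxes:
        for y in range(y1, y2):
            for x in range(x1, x2):
                corrected[y][x] = 0.0
    return corrected


# Toy scene: three activations in a 4x4 density map, one on a wrong object.
density = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
candidate_boxes = [(0, 0, 1, 1), (3, 0, 4, 1), (3, 3, 4, 4)]
candidate_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
exemplar_features = [[1.0, 0.0]]

kept_boxes, outlier_boxes = verify_detections(
    candidate_boxes, candidate_features, exemplar_features
)
corrected = correct_density(density, outlier_boxes)
corrected_count = sum(sum(row) for row in corrected)
```

In this toy scene the uncorrected density sums to 3, while the count after removing the outlier box agrees with the number of kept boxes. That is the effect Figure 1 describes: removing false activations improves both the detections and the density-based count.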