Generalized-Scale Object Counting with Gradual Query Aggregation
Jer Pelhan, Alan Lukezic, Matej Kristan
TL;DR
This paper tackles few-shot counting in images with large scale variation and densely populated regions by introducing GECO2, a detection-based counter that learns scale-generalized dense queries through scale-specific encoders and cross-scale aggregation. GECO2 constructs high-resolution dense object queries at multiple scales, enabling accurate localization and counting without resorting to ad-hoc input upsampling or image tiling, and it integrates SAM2 segmentation to refine masks. Through scale-aware prototypes, deformable attention, and lightweight fusion, GECO2 achieves superior counting and detection performance on the FSCD147, FSCD-LVIS, and MCAC benchmarks while reducing memory usage and speeding up inference by about 3× relative to prior state-of-the-art methods. A comprehensive set of ablations demonstrates the contributions of multi-scale query construction and auxiliary supervision for small objects, as well as the efficiency of the cross-scale aggregation approach, underscoring the method's practical value for real-world, densely populated scenes.
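Why input upsampling and tiling are expensive can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, for illustration only, a ViT-style backbone with a patch stride of 16 and global self-attention whose cost grows with the square of the token count; these settings and the helper names (`token_count`, `attention_cost`, `relative_costs`) are hypothetical and are not taken from GECO2 or from any specific prior counter.

```python
def token_count(height: int, width: int, stride: int = 16) -> int:
    """Number of backbone tokens, assuming one token per stride x stride patch."""
    return (height // stride) * (width // stride)


def attention_cost(num_tokens: int) -> int:
    """Pairwise token interactions in one global self-attention layer (N^2)."""
    return num_tokens ** 2


def relative_costs(height: int, width: int, scale: int = 2, stride: int = 16) -> dict:
    """Costs of upsampling the input by `scale`, relative to the original image."""
    base_tokens = token_count(height, width, stride)
    up_tokens = token_count(height * scale, width * scale, stride)
    base = attention_cost(base_tokens)
    # Tiling: split the upsampled image into scale x scale tiles, each the size
    # of the original image, and run attention independently per tile.
    num_tiles = scale * scale
    tiled = num_tiles * attention_cost(base_tokens)
    return {
        "tokens_ratio": up_tokens / base_tokens,
        "global_attention_ratio": attention_cost(up_tokens) / base,
        "tiled_attention_ratio": tiled / base,
    }


print(relative_costs(1024, 1024))
# {'tokens_ratio': 4.0, 'global_attention_ratio': 16.0, 'tiled_attention_ratio': 4.0}
```

Under these assumptions, 2× upsampling quadruples the number of tokens and multiplies global attention cost by 16; tiling brings attention back to 4× the baseline, but the image must then be processed as several separate tiles.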
Abstract
Few-shot detection-based counters estimate the number of object instances in an image, where the target objects are specified only by a few exemplars provided at test time. A common approach to localizing objects of different sizes is to merge backbone features of different resolutions. Furthermore, to enable detection of small objects in densely populated regions, the input image is commonly upsampled, and tiling is applied to cope with the resulting increase in computational and memory requirements. Because of these ad-hoc solutions, existing counters struggle with images containing objects of diverse sizes and densely populated regions of small objects. We propose GECO2, an end-to-end few-shot counting and detection method that explicitly addresses the object scale issue. A new dense query representation gradually aggregates exemplar-specific feature information across scales, yielding high-resolution dense queries that enable detection of both large and small objects. GECO2 surpasses state-of-the-art few-shot counters in both counting and detection accuracy by 10%, while running 3× faster with a smaller GPU memory footprint.
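The idea of gradually aggregating exemplar-specific information across scales can be sketched as a coarse-to-fine loop. The NumPy example below is a minimal illustration under stated assumptions: a feature pyramid where each level doubles the resolution of the previous one, exemplar conditioning by cosine similarity to a per-scale prototype, nearest-neighbour upsampling, and a linear-plus-ReLU fusion. The functions `upsample2x`, `prototype_similarity`, and `gradual_aggregation` are hypothetical stand-ins; GECO2 itself uses learned scale-specific encoders, deformable attention, and its own fusion modules, which this sketch does not reproduce.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an (H, W, C) map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def prototype_similarity(features: np.ndarray, prototype: np.ndarray) -> np.ndarray:
    """Cosine similarity between every (H, W, C) location and a (C,) prototype."""
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return f @ p  # (H, W)


def gradual_aggregation(pyramid, prototypes, weights):
    """Coarse-to-fine query aggregation (hypothetical illustration).

    pyramid:    list of (H_s, W_s, C) maps ordered coarse -> fine, each 2x the previous.
    prototypes: list of (C,) exemplar prototypes, one per scale (assumed).
    weights:    list of (2C, C) fusion matrices standing in for a learned layer.
    """
    # Start from the coarsest map, weighted by its similarity to the exemplar.
    queries = pyramid[0] * prototype_similarity(pyramid[0], prototypes[0])[..., None]
    for feats, proto, w in zip(pyramid[1:], prototypes[1:], weights):
        # Condition the finer-scale features on the exemplar prototype.
        cond = feats * prototype_similarity(feats, proto)[..., None]
        # Bring the coarser queries to this resolution and fuse them.
        fused = np.concatenate([upsample2x(queries), cond], axis=-1)  # (H, W, 2C)
        queries = np.maximum(fused @ w, 0.0)  # linear fusion followed by ReLU
    return queries  # dense queries at the finest pyramid resolution


rng = np.random.default_rng(0)
C = 8
pyramid = [rng.standard_normal((s, s, C)) for s in (8, 16, 32)]
prototypes = [rng.standard_normal(C) for _ in pyramid]
weights = [rng.standard_normal((2 * C, C)) * 0.1 for _ in pyramid[1:]]
q = gradual_aggregation(pyramid, prototypes, weights)
print(q.shape)  # (32, 32, 8)
```

Each loop iteration doubles the spatial resolution, so the output holds one query per location of the highest-resolution map without upsampling the input image itself.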
