Learning Spatial Similarity Distribution for Few-shot Object Counting

Yuanwu Xu, Feifan Song, Haofeng Zhang

TL;DR

This work addresses counting objects of unseen classes from a few exemplars by introducing Spatial Similarity Distribution (SSD), which preserves the spatial structure of exemplar features and represents similarity as a 4D pyramid. A Similarity Learning Module (SLM) built on center-pivot 4D convolutions fuses multi-level similarity tensors, while a Feature Cross Enhancement (FCE) module mutually enhances query and exemplar features to improve matching. The model is trained with a generalized entropic-regularized unbalanced optimal transport (OT) loss and uses dynamic image scaling to handle dense scenes, achieving state-of-the-art results on FSC-147 and CARPK. Overall, SSD shows that exploiting the spatial distribution of similarity improves few-shot counting accuracy and generalization across datasets.

Abstract

Few-shot object counting aims to count the objects in a query image that belong to the same class as the given exemplar images. Existing methods compute the similarity between the query image and the exemplars in the 2D spatial domain and then regress the count. However, these methods overlook the rich information in the spatial distribution of similarity over the exemplar images, which significantly degrades matching accuracy. To address this issue, we propose a network that learns the Spatial Similarity Distribution (SSD) for few-shot object counting. It preserves the spatial structure of exemplar features and computes a 4D similarity pyramid point-to-point between the query features and exemplar features, capturing the complete distribution information for each point in the 4D similarity space. We propose a Similarity Learning Module (SLM) that applies efficient center-pivot 4D convolutions to the similarity pyramid to map different similarity distributions to distinct predicted density values, thereby obtaining an accurate count. We also introduce a Feature Cross Enhancement (FCE) module that mutually enhances query and exemplar features to improve the accuracy of feature matching. Our approach outperforms state-of-the-art methods on multiple datasets, including FSC-147 and CARPK. Code is available at https://github.com/CBalance/SSD.
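
To make the point-to-point 4D similarity concrete, the sketch below computes a single-level tensor S[i][j][k][l] as the cosine similarity between query position (i, j) and exemplar position (k, l). It is only an illustration under stated assumptions, not the authors' implementation: the function names (cosine, similarity_4d, pooled_similarity_2d), the choice of cosine similarity, the toy feature values, and the use of average pooling as a stand-in for a 2D baseline are all hypothetical. The actual model builds a multi-level pyramid from backbone features and processes it with center-pivot 4D convolutions, which this sketch does not cover.

```python
import math


def cosine(u, v):
    """Cosine similarity of two channel vectors; 0.0 if either is all-zero."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def similarity_4d(query, exemplar):
    """Return S with S[i][j][k][l] = cosine(query[i][j], exemplar[k][l]).

    query is an Hq x Wq grid and exemplar an He x We grid of C-dim vectors.
    The exemplar grid is kept intact, so every query position gets a full
    He x We similarity distribution instead of a single score.
    """
    return [[[[cosine(q, e) for e in e_row] for e_row in exemplar]
             for q in q_row] for q_row in query]


def pooled_similarity_2d(query, exemplar):
    """Assumed 2D baseline: average-pool the exemplar into one vector first."""
    cells = [e for e_row in exemplar for e in e_row]
    channels = len(cells[0])
    pooled = [sum(c[ch] for c in cells) / len(cells) for ch in range(channels)]
    return [[cosine(q, pooled) for q in q_row] for q_row in query]


# Toy features (hypothetical values): 2x2 query, 1x2 exemplar, C = 2.
query = [[[1.0, 0.0], [0.0, 1.0]],
         [[1.0, 1.0], [0.0, 0.0]]]
exemplar = [[[1.0, 0.0], [0.0, 1.0]]]

S = similarity_4d(query, exemplar)         # shape Hq x Wq x He x We
P = pooled_similarity_2d(query, exemplar)  # shape Hq x Wq
```

In this toy case the pooled map P assigns the same score to query positions (0, 0) and (0, 1), while their 4D slices S[0][0] and S[0][1] differ: each matches a different part of the exemplar. That is the kind of distribution information that is lost when the exemplar's spatial structure is discarded.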

Paper Structure

This paper contains 20 sections, 10 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Comparison between existing methods and our method. Unlike the feature similarity computation in previous methods, our approach preserves the spatial structure of the exemplars. The similarity of each exemplar position to the query features is computed, and the subsequent convolutional regression fully utilizes the point-to-point spatial similarity distribution between query and exemplar features.
  • Figure 2: Heatmap depicting the similarity distribution of objects at different positions on the exemplar.
  • Figure 3: Overall architecture of the proposed SSD framework.
  • Figure 4: Concatenation of multi-level similarity matrices.
  • Figure 5: Qualitative results on the FSC-147 dataset.
  • ...and 1 more figure