Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Ming-Hsuan Yang, Sy-Yen Kuo
TL;DR
This work addresses unstable proposal-target matching in point-based crowd counting and localization, which makes the learning signal inconsistent during training. It introduces Auxiliary Point Guidance (APG) to provide explicit positive/negative guidance and Implicit Feature Interpolation (IFI) to enable accurate feature extraction at arbitrary positions; together they form the APGCC framework. APG and IFI are integrated into a VGG-16/ASPP backbone and trained with a joint objective that combines the standard point-based loss with APG losses. The approach achieves state-of-the-art results across multiple counting and localization benchmarks, including SHHA, SHHB, UCF-QNRF, JHU-Crowd, UCF_CC_50, and NWPU-Crowd. Ablation studies confirm that APG and IFI each contribute on their own and reinforce each other when combined, while inference remains efficient. The authors also emphasize practical impact through improved robustness across densities and environments, and plan a public release of code and models.
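To make "feature extraction at arbitrary positions" concrete, the sketch below samples a small 2D feature grid at a continuous coordinate. It uses plain bilinear interpolation as a stand-in; this is not the paper's IFI module, whose details are not given in this summary, and the function name `sample_bilinear` and the toy grid are hypothetical.

```python
from math import floor

def sample_bilinear(feature_map, x, y):
    """Sample a 2D grid of floats at a continuous (x, y) position.

    Plain bilinear interpolation, used here only for illustration;
    it is an assumed stand-in, not the paper's IFI module.
    """
    h, w = len(feature_map), len(feature_map[0])
    # Clamp the query so positions on or beyond the border stay defined.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(floor(x)), int(floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    # Blend along x on the two neighbouring rows, then blend along y.
    top = (1 - dx) * feature_map[y0][x0] + dx * feature_map[y0][x1]
    bottom = (1 - dx) * feature_map[y1][x0] + dx * feature_map[y1][x1]
    return (1 - dy) * top + dy * bottom

# Toy 2x2 single-channel "feature map" (hypothetical values).
grid = [[0.0, 1.0],
        [2.0, 3.0]]
center_value = sample_bilinear(grid, 0.5, 0.5)
```

Querying between grid cells returns a weighted blend of the four surrounding values, so a point proposal does not have to sit exactly on a feature-map cell to receive a feature.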
Abstract
Crowd counting and localization have become increasingly important in computer vision due to their wide-ranging applications. While point-based strategies have been widely used in crowd counting methods, they face a significant challenge: the lack of an effective learning strategy to guide the matching process. This deficiency leads to instability in matching point proposals to target points, adversely affecting overall performance. To address this issue, we introduce an effective approach to stabilize the proposal-target matching in point-based methods. We propose Auxiliary Point Guidance (APG) to provide clear and effective guidance for proposal selection and optimization, addressing the core issue of matching uncertainty. Additionally, we develop Implicit Feature Interpolation (IFI) to enable adaptive feature extraction in diverse crowd scenarios, further enhancing the model's robustness and accuracy. Extensive experiments demonstrate the effectiveness of our approach, showing significant improvements in crowd counting and localization performance, particularly under challenging conditions. The source code and trained models will be made publicly available.
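The matching instability described above can be illustrated with a toy example. The sketch below assumes a one-to-one assignment that minimizes total Euclidean distance between proposals and targets; this cost is an assumption made for illustration (practical point-based methods may also fold prediction confidence into the cost), and the function `match` and all coordinates are hypothetical.

```python
from itertools import permutations
from math import dist

def match(proposals, targets):
    """Return the one-to-one assignment with the lowest total distance.

    proposals, targets: lists of (x, y) with len(proposals) >= len(targets).
    Brute force over permutations, so only suitable for tiny examples.
    The distance-only cost is an assumption made for illustration.
    """
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(proposals)), len(targets)):
        cost = sum(dist(proposals[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best  # best[t] is the index of the proposal matched to target t

targets = [(10.0, 10.0)]
# Two proposals at nearly the same distance from the single target.
before = match([(9.0, 10.0), (11.1, 10.0)], targets)
# A small shift of the second proposal (e.g. after one update step) flips the match.
after = match([(9.0, 10.0), (10.9, 10.0)], targets)
```

Here `before` selects the first proposal and `after` selects the second, even though the second proposal moved by only 0.2 units. When the matched proposal changes between iterations like this, the proposal that receives the positive training signal changes as well, which is the kind of inconsistency that explicit guidance for proposal selection aims to reduce.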
