EHNet: An Efficient Hybrid Network for Crowd Counting and Localization
Yuqing Yan, Yirui Wu
TL;DR
EHNet addresses crowd counting and localization in scenes with multi-scale crowd distributions by reframing counting as a point regression problem: starting from anchor points, it predicts $M$ points with coordinates $(\hat{x}_j, \hat{y}_j)$, each with an associated confidence. It introduces SPAM (Spatial-Position Attention Module) for global context, AFAM (Adaptive Feature Aggregation Module) for multi-scale feature fusion, and MSAD (Multi-Scale Attentive Decoder), which decodes features into offsets and confidences while capturing global dependencies with linear complexity. Evaluations on four datasets (ShanghaiTech Part A, ShanghaiTech Part B, UCF-CC-50, and UCF-QNRF) show competitive MAE/MSE and strong crowd localization (Hungarian matching) with fewer parameters and efficient inference. Ablation studies confirm that SPAM, AFAM, and MSAD each contribute to performance; the full EHNet achieves an MAE of $50.29$ and an MSE of $81.01$ on ShanghaiTech Part A, illustrating its practical efficacy for real-time crowd analytics.
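To make the point regression formulation concrete, here is a minimal sketch of how per-anchor offsets and confidences could be turned into predicted points and a count. It is not EHNet's implementation: the `Proposal` type, the `decode_points` helper, the anchor layout, and the confidence threshold of 0.5 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Proposal:
    """One anchor with its regressed outputs (hypothetical container)."""
    anchor: Tuple[float, float]   # fixed anchor point (x, y) in pixels
    offset: Tuple[float, float]   # regressed offset (dx, dy) from the anchor
    confidence: float             # score in [0, 1] that a person is at this point


def decode_points(proposals: List[Proposal],
                  threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Return predicted points (x_hat, y_hat) = anchor + offset.

    Proposals whose confidence does not exceed `threshold` are discarded.
    The threshold value is an assumption, not taken from the paper.
    """
    points = []
    for p in proposals:
        if p.confidence > threshold:
            x_hat = p.anchor[0] + p.offset[0]
            y_hat = p.anchor[1] + p.offset[1]
            points.append((x_hat, y_hat))
    return points


# Toy input: three anchors, one of which has low confidence.
proposals = [
    Proposal(anchor=(8.0, 8.0), offset=(1.5, -2.0), confidence=0.92),
    Proposal(anchor=(24.0, 8.0), offset=(0.0, 0.5), confidence=0.10),
    Proposal(anchor=(8.0, 24.0), offset=(-3.0, 1.0), confidence=0.75),
]

points = decode_points(proposals)
count = len(points)  # the crowd count is the number of retained points
```

In this toy input, two of the three proposals pass the threshold, so the predicted count is 2 and the retained coordinates double as localization output.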
Abstract
In recent years, crowd counting and localization have become crucial techniques in computer vision, with applications spanning various domains. The presence of multi-scale crowd distributions within a single image remains a fundamental challenge in crowd counting tasks. To address this challenge, we introduce the Efficient Hybrid Network (EHNet), a novel framework for efficient crowd counting and localization. By reformulating crowd counting into a point regression framework, EHNet leverages the Spatial-Position Attention Module (SPAM) to capture comprehensive spatial contexts and long-range dependencies. Additionally, we develop an Adaptive Feature Aggregation Module (AFAM) to effectively fuse and harmonize multi-scale feature representations. Building upon these, we introduce the Multi-Scale Attentive Decoder (MSAD). Experimental results on four benchmark datasets demonstrate that EHNet achieves competitive performance with reduced computational overhead, outperforming existing methods on ShanghaiTech Part A, ShanghaiTech Part B, UCF-CC-50, and UCF-QNRF. Our code is available at https://anonymous.4open.science/r/EHNet.
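The benchmark results above are reported as MAE and MSE over per-image counts. The sketch below shows how these two numbers are commonly computed in the crowd counting literature, where "MSE" usually denotes the square root of the mean squared count error. That EHNet follows exactly this convention is an assumption, and the function names and toy counts are hypothetical.

```python
import math
from typing import Sequence


def count_mae(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def count_mse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root of the mean squared count error (the usual 'MSE' in this field)."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))


# Toy per-image counts, not taken from any benchmark.
pred_counts = [102, 48, 310]
gt_counts = [100, 50, 300]

mae = count_mae(pred_counts, gt_counts)  # (2 + 2 + 10) / 3
mse = count_mse(pred_counts, gt_counts)  # sqrt((4 + 4 + 100) / 3)
```

Because the squared term penalizes large per-image errors more heavily, MSE in this sense is at least as large as MAE.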
