Embodied Crowd Counting
Runling Long, Yunlong Wang, Jia Wan, Xiang Deng, Xinting Zhu, Weili Guan, Antoni B. Chan, Liqiang Nie
TL;DR
This work introduces Embodied Crowd Counting (ECC), which addresses occlusion in crowd counting through drone-based, interactive sensing in large outdoor environments. It provides the Embodied Crowd Counting Dataset (ECCD) to enable large-scale, interactive crowd analysis and proposes ZECC, a zero-shot baseline built from three modules: Active Top-down Exploration (ATE), Normal-line based Navigation (NLBN), and Fine Detection and Counting (FDC). Together, these modules achieve accurate counting with efficient exploration. ZECC achieves a favorable trade-off between counting error ($MAPE$) and navigation distance ($TD$) against competitive baselines, extensive ablations confirm the necessity of each component, and real-world demonstrations verify robustness to occlusion. The work establishes a new benchmark and methodology for scalable, interactive crowd analysis, with potential applications in public safety and urban planning. It acknowledges the limitations of simulation-based evaluation and leaves dynamic targets and real-world deployment to future work.
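The two quantities traded off above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: it assumes $MAPE$ is the standard mean absolute percentage error of predicted versus true counts averaged over episodes, and that $TD$ is the total length of the drone's flown path, summed as straight-line segments between consecutive (x, y, z) waypoints. The function names `mape` and `trajectory_distance` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mape(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Mean absolute percentage error (in %) over episodes.

    Assumption: every ground-truth count is positive, so the ratio is defined.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, g in zip(predicted, ground_truth):
        if g <= 0:
            raise ValueError("ground-truth counts must be positive")
        total += abs(p - g) / g  # relative error for one episode
    return 100.0 * total / len(predicted)


def trajectory_distance(waypoints: Sequence[Tuple[float, float, float]]) -> float:
    """Total path length of a 3-D trajectory (hypothetical TD definition).

    Sums the Euclidean length of each segment between consecutive waypoints;
    a trajectory with fewer than two waypoints has length 0.
    """
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))
```

For example, predicting 90 and 210 people where 100 and 200 are present gives a MAPE of about 7.5%, and flying from (0, 0, 0) to (3, 4, 0) and then climbing to (3, 4, 12) gives a distance of 17. Lower is better for both, which is why a method that flies farther to see around occluders can reduce counting error at the cost of a larger TD.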
Abstract
Occlusion is one of the fundamental challenges in crowd counting. Various data-driven approaches have been developed to address it, but their effectiveness is limited, mainly because most existing crowd counting datasets on which these methods are trained are captured with passive cameras, which restricts their ability to fully sense the environment. Recently, embodied navigation methods have shown significant potential for precise object detection in interactive scenes. Because they use active cameras, these methods hold promise for addressing this fundamental issue in crowd counting. However, most existing methods are designed for indoor navigation, and their performance in analyzing complex object distributions in large-scale scenes, such as crowds, remains unknown. In addition, most existing embodied navigation datasets consist of indoor scenes with limited scale and object quantity, which prevents their use for dense crowd analysis. Motivated by these observations, we propose a novel task, Embodied Crowd Counting (ECC). We first build an interactive simulator, the Embodied Crowd Counting Dataset (ECCD), which supports large-scale scenes and large object quantities. Crowds are generated from a prior probability distribution that approximates realistic crowd distributions. We then propose a zero-shot navigation method (ZECC). It combines an MLLM-driven coarse-to-fine navigation mechanism, which enables active Z-axis exploration, with a normal-line-based crowd distribution analysis method for fine counting. Experimental results against baselines show that the proposed method achieves the best trade-off between counting accuracy and navigation cost.
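The abstract does not state the form of the prior used to generate crowds, so the sketch below only illustrates the general idea of sampling people from a clustered spatial prior. The Gaussian mixture, the function name `sample_crowd`, and its parameters (`num_groups`, `spread`, scene bounds) are all hypothetical choices made for illustration, not ECCD's actual generator.

```python
import random
from typing import List, Tuple


def sample_crowd(
    num_people: int,
    num_groups: int,
    width: float,
    height: float,
    spread: float = 5.0,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Hypothetical crowd prior: a Gaussian mixture over the ground plane.

    Group centres are drawn uniformly inside the [0, width] x [0, height] scene.
    Each person joins a uniformly chosen group and is offset from its centre by
    isotropic Gaussian noise with standard deviation `spread`, then clipped to
    the scene bounds. A fixed seed makes the layout reproducible.
    """
    if num_people < 0 or num_groups < 1:
        raise ValueError("num_people must be >= 0 and num_groups >= 1")
    rng = random.Random(seed)
    centres = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(num_groups)]
    people: List[Tuple[float, float]] = []
    for _ in range(num_people):
        cx, cy = rng.choice(centres)  # pick a group for this person
        x = min(max(rng.gauss(cx, spread), 0.0), width)
        y = min(max(rng.gauss(cy, spread), 0.0), height)
        people.append((x, y))
    return people
```

Clustered priors of this kind produce dense groups separated by empty space, which is the situation where a single passive viewpoint suffers most from occlusion; a realistic generator would also need to respect scene geometry such as buildings and roads, which this sketch ignores.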
