L2HCount: Generalizing Crowd Counting from Low to High Crowd Density via Density Simulation

Guoliang Xu, Jianqin Yin, Ren Zhang, Yonghao Dang, Feng Zhou, Bo Yu

TL;DR

This paper tackles the challenge of generalizing crowd counting from low- to high-density scenes by introducing L2HCount, a framework that synthesizes high-density images from low-density ones using a High-Density Simulation Module (HDSM) and automatically generates the corresponding ground-truth annotations via a Ground-Truth Generation Module (GTGM). It further refines the simulated data with a Head Feature Enhancement Module and learns both density regimes through a Dual-Density Memory Encoding Module that maintains separate Low-Density and High-Density memory banks. Validated on four popular datasets, the method consistently outperforms fully supervised, domain adaptation, and domain generalization baselines on low-to-high-density transfer tasks, demonstrating strong generalization across the density gap without target-domain annotation. Collectively, L2HCount offers a practical pathway to robust crowd counting across varying densities, reducing the labeling burden while improving counting accuracy in real-world surveillance scenarios.
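The simulation step can be illustrated with a minimal sketch. The summary does not say exactly how HDSM shifts images, so this example assumes one simple form: shrink a low-density image by an integer factor k and tile it k x k, which multiplies the head count by k squared and makes every head smaller. The head points are mapped the same way, and a normalised Gaussian is placed at each point to build a density-map ground truth, a common crowd-counting convention that may differ from what GTGM actually does. The function names (`simulate_high_density`, `points_to_density`, `downscale_nearest`) are hypothetical.

```python
import math


def downscale_nearest(image, k):
    """Nearest-neighbour downscale of a 2-D grayscale image by integer factor k."""
    h, w = len(image) // k, len(image[0]) // k
    return [[image[r * k][c * k] for c in range(w)] for r in range(h)]


def simulate_high_density(image, points, k=2):
    """Hypothetical density simulation (an assumption, not the paper's HDSM).

    Shrinks the scene by k and tiles it k x k, so every head appears k*k times
    at 1/k of its original size. Head points (x, y) are mapped into each tile.
    """
    small = downscale_nearest(image, k)
    h, w = len(small), len(small[0])
    # Repeat each shrunken row k times horizontally and the rows k times vertically.
    tiled = [small[r % h] * k for r in range(h * k)]
    # Drop heads that fall in the cropped border when H or W is not divisible by k.
    kept = [(x, y) for x, y in points if x < w * k and y < h * k]
    new_points = [
        (x / k + j * w, y / k + i * h)
        for i in range(k) for j in range(k) for x, y in kept
    ]
    return tiled, new_points


def points_to_density(points, height, width, sigma=2.0):
    """Place a Gaussian, normalised over its in-image window, at each head."""
    density = [[0.0] * width for _ in range(height)]
    radius = int(3 * sigma)
    for x, y in points:
        cx, cy = int(x), int(y)
        window = []
        for r in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for c in range(max(0, cx - radius), min(width, cx + radius + 1)):
                d2 = (c - x) ** 2 + (r - y) ** 2
                window.append((r, c, math.exp(-d2 / (2 * sigma ** 2))))
        total = sum(v for _, _, v in window)
        for r, c, v in window:
            density[r][c] += v / total  # each head contributes exactly 1 in total
    return density


img = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
heads = [(1.0, 2.0), (5.5, 6.0)]
sim_img, sim_heads = simulate_high_density(img, heads, k=2)
gt = points_to_density(sim_heads, len(sim_img), len(sim_img[0]))
print(len(sim_heads), round(sum(map(sum, gt)), 6))  # 8 8.0
```

Because each Gaussian is normalised over the pixels it actually covers, the density map integrates to the number of simulated heads, so the count label and the density label stay consistent after simulation.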

Abstract

Since the COVID-19 pandemic, crowd counting has found wide application. Although supervised methods are reliable, annotation is more challenging in high-density scenes, where heads are small and severely occluded, than in low-density scenes. This raises a question: can a model trained on low-density scenes generalize to high-density ones? To this end, we propose a low- to high-density generalization framework (L2HCount) that learns patterns of high-density scenes from low-density ones, enabling it to generalize well to high-density scenes. Specifically, we first introduce a High-Density Simulation Module and a Ground-Truth Generation Module, which use an image-shifting technique to construct simulated high-density images and their corresponding ground-truth crowd annotations, respectively, effectively simulating high-density crowd patterns. However, the simulated images suffer from two issues: image blurring and loss of low-density image characteristics. Second, we propose a Head Feature Enhancement Module to extract clear features from the simulated high-density scenes. Third, we propose a Dual-Density Memory Encoding Module that uses two crowd memories to learn scene-specific patterns from low-density and simulated high-density scenes, respectively. Extensive experiments on four challenging datasets demonstrate the promising performance of L2HCount.
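The dual-memory idea can be sketched as an attention-based memory read. This is an assumption rather than the paper's formulation: each bank (the hypothetical `DualDensityMemory.banks["low"]` and `banks["high"]`) holds a few slot vectors, and a feature is re-expressed as a scaled dot-product softmax mix of the slots in the bank for its density regime. In a trained model the slots would be learnable parameters updated by backpropagation; this stdlib-only sketch shows only the forward read.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def read_memory(feature, memory):
    """Re-express a feature as an attention-weighted mix of memory slots."""
    scale = 1.0 / math.sqrt(len(feature))
    scores = [scale * sum(f * m for f, m in zip(feature, slot)) for slot in memory]
    weights = softmax(scores)
    return [sum(w * slot[d] for w, slot in zip(weights, memory))
            for d in range(len(feature))]


class DualDensityMemory:
    """Hypothetical pair of memory banks, one per density regime."""

    def __init__(self, num_slots, dim, seed=0):
        rng = random.Random(seed)
        # Randomly initialised slots; a real model would learn these.
        self.banks = {
            "low": [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_slots)],
            "high": [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_slots)],
        }

    def encode(self, feature, density):
        # Low-density features read the low bank; simulated high-density
        # features read the high bank.
        return read_memory(feature, self.banks[density])


mem = DualDensityMemory(num_slots=4, dim=8)
feat = [0.1 * i for i in range(8)]
out_high = mem.encode(feat, "high")
print(len(out_high))  # 8
```

One plausible reason for keeping the banks separate is that each regime can then accumulate its own prototypes instead of averaging low- and high-density patterns into a single store.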

Paper Structure

This paper contains 17 sections, 8 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: (a) Example image and ground truth in a low-density scene; (b) example image and ground truth in a high-density scene. GT denotes the ground-truth crowd count of the image.
  • Figure 2: The proposed L2HCount generalizes crowd counting from low- to high-density scenes. It mainly consists of high- and low-density branches, the High-Density Simulation Module (HDSM), the Head Feature Enhancement Module (HFEM), the Dual-Density Memory Encoding Module (DDMEM), and the Ground-Truth Generation Module (GTGM). $GT^{ori}$ and $GT^{sim}$ denote the ground truth of the low- and high-density images, respectively. Only the high-density branch and DDMEM are used for inference, as indicated by the blue arrows.
  • Figure 3: Sample images from the ShanghaiTech Part A & B, RGBT-CC, and UCF-QNRF datasets.
  • Figure 4: Visualization of the predicted results of MPCount and L2HCount on the A and Q datasets. L2HCount predicts more reasonable results in the rectangular regions.
  • Figure 5: Samples of low-density and simulated high-density images from the B dataset.
  • ...and 4 more figures