SynMVCrowd: A Large Synthetic Benchmark for Multi-view Crowd Counting and Localization

Qi Zhang, Daijie Chen, Yunfei Gong, Hui Huang

Abstract

Existing multi-view crowd counting and localization methods are evaluated on relatively small scenes with limited crowd numbers, camera views, and frames. This makes the evaluation and comparison of existing methods impractical, because these methods easily overfit such small datasets. To address this issue, 3DROM proposes a data augmentation method. In this paper, we instead propose a large synthetic benchmark, SynMVCrowd, for more practical evaluation and comparison of multi-view crowd counting and localization methods. The SynMVCrowd benchmark consists of 50 synthetic scenes with a large number of multi-view frames and camera views and a much larger crowd number (up to 1000), which makes it more suitable for large-scene multi-view crowd vision tasks. In addition, we propose strong multi-view crowd localization and counting baselines that outperform all comparison methods on the new SynMVCrowd benchmark. Moreover, we show that, with the aid of the benchmark, better domain-transfer performance can be achieved for multi-view and single-image counting on novel real scenes. As a result, the proposed benchmark could advance research on multi-view and single-image crowd counting and localization toward more practical applications. The code and datasets are available at https://github.com/zqyq/SynMVCrowd.

Paper Structure (28 sections, 2 equations, 14 figures, 15 tables)

Figures (14)

  • Figure 1: Visualization of the proposed SynMVCrowd dataset, showing the scene types, time stamps, and weather conditions.
  • Figure 2: The whole process of multi-view crowd image generation, including scene setup, character setup, and scene synthesis.
  • Figure 3: Two pie charts illustrating the distributions of time stamps and weather conditions in SynMVCrowd, respectively. On the right side, each label denotes a time within the 24-hour day.
  • Figure 4: The action catalog of SynMVCrowd, consisting of various human poses from daily life.
  • Figure 5: The distribution of frame numbers over crowd counts in SynMVCrowd.
  • ...and 9 more figures