WSCF-MVCC: Weakly-supervised Calibration-free Multi-view Crowd Counting

Bin Li, Daijie Chen, Qi Zhang

TL;DR

The paper tackles multi-view crowd counting without camera calibration or dense annotations by proposing WSCF-MVCC, a three-module framework that uses image-level counts to supervise a single-view counter, learns view correspondences via a homography-based matching estimator, and fuses per-view density maps through learned weights. A self-supervised ranking loss with multi-scale priors strengthens local region predictions, while semantic information guides view matching for better fusion. Experiments on CVCS, CityStreet, and PETS2009 show that WSCF-MVCC outperforms calibration-free baselines and rivals calibrated methods, highlighting practical viability. The work also provides extensive ablations and visualizations that justify the design choices and demonstrate robustness across camera configurations, and it releases code for replication.

Abstract

Multi-view crowd counting can effectively mitigate the occlusion issues that commonly arise in single-image crowd counting. Existing deep-learning multi-view crowd counting methods project images from different camera views onto a common space to obtain ground-plane density maps, which requires abundant and costly crowd annotations and camera calibrations. Calibration-free methods have therefore been proposed that require neither camera calibrations nor scene-level crowd annotations. However, existing calibration-free methods still require expensive image-level crowd annotations for training the single-view counting module. In this paper, we propose a weakly-supervised calibration-free multi-view crowd counting method (WSCF-MVCC) that directly uses the crowd count as supervision for the single-view counting module, rather than density maps constructed from crowd annotations. In place of density-map supervision, a self-supervised ranking loss that leverages multi-scale priors is used to enhance the model's perceptual ability without additional annotation costs. Furthermore, the proposed model leverages semantic information to achieve more accurate view matching and, consequently, a more precise scene-level crowd count estimation. The proposed method outperforms state-of-the-art methods on three widely used multi-view counting datasets under weakly supervised settings, indicating that it is more suitable for practical deployment than calibrated methods. Code is released at https://github.com/zqyq/Weakly-MVCC.
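The two kinds of count the abstract refers to can be illustrated with a minimal Python sketch. It assumes that a single-view count is the sum of a predicted density map, that count supervision is an absolute-error penalty against the annotated image-level count, and that the scene-level count is a pixel-wise weighted sum of per-view density maps. The function names, the L1 form of the loss, and the toy values are illustrative assumptions, not taken from the paper or its released code.

```python
# Hypothetical helper names; a sketch under stated assumptions,
# not the authors' released implementation.

def total_count(density_map):
    """Integrate a 2-D density map (list of rows) into a scalar count."""
    return sum(sum(row) for row in density_map)


def count_loss(density_map, annotated_count):
    """Absolute error between integrated density and the image-level count.

    Assumption: an L1 penalty; the source does not state the exact form.
    """
    return abs(total_count(density_map) - annotated_count)


def scene_count(density_maps, weight_maps):
    """Sum weight * density over all pixels of all views."""
    total = 0.0
    for density, weight in zip(density_maps, weight_maps):
        for d_row, w_row in zip(density, weight):
            total += sum(d * w for d, w in zip(d_row, w_row))
    return total


# Toy single-view example: predicted density sums to 3, annotation says 4.
density = [[0.0, 0.5, 0.5],
           [1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]
single_view_loss = count_loss(density, annotated_count=4)

# Toy two-view example: one person is visible only in view 1 (weight 1.0),
# and a second person is visible in both views, so each view contributes
# that person with weight 0.5.
view_densities = [[[1.0, 1.0]], [[1.0, 0.0]]]
view_weights = [[[1.0, 0.5]], [[0.5, 1.0]]]
fused = scene_count(view_densities, view_weights)
```

In the toy fusion, the down-weighting keeps the person seen by both cameras from being counted twice, which is the role the matching weights are described as playing.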

Paper Structure

This paper contains 13 sections, 6 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Calibrated methods (top) require camera calibration information to project image features and use annotations of both image-level and scene-level people locations. Calibration-free methods (middle) eliminate the need for camera calibration information but still require annotations of image-level people locations. Building on calibration-free methods, weakly-supervised calibration-free methods (bottom, ours) use annotations of image-level people counts rather than locations, significantly reducing annotation costs.
  • Figure 2: The pipeline of the proposed WSCF-MVCC, including single-view crowd counting (SVCC), matching weight estimation (MWE), and multi-view crowd estimation (MVCE). Compared with CF-MVCC methods, we use annotations of image-level people counts, rather than people locations, to supervise the SVCC module.
  • Figure 3: Illustration of the ranking loss mechanism. The number of people contained in a certain area of the same image must be greater than or equal to the number of people contained in any sub-area within that area.
  • Figure 4: Visualization results of weight maps $W$ and density maps $D$. The red boxes indicate regions where our WSCF-MVCC method obtains more reliable matching weights.
  • Figure 5: Counting results on CityStreet and PETS2009. The results of the proposed WSCF-MVCC are closer to the ground truth (GT).
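The ranking constraint in the Figure 3 caption (a region contains at least as many people as any of its sub-regions) can be sketched as a hinge penalty over a chain of nested crops. The sketch below assumes that the counter has already produced one predicted count per crop, ordered from the largest region to the smallest. The function name, the `margin` parameter, and the hinge form are illustrative assumptions rather than the paper's stated formulation.

```python
# Hypothetical sketch of a ranking penalty over nested regions;
# the exact loss used in the paper is not given in this text.

def ranking_loss(counts_large_to_small, margin=0.0):
    """Penalize every sub-region whose predicted count exceeds its parent's.

    counts_large_to_small: predicted counts for a chain of nested crops,
    ordered from the outermost region to the innermost one.
    """
    loss = 0.0
    pairs = zip(counts_large_to_small, counts_large_to_small[1:])
    for outer, inner in pairs:
        # The hinge is zero when the constraint outer >= inner + margin holds.
        loss += max(0.0, inner - outer + margin)
    return loss


# Consistent predictions: each crop contains no more people than its parent.
consistent = ranking_loss([10.0, 6.0, 3.0])

# Inconsistent predictions: the innermost crop (7.5) exceeds its parent (6.0).
violated = ranking_loss([10.0, 6.0, 7.5])
```

Because the penalty only compares predictions with each other, it needs no labels, which matches the description of the loss as self-supervised. Using several crop sizes in one chain is one way to read "multi-scale priors".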