
Semi-Supervised Multi-View Crowd Counting by Ranking Multi-View Fusion Models

Qi Zhang, Yunfei Gong, Zhidan Xie, Zhizi Wang, Antoni B. Chan, Hui Huang

TL;DR

This paper tackles data scarcity in multi-view crowd counting with two semi-supervised frameworks that exploit unlabeled data through ranking constraints across different numbers of input views. MVPR Semi-MVCC (multi-view prediction ranking) requires that counts predicted from fewer views do not exceed those predicted from more views. MVUR Semi-MVCC (multi-view uncertainty ranking) instead ranks the estimated model uncertainties, guided by prediction errors, to mitigate over-counting in challenging scenes. Both outperform single-image semi-supervised counting methods across multiple datasets, and MVUR is the most robust owing to its uncertainty modeling. The approach also generalizes well and keeps training costs practical, which makes it promising for real-world multi-view counting with limited labels.

Abstract

Multi-view crowd counting has been proposed to deal with severe occlusion in crowd counting for large, wide scenes. However, multi-view images are difficult to collect and annotate, so multi-view counting datasets contain a limited number of multi-view frames and scenes. One way to address the limited data is to collect synthetic data, which bypasses the annotation step. Another is to develop semi-supervised, weakly supervised, or unsupervised methods that require less multi-view data. In this paper, we propose two semi-supervised multi-view crowd counting frameworks that rank multi-view fusion models with different numbers of input views, in terms of either the model predictions or the model uncertainties. For the first method (vanilla model), we rank the prediction results of multi-view fusion models with different numbers of camera-view inputs: the model's predictions with fewer camera views should not be larger than its predictions with more camera views. For the second method, we rank the estimated model uncertainties of multi-view fusion models with a variable number of view inputs, guided by the fusion models' prediction errors: the model uncertainties with more camera views should not be larger than those with fewer camera views. These constraints are introduced into model training in a semi-supervised fashion for multi-view counting with limited labeled data. Experiments demonstrate the advantages of the proposed multi-view model ranking methods over other semi-supervised counting methods.
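The prediction-ranking constraint of the first method can be read as a hinge penalty applied to unlabeled frames. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation. We assume a count is the sum of a fused density map, the camera subsets are nested (each adds one camera to the previous subset), and violations between adjacent subsets are averaged. The function names are hypothetical.

```python
from typing import List, Sequence


def density_count(density_map: Sequence[Sequence[float]]) -> float:
    """Crowd count of a fused density map: the sum over all cells."""
    return sum(sum(row) for row in density_map)


def prediction_ranking_loss(counts_by_views: Sequence[float]) -> float:
    """Hinge penalty for the MVPR ordering on one unlabeled frame (assumed form).

    counts_by_views[k] is the count fused from a camera subset with k + 1 views.
    Assumption: the subsets are nested, so count(fewer views) <= count(more views)
    should hold for every adjacent pair. Each violated pair contributes
    max(0, c_fewer - c_more); the result is the mean over adjacent pairs
    (0.0 when fewer than two subsets are given).
    """
    pairs = list(zip(counts_by_views[:-1], counts_by_views[1:]))
    if not pairs:
        return 0.0
    return sum(max(0.0, fewer - more) for fewer, more in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up fused density maps for 1, 2, and 3 camera views of one frame.
    maps: List[List[List[float]]] = [
        [[10.0, 10.0], [10.0, 10.0]],  # 1 view  -> 40
        [[15.0, 10.0], [15.0, 15.0]],  # 2 views -> 55
        [[13.0, 13.0], [13.0, 13.0]],  # 3 views -> 52, violates 55 <= 52
    ]
    counts = [density_count(m) for m in maps]
    print(counts)                           # [40.0, 55.0, 52.0]
    print(prediction_ranking_loss(counts))  # (0 + 3) / 2 = 1.5
```

Only the pair that breaks the ordering contributes, so frames whose counts already grow with the number of views add no loss.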

Paper Structure

This paper contains 17 sections, 10 equations, 7 figures, 13 tables.

Figures (7)

  • Figure 1: The general pipeline of (a) our multi-view prediction ranking semi-supervised multi-view crowd counting method (denoted as MVPR Semi-MVCC) and (b) our multi-view uncertainty ranking semi-supervised multi-view crowd counting method (denoted as MVUR Semi-MVCC). For the former, the multi-view fusion predictions from fewer camera views should not be larger ($\leq$) than the counting results from more camera views. For the latter, the estimated model uncertainties with more camera views should not be larger ($\leq$) than those with fewer camera views.
  • Figure 2: The pipeline of the multi-view uncertainty ranking semi-supervised crowd counting method (MVUR Semi-MVCC). For unlabeled data, MVUR Semi-MVCC ranks model uncertainties across different numbers of camera-view inputs instead of ranking model predictions. The model comprises an image encoder for feature extraction, multi-view feature projection and fusion, a model uncertainty decoder, and a multi-view decoder. Three losses are used in training: the multi-view fusion density map prediction loss $L_{label}$ and the model uncertainty estimation loss $L^{un}_{label}$ for labeled data, and the multi-view fusion model uncertainty ranking loss $L^{un}_{rank}$ for unlabeled data. Dashed arrows indicate steps used only during training.
  • Figure 3: Density map predictions for labeled data using different numbers of camera views on the 3 datasets, showing the ranking order among fusion predictions from different numbers of views.
  • Figure 4: Predicted uncertainty maps using different numbers of input camera views on the 3 datasets, showing their rank ordering.
  • Figure 5: Visualization results of the proposed MVPR and MVUR Semi-MVCC methods and the comparison methods on CVCS under different annotating rates. Overall, MVPR and MVUR Semi-MVCC achieve the best results among all methods.
  • ...and 2 more figures
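The Figure 2 caption names three training losses for MVUR Semi-MVCC. The sketch below shows one way they could be combined, but every concrete choice in it is our assumption and none is confirmed by the text above:

- per-pixel mean-squared error for $L_{label}$;
- an L1 fit of the predicted uncertainty map to the absolute density error for $L^{un}_{label}$, one plausible reading of "guided by prediction errors";
- mean uncertainty as the scalar compared across nested view subsets in $L^{un}_{rank}$;
- a hypothetical weight `lam` on the ranking term.

Maps are flattened lists of floats to keep the example dependency-free.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """L_label (assumed): per-pixel mean-squared error between density maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def uncertainty_fit_loss(uncert: Sequence[float], pred: Sequence[float],
                         target: Sequence[float]) -> float:
    """L^un_label (assumed): L1 fit of the uncertainty map to |pred - target|."""
    return sum(abs(u - abs(p - t)) for u, p, t in zip(uncert, pred, target)) / len(pred)


def uncertainty_ranking_loss(uncert_maps_by_views: Sequence[Sequence[float]]) -> float:
    """L^un_rank (assumed form): hinge on the mean uncertainty of nested view subsets.

    uncert_maps_by_views[k] is the uncertainty map predicted with k + 1 views.
    The ordering uncertainty(more views) <= uncertainty(fewer views) is penalized
    by max(0, u_more - u_fewer) for each adjacent pair, averaged over pairs.
    """
    means = [sum(m) / len(m) for m in uncert_maps_by_views]
    pairs = list(zip(means[:-1], means[1:]))
    if not pairs:
        return 0.0
    return sum(max(0.0, more - fewer) for fewer, more in pairs) / len(pairs)


def mvur_total_loss(pred: Sequence[float], target: Sequence[float],
                    uncert: Sequence[float],
                    unlabeled_uncert_maps: Sequence[Sequence[float]],
                    lam: float = 1.0) -> float:
    """L = L_label + L^un_label + lam * L^un_rank (lam is a hypothetical weight)."""
    return (mse(pred, target)
            + uncertainty_fit_loss(uncert, pred, target)
            + lam * uncertainty_ranking_loss(unlabeled_uncert_maps))


if __name__ == "__main__":
    pred, target = [1.0, 2.0], [1.0, 1.0]                # made-up labeled density maps
    uncert = [0.0, 0.5]                                  # uncertainty map for the labeled frame
    unlabeled = [[0.5, 0.5], [0.25, 0.25], [0.5, 0.25]]  # uncertainty maps with 1, 2, 3 views
    print(mvur_total_loss(pred, target, uncert, unlabeled))  # 0.5 + 0.25 + 0.0625 = 0.8125
```

In this toy example only the step from 2 to 3 views raises the mean uncertainty (0.25 to 0.375). That step is the one the ranking term penalizes.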