Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting

Wei Lin; Chenyang Zhao; Antoni B. Chan

TL;DR

This work addresses the labor-intensive annotation required for point-based crowd counting by introducing a semi-supervised learning framework based on pseudo-labels. Using the point-specific activation map (PSAM), a gradient-based visualization, the authors show that background regions receive no useful supervision under point-to-point (P2P) matching. This motivates a shift to point-to-region (P2R) matching, which propagates pseudo-label confidence to local regions. The proposed P2R loss removes the need for computationally heavy Hungarian matching and enables effective training with limited labeled data and abundant unlabeled data. Experiments on multiple datasets, covering semi-supervised counting and unsupervised domain adaptation, show that P2R matches or outperforms state-of-the-art methods, with substantial gains in efficiency and robustness. The authors provide code for reproducibility.

Abstract

Point detection has been developed to locate pedestrians in crowded scenes by training a counter through a point-to-point (P2P) supervision scheme. Despite its excellent localization and counting performance, training a point-based counter still faces challenges concerning annotation labor: hundreds to thousands of points are required to annotate a single sample capturing a dense crowd. In this paper, we integrate point-based methods into a semi-supervised counting framework based on pseudo-labeling, enabling the training of a counter with only a few annotated samples supplemented by a large volume of pseudo-labeled data. In practice, however, training runs into problems because the confidence of pseudo-labels fails to propagate to background pixels via P2P. To tackle this challenge, we devise a point-specific activation map (PSAM) to visually interpret the phenomena occurring during the ill-posed training. Observations from the PSAM suggest that the feature map is excessively activated by the loss for unlabeled data, causing the decoder to misinterpret these over-activations as pedestrians. To mitigate this issue, we propose a point-to-region (P2R) scheme to replace P2P, which segments out a local region for supervision rather than detecting a single point corresponding to a pedestrian. Consequently, pixels in the local region share the same confidence as the corresponding pseudo point. Experimental results in both semi-supervised counting and unsupervised domain adaptation highlight the advantages of our method and show that P2R resolves the issues identified with PSAM. The code is available at https://github.com/Elin24/P2RLoss.
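
The region-level idea can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository above for that); it is one simple reading of the abstract under stated assumptions: each pixel is assigned to its nearest pseudo point, pixels within a fixed `radius` of that point are treated as foreground, and every pixel inherits the confidence of its nearest point as a loss weight. The helper names `region_targets` and `weighted_bce`, the fixed radius, and the binary cross-entropy form are all hypothetical choices made for illustration.

```python
import numpy as np

def region_targets(points, conf, shape, radius):
    """Hypothetical helper: per-pixel targets and weights from pseudo points.

    points: list of (row, col) pseudo-point coordinates
    conf:   one confidence value per pseudo point
    shape:  (H, W) of the prediction map
    radius: assumed fixed size of the local region around each point
    """
    h, w = shape
    if len(points) == 0:
        # No pseudo points: everything is background with unit weight (assumption).
        return np.zeros((h, w)), np.ones((h, w))
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([ys, xs], axis=-1).reshape(-1, 1, 2).astype(float)  # (H*W, 1, 2)
    pts = np.asarray(points, dtype=float).reshape(1, -1, 2)            # (1, N, 2)
    dist = np.linalg.norm(pix - pts, axis=-1)                          # (H*W, N)
    nearest = dist.argmin(axis=1)   # index of the closest pseudo point per pixel
    min_dist = dist.min(axis=1)
    # Foreground = pixels inside the local region of their nearest point.
    target = (min_dist <= radius).astype(float).reshape(h, w)
    # Every pixel shares the confidence of its nearest pseudo point.
    weight = np.asarray(conf, dtype=float)[nearest].reshape(h, w)
    return target, weight

def weighted_bce(prob, target, weight, eps=1e-7):
    """Confidence-weighted binary cross-entropy over all pixels."""
    prob = np.clip(prob, eps, 1 - eps)
    loss = -(target * np.log(prob) + (1 - target) * np.log(1 - prob))
    return float((weight * loss).sum() / weight.sum())

# Usage: two pseudo points with different confidences on a 3x7 map.
target, weight = region_targets([(1, 1), (1, 5)], [0.9, 0.4], (3, 7), radius=1.0)
loss = weighted_bce(np.full((3, 7), 0.5), target, weight)
```

Note that this sketch needs no one-to-one assignment between predictions and points, which is consistent with the claim that a region-based loss avoids Hungarian matching. How the actual P2R loss defines regions and weights background pixels may differ.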
