Single Domain Generalization for Crowd Counting

Zhuoxuan Peng, S. -H. Gary Chan

TL;DR

This work tackles the challenge of domain shift in crowd counting by introducing MPCount, a single-domain generalization method for density map regression. MPCount combines an Attention Memory Bank (AMB) with a Content Error Mask (CEM) and Attention Consistency Loss (ACL) to learn domain-invariant density representations from a single source domain, and adds Patch-wise Classification (PC) with patch-level supervision via patch-wise classification maps (PCMs) to mitigate label ambiguity. The approach avoids sub-domain division and demonstrates strong generalization to unseen target domains and narrow source distributions across multiple datasets, outperforming existing SDG and some DA methods. The method is validated with extensive ablations and qualitative analyses, and the authors release code for reproducibility.
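
To make the memory-bank idea concrete, here is a minimal sketch of an attention-based memory readout. It assumes scaled dot-product attention in which the memory vectors act as both keys and values; the function name `memory_readout`, the array shapes, and the scaling factor are illustrative assumptions, not details taken from the paper or its released code.

```python
import numpy as np

def memory_readout(features, memory):
    """Rebuild each feature vector as an attention-weighted mix of memory slots.

    features: (N, C) array, one row per spatial location (hypothetical layout).
    memory:   (K, C) array of memory vectors (these would be learned in a real model).
    """
    scale = np.sqrt(features.shape[1])
    scores = features @ memory.T / scale                    # (N, K) similarities
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over slots
    return weights @ memory, weights

# Toy inputs: 6 feature vectors and 4 memory slots, all of dimension 8.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 8))
memory = rng.normal(size=(4, 8))
reconstructed, attention = memory_readout(features, memory)
```

Because every reconstructed vector is a convex combination of the memory slots, the output is limited to what the memory stores. That is one plausible reading of how a single memory bank could filter out domain-specific information.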

Abstract

Due to its promising results, density map regression has been widely employed for image-based crowd counting. The approach, however, often suffers from severe performance degradation when tested on data from unseen scenarios, the so-called "domain shift" problem. To address the problem, we investigate in this work single domain generalization (SDG) for crowd counting. The existing SDG approaches are mainly for image classification and segmentation, and can hardly be extended to crowd counting due to its regression nature and label ambiguity (i.e., ambiguous pixel-level ground truths). We propose MPCount, a novel SDG approach that is effective even for a narrow source distribution. MPCount stores diverse density values for density map regression and reconstructs domain-invariant features by means of only one memory bank, a content error mask and an attention consistency loss. By partitioning the image into grids, it employs patch-wise classification as an auxiliary task to mitigate label ambiguity. Through extensive experiments on different datasets, MPCount is shown to significantly improve counting accuracy over the state of the art in diverse scenarios not observed in training data that is characterized by a narrow source distribution. Code is available at https://github.com/Shimmer93/MPCount.
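
The grid partitioning behind patch-wise classification can be sketched as follows. This is an illustrative example only: it assumes the patch labels are derived directly from head-point annotations and that patches are 16 pixels wide, neither of which is stated in the abstract. The function name `patch_classification_map` is hypothetical.

```python
import numpy as np

def patch_classification_map(points, height, width, patch=16):
    """Return a binary grid that is 1 where a patch contains an annotated head."""
    rows = -(-height // patch)  # ceiling division
    cols = -(-width // patch)
    pcm = np.zeros((rows, cols), dtype=np.uint8)
    for x, y in points:
        # Ignore annotations that fall outside the image bounds.
        if 0 <= x < width and 0 <= y < height:
            pcm[int(y) // patch, int(x) // patch] = 1
    return pcm

# Four hypothetical head annotations (x, y) in a 64x48 image.
points = [(5, 5), (20, 7), (21, 9), (63, 47)]
pcm = patch_classification_map(points, height=48, width=64, patch=16)
```

A patch label only records whether a head is present somewhere inside the cell, so it stays correct even when the exact pixel extent of a head is uncertain. This matches the intuition that coarse patch-level labels are less ambiguous than pixel-level density values.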

Paper Structure (28 sections, 14 equations, 16 figures, 6 tables)

Figures (16)

  • Figure 1: Single domain generalization for crowd counting. (a) Sample images from a single source domain $\mathcal{S}$ with a narrow distribution. (b) A challenging target image from an unseen scenario. (c) The ground-truth density map. (d) The density map predicted by the previous work DCCUS (Du et al., 2023) trained on $\mathcal{S}$. (e) The density map predicted by our MPCount trained on $\mathcal{S}$. MPCount achieves a much lower counting error than DCCUS and makes better predictions in the cropped area under domain shift.
  • Figure 2: The overall training pipeline of our proposed MPCount. All identical modules in this diagram share the same weights. In the encoder-decoder structure, a higher level indicates a deeper feature.
  • Figure 3: Illustration of label ambiguity and how PCM tackles it. Point A is a typical example which belongs to a human head but is assigned the density value 0 as the background in the density map due to varying head sizes. In the PCM, the patch covering A is correctly classified as containing heads, thanks to the patch-level accuracy of the labels.
  • Figure 4: Sample images with different instance-level labels in JHU-Crowd++.
  • Figure 5: Visualization results of DCCUS (Du et al., 2023) and MPCount under different DG settings. First row: A $\rightarrow$ B; second row: SN $\rightarrow$ FH; third row: B $\rightarrow$ A.
  • ...and 11 more figures