Dynamic Proxy Domain Generalizes the Crowd Localization by Better Binary Segmentation

Junyu Gao, Da Zhang, Qiyu Wang, Zhiyuan Zhao, Xuelong Li

TL;DR

This paper tackles crowd localization under domain shift by treating pixel-wise threshold learning as a binary classification task and introducing the Dynamic Proxy Domain (DPD). Grounded in a formal generalization bound, the authors generate a proxy domain to tighten the bound and design a training regime that jointly optimizes over source data and proxy-domain samples, aided by a momentum-based mechanism and a dynamic threshold generator. The approach yields theoretical guarantees and practical improvements across six crowd datasets, showing robust generalization to unseen target domains. DPD advances cross-domain crowd localization by providing a principled way to mitigate distributional divergence without target-domain labels, with potential for broader domain-adaptation applications.

Abstract

Crowd localization aims to predict the precise location of each instance within an image. Current advanced methods use pixel-wise binary classification to handle congested scenes, in which pixel-level thresholds binarize the predicted confidence that a pixel belongs to a pedestrian head. Because crowd scenes vary widely in content, count, and scale, the confidence-threshold learner is fragile and generalizes poorly under domain shift. Moreover, the target domain is usually unknown during training. It is therefore imperative to explore how to improve the generalization of the confidence-threshold locator to the latent target domain. In this paper, we propose a Dynamic Proxy Domain (DPD) method to generalize the learner under domain shift. Concretely, based on a theoretical analysis of the upper bound on a binary classifier's generalization error on the latent target domain, we introduce a generated proxy domain to facilitate generalization. Then, based on this theory, we design a DPD algorithm, composed of a training paradigm and a proxy domain generator, to enhance the domain generalization of the confidence-threshold learner. We also evaluate our method on five kinds of domain shift scenarios, demonstrating its effectiveness in generalizing crowd localization. Our code will be available at https://github.com/zhangda1018/DPD.
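
The pixel-wise binarization described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: in the paper both the confidence map and the thresholds are predicted by a network, whereas here they are hand-written arrays, and the helper names `binarize` and `locate_heads` are hypothetical. The sketch only shows how a per-pixel threshold map turns confidences into a binary mask, and how each connected blob in that mask can be reduced to one head location.

```python
from collections import deque

import numpy as np


def binarize(confidence, threshold):
    """Pixel-wise binary classification: 1 where confidence exceeds its threshold."""
    return (confidence > threshold).astype(np.uint8)


def locate_heads(mask):
    """Return the (row, col) centroid of every 4-connected foreground blob."""
    height, width = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    centers = []
    for r in range(height):
        for c in range(width):
            if mask[r, c] == 0 or seen[r, c]:
                continue
            # Breadth-first flood fill collects the pixels of one blob.
            queue = deque([(r, c)])
            seen[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            ys, xs = zip(*pixels)
            centers.append((float(np.mean(ys)), float(np.mean(xs))))
    return centers


# Toy confidence map (assumed values, for illustration only).
confidence = np.array([
    [0.9, 0.8, 0.1, 0.1],
    [0.7, 0.9, 0.1, 0.6],
    [0.1, 0.1, 0.1, 0.7],
])

# A single global threshold versus a per-pixel threshold map.
uniform = np.full_like(confidence, 0.5)
per_pixel = uniform.copy()
per_pixel[:, 3] = 0.65  # stricter threshold in the last column

heads_uniform = locate_heads(binarize(confidence, uniform))
heads_per_pixel = locate_heads(binarize(confidence, per_pixel))
```

Changing the threshold in one column changes which pixels count as foreground and therefore shifts the predicted location of the second head. This sensitivity is why a threshold learned on one domain can fail on another.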

Paper Structure

This paper contains 25 sections, 4 theorems, 45 equations, 13 figures, 5 tables, 2 algorithms.

Key Result

Lemma 1

Assume that $\mathcal{H}$ is a hypothesis space with VC dimension $d$ and that $m$ is the number of training samples drawn from $\mathcal{D}_s$. Given a binary classifier $h \in \mathcal{H}$, the following inequality holds (the proof is given in the appendix) with a probability …

Figures (13)

  • Figure 1: The superior performance of existing segmentation-based crowd locators depends mostly on a robust threshold that classifies samples into two classes. However, when the threshold is transferred to another domain, domain-specific knowledge causes it to fail on some samples.
  • Figure 2: An example of the pipeline for adaptive instance crowd localization. To facilitate visualization, only the image patch in the yellow window is fed into the crowd locator.
  • Figure 3: Overview of the core idea behind the proposed Dynamic Proxy Domain. Compared with implementing ERM on the source domain alone, we introduce DPD, which minimizes the divergence between the source and target distributions. As a result, the decision boundary between domains is weakened in the hypothesis space.
  • Figure 4: (a) Convergence to the fixed source domain. (b) Convergence to the source domain and the dynamic proxy domain simultaneously.
  • Figure 5: Scale distribution comparison between SHHA and the other adopted datasets.
  • ...and 8 more figures

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Definition 3
  • Theorem 1
  • Corollary 1
  • Theorem 2