Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank

Sungjune Park; Hyunjun Kim; Yong Man Ro

TL;DR

This work tackles the generalization gap in pedestrian detection by building a versatile pedestrian knowledge bank derived from a large-scale pretrained model (CLIP). The bank is formed by quantizing generalized pedestrian embeddings and guiding them with a learnable hint to become task-compatible, then integrated into both region-proposal and query-based detectors via cross-attention. Empirical results on four public datasets demonstrate state-of-the-art performance and strong cross-framework transfer, with analyses confirming semantic coherence of bank contents and robustness across driving and surveillance scenes. The approach offers a practical pathway to transplant broad visual knowledge into domain-specific detection, enabling robust performance without requiring end-to-end retraining of the pretrained model.

Abstract

Pedestrian detection is a crucial area of computer vision research with many real-world applications (e.g., self-driving systems). However, despite notable progress, the pedestrian representations learned within a detection framework are usually limited to the particular scene data on which they were trained. In this paper, we therefore propose a novel approach that constructs a versatile pedestrian knowledge bank containing representative pedestrian knowledge that is applicable to various detection frameworks and diverse scenes. We extract generalized pedestrian knowledge from a large-scale pretrained model, and we curate it by quantizing the most representative features and guiding them to be distinguishable from background scenes. Finally, we construct a versatile pedestrian knowledge bank composed of these representations and leverage it to complement and enhance pedestrian features within a pedestrian detection framework. Through comprehensive experiments, we validate the effectiveness of our method, demonstrating its versatility and outperforming state-of-the-art detection methods.

Paper Structure (28 sections, 7 equations, 6 figures, 6 tables, 1 algorithm)

Introduction
Related work
Object Detection
Pedestrian Detection
Generalized Knowledge in Large-scale Pretrained Models
Proposed Method
How to Construct Versatile Pedestrian Knowledge Bank
How to Leverage Versatile Pedestrian Knowledge
Region Proposal Based Two Stage Detection
Query Based Detection
Experiments
Experimental Settings
Pedestrian Detection Datasets
Implementation Details
Comparison with Existing Pedestrian Detection Methods
...and 13 more sections

Figures (6)

Figure 1: The overall concept of our approach. We extract generalized pedestrian knowledge from a large-scale pretrained model and curate it to be exemplary and task-compatible. The knowledge bank stores this knowledge, and it can be plugged into various frameworks for robust pedestrian detection on diverse scene data.
Figure 2: The overall steps of the proposed approach. In the first step, we extract the knowledge embeddings of various instances from a large-scale pretrained image encoder. We quantize the most representative features $\boldsymbol{f_q}$ and make them task-relevant by introducing $\boldsymbol{f_h}$, which yields the task-compatible knowledge features $\boldsymbol{f_k}$. In the second step, we leverage $\boldsymbol{f_k}$ within a pedestrian detection framework.
Figure 3: Overview of leveraging the task-compatible pedestrian knowledge $\boldsymbol{f_k}$. The pedestrian features $\boldsymbol{f_p}$ act as query features, while $\boldsymbol{f_k}$ serves as both key and value features. In this way, $\boldsymbol{f_p}$ can refer to $\boldsymbol{f_k}$, the distinguishable features stored in the bank, to produce the complemented pedestrian features $\boldsymbol{f_c}$.
Figure 4: Visualization analysis of the semantics in the knowledge bank. We analyze which types of pedestrians are quantized together, and we visualize the distribution of knowledge features using t-SNE. Orange, green, and red $\times$ marks denote the 9th, 28th, and 43rd knowledge elements, respectively, while blue circles denote the others.
Figure 5: Visualization of detection results on diverse scenes. Yellow and red boxes denote ground-truth and predicted bounding boxes, respectively. The proposed method performs well in general indoor/outdoor, surveillance, and driving environments. The images are zoomed in for better visualization.
...and 1 more figures
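
The lookup described in the Figure 3 caption, where $\boldsymbol{f_p}$ acts as the query and $\boldsymbol{f_k}$ as key and value, can be illustrated with a minimal single-head cross-attention sketch in NumPy. This is not the authors' implementation: the projection matrices `w_q`, `w_k`, `w_v`, the scaled dot-product form, the residual connection, the bank size of 64, and the feature dimension of 32 are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def complement_with_bank(f_p, f_k, w_q, w_k, w_v):
    """Single-head cross-attention over a knowledge bank.

    f_p: (N, d) pedestrian features, used as queries.
    f_k: (K, d) knowledge-bank features, used as keys and values.
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical, for illustration).
    Returns the complemented features f_c (N, d) and attention weights (N, K).
    """
    q = f_p @ w_q
    k = f_k @ w_k
    v = f_k @ w_v
    # Each pedestrian feature attends over all bank entries.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    # Assumption: the retrieved knowledge is added back residually.
    f_c = f_p + attn @ v
    return f_c, attn


rng = np.random.default_rng(0)
N, K, d = 5, 64, 32  # hypothetical sizes: 5 proposals, 64 bank entries
f_p = rng.standard_normal((N, d))
f_k = rng.standard_normal((K, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

f_c, attn = complement_with_bank(f_p, f_k, w_q, w_k, w_v)
print(f_c.shape, attn.shape)  # (5, 32) (5, 64)
```

Because the bank only enters as keys and values, the same function could in principle be applied to RoI features in a two-stage detector or to object queries in a query-based one, which is consistent with the cross-framework use described in the TL;DR.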
