
Gating Syn-to-Real Knowledge for Pedestrian Crossing Prediction in Safe Driving

Jie Bai, Jianwu Fang, Yisheng Lv, Chen Lv, Jianru Xue, Zhengguo Li

TL;DR

This work proposes a gated syn-to-real knowledge transfer approach for PCP (Gated-S2R-PCP) with two aims: designing suitable domain adaptation methods for different kinds of cross-domain knowledge, and transferring the knowledge suited to specific situations through gated knowledge fusion.

Abstract

Pedestrian Crossing Prediction (PCP) in driving scenes plays a critical role in ensuring the safe operation of intelligent vehicles. Due to the limited observations of pedestrian crossing behaviors in typical situations, recent studies have begun to leverage synthetic data with flexible variation to boost prediction performance, employing domain adaptation frameworks. However, different kinds of domain knowledge exhibit distinct cross-domain distribution gaps, which calls for adaptation methods suited to each kind of knowledge in PCP tasks. In this work, we propose a Gated Syn-to-Real Knowledge transfer approach for PCP (Gated-S2R-PCP), which has two aims: 1) designing suitable domain adaptation methods for different kinds of cross-domain knowledge, and 2) transferring the knowledge suited to specific situations through gated knowledge fusion. Specifically, we design a framework that contains three domain adaptation methods, namely style transfer, distribution approximation, and knowledge distillation, for various types of information such as visual, semantic, depth, and location cues. A Learnable Gated Unit (LGU) fuses the suitable cross-domain knowledge to boost pedestrian crossing prediction. We construct a new synthetic benchmark, S2R-PCP-3181, with 3181 sequences (489,740 frames) that provides pedestrian locations, RGB frames, semantic images, and depth images. Using S2R-PCP-3181, we transfer the knowledge to two challenging real datasets, PIE and JAAD, and obtain PCP performance superior to that of state-of-the-art methods.
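The Figure 2 caption describes the LGU as adaptively fusing three streams ($f_{\mathcal{S}}(B_{real})$, $f_{\psi}(I_{st})$, $f_{\psi}(I_{real})$) into $f_{gate}$, but this page does not give the unit's equations. The NumPy sketch below is therefore a hypothetical stand-in rather than the authors' LGU. It uses softmax gating: a linear layer over the concatenated streams produces one score per stream, a softmax turns the scores into gates, and the gate-weighted sum plays the role of $f_{gate}$. The function name `learnable_gated_fusion`, the parameters `W` and `b`, and the assumption that all streams share one dimension `d` are illustrative choices. The gate parameters would be learned in practice, but here they are random values supplied for illustration.

```python
import numpy as np


def learnable_gated_fusion(features, W, b):
    """Hypothetical softmax-gated fusion of K feature streams.

    features: (K, d) array, one row per stream, e.g. f_S(B_real),
              f_psi(I_st), f_psi(I_real), assumed to share dimension d.
    W, b:     gate parameters of shape (K, K*d) and (K,); learned in
              practice, supplied directly here for illustration.
    Returns the fused vector f_gate of shape (d,) and the gates of shape (K,).
    """
    concat = features.reshape(-1)                    # (K*d,) joint view of all streams
    logits = W @ concat + b                          # one relevance score per stream
    logits = logits - logits.max()                   # subtract max for numerical stability
    gates = np.exp(logits) / np.exp(logits).sum()    # softmax: gates are positive, sum to 1
    fused = (gates[:, None] * features).sum(axis=0)  # gate-weighted sum of the streams
    return fused, gates


# Usage with random stand-ins for the three embeddings (K=3 streams, d=8).
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))
W = rng.normal(scale=0.1, size=(3, 3 * 8))
b = np.zeros(3)
f_gate, g = learnable_gated_fusion(feats, W, b)
print(f_gate.shape, g)
```

A softmax makes the streams compete, so the gates always sum to one. Independent sigmoid gates, one per stream or per feature dimension, are a common alternative that lets several streams be fully "on" at once.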

Paper Structure (20 sections, 13 equations, 14 figures, 6 tables)

Figures (14)

  • Figure 1: Illustration of suitable syn-to-real knowledge transfer for different kinds of information in driving scenes. (a) plots the feature distributions of the real and synthetic datasets for pedestrian locations, RGB frames, and semantic/depth images. (b) illustrates our differentiated syn-to-real knowledge transfer for information with different domain gaps. The feature distributions in (a) are obtained by applying t-SNE van2008visualizing to the feature vectors of 1000 randomly selected samples (16 frames each) from the synthetic PCP dataset (Syn-PCP-3181, described in Sec. \ref{sec-data}) and a real PCP dataset (PIE DBLP:conf/iccv/RasouliKKT19). Because pedestrian locations and images have different input shapes, the feature vectors of pedestrian locations and of image-like inputs are extracted by a Transformer DBLP:conf/nips/VaswaniSPUJGKP17 and by TimeSformer DBLP:conf/icml/BertasiusWT21, respectively.
  • Figure 2: The framework of Gated-S2R-PCP. Synthetic domain: pedestrian bounding boxes $\emph{B}_{syn}$, RGB frames $\emph{I}_{syn}$, depth images $\emph{D}_{syn}$, and semantic images $\emph{S}_{syn}$. Real domain: pedestrian bounding boxes $\emph{B}_{real}$ and RGB frames $\emph{I}_{real}$. During training, the Knowledge Distiller, Style Shifter, and Distribution Approximator are trained to perform the knowledge transfer $\emph{B}_{syn}\rightarrow \emph{B}_{real}$ (Eq. \ref{eq:KD}), $\emph{I}_{syn}\rightarrow \emph{I}_{real}$ (Eq. \ref{eq:SS}), and $\{\emph{D}_{syn},\emph{S}_{syn}\}\longleftrightarrow \emph{I}_{real}$ (Eq. \ref{eq:DA}), respectively. These domain adaptation approaches yield the feature embedding $f_{\mathcal{S}}(\emph{B}_{real})$ of $\emph{B}_{real}$, $f_{\psi}(\emph{I}_{st})$ of the style-transferred image set $\emph{I}_{st}$, and $f_{\psi}(\emph{I}_{real})$ of $\emph{I}_{real}$ over $T$ frames. Finally, a Learnable Gated Unit (LGU) adaptively fuses $f_{\mathcal{S}}(\emph{B}_{real})$, $f_{\psi}(\emph{I}_{st})$, and $f_{\psi}(\emph{I}_{real})$ into the multi-source feature $f_{gate}$ for pedestrian crossing prediction. During testing, only $\emph{I}_{real}$ and $\emph{B}_{real}$ from frames $1{:}T$ are used as input. The pedestrian crossing predictor then determines whether each pedestrian is crossing or not crossing at time $T+\tau$, where $\tau$ denotes the Time-to-Crossing (TTC).
  • Figure 3: The structure of Knowledge Distiller.
  • Figure 4: The structure of Style Shifter. The real RGB frames $\emph{I}_{real}$ are the content images and the synthetic RGB frames $\emph{I}_{syn}$ are the style images.
  • Figure 5: The structure of the Distribution Approximator. The generator encodes the input into a high-dimensional feature embedding, and the discriminator acts as a domain classifier. Training the generator to confuse the domain labels pulls the feature embeddings of the different domains into a shared space.
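Figure 4 treats the real frames $I_{real}$ as content and the synthetic frames $I_{syn}$ as style, but this page does not say how the Style Shifter combines them. The NumPy sketch below shows one widely used mechanism, Adaptive Instance Normalization (AdaIN), as an assumption rather than the paper's confirmed design. AdaIN normalizes each channel of the content features, then re-scales and shifts it with the matching channel mean and standard deviation of the style features. The function name `adain` and the `(C, H, W)` feature layout are illustrative. In a full pipeline these features would come from a pretrained encoder, and a decoder would map the result back to a style-transferred frame in $I_{st}$. The sketch covers only the statistics-matching step.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization on (C, H, W) feature maps.

    Each channel of `content` is normalized to zero mean and unit variance,
    then re-scaled and shifted with the matching channel statistics of `style`.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = np.sqrt(content.var(axis=(1, 2), keepdims=True) + eps)  # eps avoids division by zero
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / c_std + s_mean


# Usage: random stand-ins for encoder features of a real frame (content)
# and a synthetic frame (style), 4 channels on a 16x16 grid.
rng = np.random.default_rng(1)
content = rng.normal(0.0, 1.0, size=(4, 16, 16))
style = rng.normal(3.0, 2.0, size=(4, 16, 16))
stylized = adain(content, style)
print(stylized.mean(axis=(1, 2)), stylized.std(axis=(1, 2)))
```

After the call, each channel of `stylized` carries the spatial layout of the content features together with the first- and second-order statistics of the style features. This is the property that lets real frames take on the appearance of the synthetic domain.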