
SDR-GAIN: A High Real-Time Occluded Pedestrian Pose Completion Method for Autonomous Driving

Honghao Fu, Yongli Gu, Yidong Yan, Yilang Shen, Yiwen Wu, Libo Sun

TL;DR

The paper tackles real-time completion of occluded pedestrian poses in autonomous driving by introducing SDR-GAIN, a lightweight self-supervised GAN framework that imputes missing keypoints from their coordinate distributions using pose separation and dimensionality reduction. It uses two generators (one for the head, one for the torso), masked learning with hints, and pose standardization to learn robust spatial relationships, achieves microsecond-level inference, and outperforms both traditional imputation methods and Transformer-based approaches on COCO and JAAD. Ablation studies validate the contribution of separation, dimensionality reduction, and the two-generator design, while highlighting remaining challenges such as dataset size and the instability of adversarial training. Overall, SDR-GAIN offers a practical, real-time solution for occlusion-resilient pedestrian pose estimation in autonomous driving systems.

Abstract

With the advancement of vision-based autonomous driving technology, pedestrian detection has become an important component for improving traffic safety and driving system robustness. Nevertheless, in complex traffic scenarios, conventional pose estimation approaches frequently fail to accurately reconstruct occluded keypoints, primarily due to obstructions caused by vehicles, vegetation, or architectural elements. To address this issue, we propose a novel real-time occluded pedestrian pose completion framework termed Separation and Dimensionality Reduction-based Generative Adversarial Imputation Nets (SDR-GAIN). Unlike previous approaches that train visual models to distinguish occlusion patterns, SDR-GAIN learns human pose directly from the numerical distribution of keypoint coordinates and interpolates the missing positions. It employs a self-supervised adversarial learning paradigm to train lightweight generators with residual structures for the imputation of missing pose keypoints. Additionally, it integrates multiple pose standardization techniques to ease the learning process. Experiments conducted on the COCO and JAAD datasets demonstrate that SDR-GAIN surpasses conventional machine learning and Transformer-based missing-data interpolation algorithms in accurately recovering occluded pedestrian keypoints, while simultaneously achieving microsecond-level real-time inference.
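
SDR-GAIN builds on Generative Adversarial Imputation Nets (GAIN). To make the self-supervised adversarial paradigm described above more concrete, the NumPy sketch below follows the structure of the original GAIN formulation, not SDR-GAIN's own code: some detected keypoint entries are randomly hidden to create training targets, missing slots are filled with small uniform noise, the discriminator receives a hint matrix revealing part of the true mask, and the generator loss combines an adversarial term with a reconstruction term. The function names `make_training_batch` and `gain_losses`, the drop rate, the hint rate, the weight `alpha`, and the choice to reconstruct over all detected entries are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def make_training_batch(x, observed, drop_rate, hint_rate, rng):
    """Build one self-supervised GAIN-style batch (hypothetical helper).

    x:        (batch, dim) normalized keypoint coordinates (any value where missing)
    observed: (batch, dim) 1.0 where the keypoint was detected, else 0.0
    Returns (x_tilde, m, h): generator input, mask, and discriminator hint.
    """
    # Self-supervision (assumption): hide a random share of the detected entries
    # so that ground truth exists for positions the generator must fill in.
    keep = (rng.random(x.shape) >= drop_rate).astype(float)
    m = observed * keep
    # Missing slots receive small uniform noise in [0, 0.01), as in GAIN.
    z = rng.uniform(0.0, 0.01, size=x.shape)
    x_tilde = m * x + (1.0 - m) * z
    # Hint: reveal the true mask on a random subset of entries (H = B * M).
    b = (rng.random(x.shape) < hint_rate).astype(float)
    h = b * m
    return x_tilde, m, h


def gain_losses(d_prob, g_out, x, m, observed, alpha=100.0, eps=1e-8):
    """Discriminator and generator losses for one batch (hypothetical helper).

    d_prob: discriminator's per-entry probability that an entry is real (m == 1)
    g_out:  generator output for every entry
    """
    # Discriminator: binary cross-entropy against the mask.
    d_loss = -np.mean(m * np.log(d_prob + eps) + (1.0 - m) * np.log(1.0 - d_prob + eps))
    # Adversarial term: push the discriminator to call imputed entries "real".
    g_adv = -np.mean((1.0 - m) * np.log(d_prob + eps))
    # Reconstruction over all detected entries, including artificially hidden
    # ones (assumption; original GAIN uses only the entries visible to G).
    rec = np.sum(observed * (x - g_out) ** 2) / max(np.sum(observed), 1.0)
    return d_loss, g_adv + alpha * rec
```

In a full training loop (not shown), the generator would take `x_tilde` together with `m`, the imputed vector would be `x_hat = m * x + (1 - m) * g_out`, and the discriminator would see `x_hat` together with the hint `h`; `d_prob` stands for its per-entry output. Neither network is implemented here.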

Paper Structure

This paper contains 15 sections, 18 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: The framework of SDR-GAIN. It starts with initial pose estimation via OpenPose, followed by separation, rotation, and dimensionality reduction for pose data standardization. Two generators then predict the missing keypoints for the head and the torso. The generators' outputs are then reverse-processed to restore the complete pedestrian pose.
  • Figure 2: Separation and calculation of rotation angle. (a) Image after pose estimation; (b) Keypoints map, where blue and red dots represent keypoints of the torso and head, respectively; (c) Calculation of rotation angle with reference to the right and left ears, and the right and left shoulders.
  • Figure 3: Dimensionality reduction and normalization process: (a) Applied to the rotated torso keypoints, with the blue point representing the original 2D coordinates and the orange and dark green points representing the reduced coordinates along the $x$ and $y$ axes, respectively; (b) Applied to the rotated head keypoints, with the red point representing the original 2D coordinates and the yellow and light green points representing the reduced coordinates along the $x$ and $y$ axes, respectively.
  • Figure 4: The training process of generators and discriminators.
  • Figure 5: Network structure.
  • ...and 1 more figure
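
The separation, rotation, dimensionality reduction, and normalization steps captioned in Figures 2 and 3, together with the reverse processing in Figure 1, can be sketched as follows. This is a hypothetical reconstruction from the captions alone: it assumes the 17-keypoint COCO ordering (OpenPose's output format may differ), uses the ear pair as the head reference and the shoulder pair as the torso reference, rotates each part so that the right-to-left reference vector points along the +x axis, and applies per-axis min-max scaling. The helpers `standardize` and `restore`, the index constants, and the fallback used when a reference keypoint is missing are all illustrative assumptions.

```python
import numpy as np

# Hypothetical COCO-17 layout: 0 nose, 1-2 eyes, 3-4 ears, 5-16 shoulders to ankles.
HEAD = [0, 1, 2, 3, 4]
TORSO = list(range(5, 17))
HEAD_REF = (4, 3)    # (right_ear, left_ear)
TORSO_REF = (6, 5)   # (right_shoulder, left_shoulder)


def _rotation(theta):
    """Matrix that rotates column vectors by -theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])


def standardize(pose, part, ref):
    """Separate, rotate, flatten, and normalize one body part.

    pose: (17, 2) pixel coordinates with NaN rows for undetected keypoints.
    Returns (values, mask, params); params are needed by restore().
    """
    pts = pose[part].astype(float)
    right, left = pose[ref[0]], pose[ref[1]]
    if np.isnan(right).any() or np.isnan(left).any():
        # Assumed fallback: no rotation, centered on the mean of detected points.
        theta, center = 0.0, np.nanmean(pts, axis=0)
    else:
        theta = np.arctan2(left[1] - right[1], left[0] - right[0])
        center = (right + left) / 2.0
    # Rotate about the reference midpoint so right->left lies on the +x axis.
    pts = (pts - center) @ _rotation(theta).T
    # Per-axis min-max scaling over detected points (assumption).
    lo = np.nanmin(pts, axis=0)
    span = np.nanmax(pts, axis=0) - lo
    scale = np.where(span > 0, span, 1.0)
    pts = (pts - lo) / scale
    # "Dimensionality reduction": one 1-D sequence per axis, concatenated [x..., y...].
    flat = np.concatenate([pts[:, 0], pts[:, 1]])
    mask = (~np.isnan(flat)).astype(float)
    values = np.nan_to_num(flat, nan=0.0)
    return values, mask, (theta, center, lo, scale)


def restore(values, params):
    """Invert standardize() after the missing entries have been imputed."""
    theta, center, lo, scale = params
    n = values.size // 2
    pts = np.stack([values[:n], values[n:]], axis=1) * scale + lo
    # Right-multiplying row vectors by R(-theta) applies R(-theta)^T = R(+theta).
    return pts @ _rotation(theta) + center
```

Under this sketch, the head yields a 10-value vector and the torso a 24-value vector, which would be fed to separate generators; `mask` marks which entries were detected, and `restore` maps the imputed vector back to image coordinates.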