
DPGP: A Hybrid 2D-3D Dual Path Potential Ghost Probe Zone Prediction Framework for Safe Autonomous Driving

Weiming Qu, Jiawei Du, Shenghai Yuan, Jia Wang, Yang Sun, Shengyi Liu, Yuanhao Zhu, Jianfeng Yu, Song Cao, Rui Xia, Xiaoyu Tang, Xihong Wu, Dingsheng Luo

TL;DR

This work tackles ghost probe zone prediction in urban driving by introducing DPGP, a monocular-camera framework that fuses 2D image features with 3D point-cloud representations derived from depth maps. The method uses Metric3Dv2 for depth estimation, a U-Net for 2D features, PointNet++ for 3D features, and a cross-attention mechanism to integrate the two modalities. This enables prediction of occlusion-related zones beyond vehicle-induced blind spots. The authors also introduce a new dataset, CAIC-G. Experiments on KITTI, together with a cross-domain evaluation, show improved F1-scores over strong baselines. The approach is cost-effective and hardware-light, and the authors plan to open-source it to foster community adoption and further research.
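The TL;DR mentions 3D point clouds derived from depth maps. A common way to obtain such a point cloud from a monocular depth prediction is pinhole back-projection. The sketch below assumes a metric depth map in meters (such as one produced by a metric depth estimator like Metric3Dv2) and known camera intrinsics fx, fy, cx, cy. The function name depth_to_point_cloud and the rule of skipping non-positive depths are illustrative assumptions, not DPGP's published code.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: List[List[float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Point3D]:
    """Back-project an H x W metric depth map into camera-frame 3D points.

    Assumes a standard pinhole camera model (hypothetical setup, not taken
    from the DPGP paper). Pixels with non-positive depth are treated as
    invalid and skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0.0:
                continue                   # missing or invalid depth
            x = (u - cx) * z / fx          # horizontal offset from optical axis
            y = (v - cy) * z / fy          # vertical offset from optical axis
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Toy 2 x 2 depth map; the pixel with depth 0.0 is dropped.
    cloud = depth_to_point_cloud([[2.0, 2.0], [0.0, 4.0]], 1.0, 1.0, 0.5, 0.5)
    print(cloud)
```

A real pipeline would typically vectorize this and may subsample the points before feeding them to a point-based encoder such as PointNet++.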

Abstract

Modern robots must coexist with humans in dense urban environments. A key challenge is the ghost probe problem, where pedestrians or objects unexpectedly rush into traffic paths. This issue affects both autonomous vehicles and human drivers. Existing works propose vehicle-to-everything (V2X) strategies and non-line-of-sight (NLOS) imaging for ghost probe zone detection. However, most require high computational power or specialized hardware, which limits real-world feasibility. Moreover, many methods do not explicitly address the ghost probe problem. To tackle this, we propose DPGP, a hybrid 2D-3D fusion framework for ghost probe zone prediction that uses only a monocular camera during both training and inference. Using unsupervised depth prediction, we observe that ghost probe zones align with depth discontinuities, but that different depth representations vary in robustness. To exploit this complementarity, we fuse feature embeddings from multiple representations to improve prediction. To validate our approach, we created a 12K-image dataset annotated with ghost probe zones, carefully sourced and cross-checked for accuracy. Experimental results show that our framework outperforms existing methods while remaining cost-effective. To our knowledge, this is the first work to extend ghost probe zone prediction beyond vehicles to diverse non-vehicle objects. We will open-source our code and dataset for community benefit.
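The observation that ghost probe zones align with depth discontinuities can be illustrated with a simple gradient test on a depth map. This is a minimal sketch, assuming forward differences and a hand-picked absolute threshold (both hypothetical choices). According to the Figure 3 description, DPGP feeds depth gradients into a learned encoder-decoder rather than thresholding them directly, so this only shows where such discontinuities appear.

```python
import math
from typing import List


def depth_gradient_magnitude(depth: List[List[float]]) -> List[List[float]]:
    """Per-pixel depth gradient magnitude using forward differences.

    On the last row/column the missing neighbor difference is taken as 0.
    """
    h, w = len(depth), len(depth[0])
    grad = [[0.0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            dx = depth[v][u + 1] - depth[v][u] if u + 1 < w else 0.0
            dy = depth[v + 1][u] - depth[v][u] if v + 1 < h else 0.0
            grad[v][u] = math.hypot(dx, dy)
    return grad


def discontinuity_mask(depth: List[List[float]], threshold: float) -> List[List[bool]]:
    """Mark pixels whose depth gradient exceeds a (hypothetical) threshold."""
    grad = depth_gradient_magnitude(depth)
    return [[g > threshold for g in row] for row in grad]


if __name__ == "__main__":
    # An occluder at 5 m in front of a background at 20 m: the jump
    # between columns 1 and 2 is where a hidden pedestrian could emerge.
    scene = [[5.0, 5.0, 20.0, 20.0],
             [5.0, 5.0, 20.0, 20.0]]
    print(discontinuity_mask(scene, threshold=1.0))
```

Because depth jumps tend to grow with distance, a relative test (for example, the gradient divided by the local depth) may be more robust than a fixed absolute threshold; this is one reason a learned model over several depth representations can help.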

Paper Structure

This paper contains 16 sections, 7 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Typical ghost probe accident scenario and proposed solution. The top section shows a pedestrian jaywalking, occluded by a bus, making them invisible to an approaching vehicle due to the driver's limited field of view. The bottom section illustrates the proposed solution, using early detection, unsupervised learning, and depth fusion to anticipate occluded pedestrians, enabling proactive actions like lane changing or slowing down. Key challenges include generalization, noise handling, and efficiency.
  • Figure 2: Limitations of using only 2D images guo2018blind or only 3D point clouds qi2017pointnet++. In the left image, the edge within the red box does not correspond to a potential ghost probe zone. In the right image, the point cloud within the red box is too sparse.
  • Figure 3: Overview of the proposed 2D-3D feature fusion framework. The pipeline extracts depth, depth gradient, and 3D point clouds from a monocular image, processes them through a hierarchical encoder-decoder network with skip connections and cross-attention, and generates a fused representation for ghost probe detection.
  • Figure 4: Qualitative results of DPGP. For visualization, we selected one sample from each scene in the KITTI dataset and three samples from the CAIC-G dataset.
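Figure 3 describes fusing the 2D and 3D paths with cross-attention. The sketch below shows plain scaled dot-product cross-attention in which each 2D feature vector (query) attends over a set of 3D point features (keys and values). Which modality supplies the queries is an assumption, since the summary does not specify it, and the learned query/key/value projections and multiple heads of a real network are omitted; cross_attention and softmax are illustrative helper names.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)                            # subtract max to avoid overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product cross-attention without learned projections.

    queries: N_q x d (e.g. 2D image features, an assumption)
    keys:    N_k x d (e.g. 3D point features)
    values:  N_k x d_v
    Returns an N_q x d_v matrix: each row is a convex combination of values.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    d_v = len(values[0])
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)          # attention weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(d_v)])
    return out


if __name__ == "__main__":
    # Identical keys give uniform weights, so the output is the mean of the values.
    print(cross_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]],
                          [[2.0, 0.0], [4.0, 2.0]]))
```

In practice the attended 3D context is usually added back to the 2D features through a residual connection before decoding; whether DPGP does exactly this is not stated in the summary.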