GraphRelate3D: Context-Dependent 3D Object Detection with Inter-Object Relationship Graphs

Mingyu Liu, Ekim Yurtsever, Marc Brede, Jun Meng, Walter Zimmer, Xingcheng Zhou, Bare Luka Zagar, Yuning Cui, Alois Knoll

TL;DR

The paper tackles the limitation of two-stage 3D detectors that refine proposals independently by introducing an inter-object relation module. This module uses a Graph Generator to form a frame-wide, directed graph over proposals and a Graph Neural Network to iteratively refine proposal features using edge and box-difference information, producing richer representations for the final detection head. Empirical results on KITTI show consistent improvements over PV-RCNN and PartA$^{2}$ baselines in 3D and BEV AP, especially for the car class, and ablation studies validate the benefits of box-informed initialization, edge features, and graph configuration. The approach demonstrates the value of context-aware relational reasoning in 3D perception, with potential extensions to include point-level cues and evaluation on broader datasets.

Abstract

Accurate and effective 3D object detection is critical for ensuring the driving safety of autonomous vehicles. Recently, state-of-the-art two-stage 3D object detectors have exhibited promising performance. However, these methods refine proposals individually, ignoring the rich contextual information in the relationships among neighboring proposals. In this study, we introduce an object relation module, consisting of a graph generator and a graph neural network (GNN), that learns spatial information from recurring patterns among objects to improve 3D object detection. Specifically, the graph generator creates an inter-object relationship graph over the proposals in a frame, connecting each proposal with its neighboring proposals. Afterward, the GNN module extracts edge features from the generated graph and iteratively refines the proposal features with the captured edge features. Finally, the refined features are fed to the detection head to obtain the detection results. Our approach improves upon the baseline PV-RCNN on the KITTI validation set for the car class across easy, moderate, and hard difficulty levels by 0.82%, 0.74%, and 0.58%, respectively. Additionally, our method outperforms the baseline by more than 1% in BEV AP at the moderate and hard levels on the test server.
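The graph-generation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `knn_graph` and `radius_graph`, the edge direction convention (neighbor to receiving proposal), and the toy proposal centers are all assumptions made for illustration. It only shows two common ways to connect each proposal to its neighbors based on box centers: a fixed number of nearest neighbors, or all neighbors within a distance threshold.

```python
import math

def knn_graph(centers, k):
    """Directed edges (src, dst): every proposal receives from its k nearest neighbors."""
    edges = []
    for i, ci in enumerate(centers):
        # Distances from proposal i to every other proposal, nearest first.
        others = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )
        for _, j in others[:k]:
            edges.append((j, i))  # information flows from neighbor j to proposal i
    return edges

def radius_graph(centers, radius):
    """Directed edges (src, dst) between all proposal pairs closer than `radius`."""
    edges = []
    for i, ci in enumerate(centers):
        for j, cj in enumerate(centers):
            if i != j and math.dist(ci, cj) <= radius:
                edges.append((j, i))
    return edges

# Hypothetical proposal centers (x, y, z) in meters: three nearby boxes, one far away.
centers = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]

knn_edges = knn_graph(centers, k=2)            # every node gets exactly 2 incoming edges
rad_edges = radius_graph(centers, radius=6.0)  # the far-away proposal stays isolated
```

The two strategies differ in how they treat isolated proposals: the nearest-neighbor variant always connects a proposal to something, however distant, while the radius variant leaves it without edges. The quadratic loops are only suitable for a toy example; a practical version would use a spatial index or a batched tensor operation.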

Paper Structure

This paper contains 15 sections, 5 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: Two-stage 3D object detection network extended with the proposed inter-object relation module (shown in yellow). The module is inserted after the detector's RPN and RoI pooling modules and consists of a Graph Generator and a GNN. First, the Graph Generator constructs inter-object relation graphs based on the centers of the proposal boxes. The GNN then takes the proposal features and box information to compute and iteratively update the node features. Finally, the features that each node obtains at the different layers are concatenated and used as input to the subsequent detection head.
  • Figure 2: Comparison between PV-RCNN [shi2020pv] (left) and our PV-RCNN with an object relation module (right). Blue and green bounding boxes represent the ground truth and predictions, respectively. The dark red lines illustrate the relationship graph. As shown in the red box, our framework improves the bounding box rotation accuracy compared with the baseline PV-RCNN because of the information provided by similar patterns of the parked cars.
  • Figure 3: Example of generated graphs on proposals from the first stage of PV-RCNN [shi2020pv]. The left graph was generated with KNN (K=16), so every proposal is connected to its 16 nearest neighbors. The right graph was generated as a radius graph with a threshold of six meters. Ground truth is shown in blue. Dark red lines illustrate the graph edges. Green and light blue represent the predicted cars and pedestrians.
  • Figure 4: Illustration of the Graph Neural Network architecture. The proposal features and boxes are used as input to initialize the node features of the inter-object relationship graph. After a set of GNN layers, the object relation module outputs the refined feature of each proposal.
  • Figure 5: Qualitative results on the KITTI validation set for three scenarios. Ground truth, baseline results, and our results are shown in blue, red, and green, respectively. By learning similar movement patterns and extracting relation information, our approach significantly reduces false positives and improves predicted rotation accuracy. Zoom in for more details.
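
The refinement loop outlined in the captions for Figures 1 and 4 (node features updated layer by layer over the graph, then concatenated per node) might look roughly like the following. This is a hedged sketch, not the paper's architecture: the edge input (receiver feature, feature difference, box difference), the single linear map with ReLU, the max aggregation, the random weights, and all names (`linear_relu`, `gnn_layer`, `refine`) are assumptions. The box-informed node initialization mentioned in Figure 4 is omitted, and the yaw angle is differenced naively.

```python
import random

def linear_relu(x, weight):
    """Compute ReLU(W x) for a weight matrix stored as a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weight]

def gnn_layer(feats, boxes, edges, weight):
    """One message-passing round with element-wise max over incoming edges."""
    out_dim = len(weight)
    new_feats = []
    for i in range(len(feats)):
        messages = []
        for src, dst in edges:
            if dst != i:
                continue
            # Assumed edge input: receiver feature, feature difference, box difference.
            feat_diff = [a - b for a, b in zip(feats[src], feats[i])]
            box_diff = [a - b for a, b in zip(boxes[src], boxes[i])]
            messages.append(linear_relu(feats[i] + feat_diff + box_diff, weight))
        if messages:
            new_feats.append([max(col) for col in zip(*messages)])
        else:
            new_feats.append([0.0] * out_dim)  # isolated node receives no message
    return new_feats

def refine(feats, boxes, edges, weights):
    """Run the layers in sequence and concatenate each node's per-layer outputs."""
    per_layer, h = [], feats
    for weight in weights:
        h = gnn_layer(h, boxes, edges, weight)
        per_layer.append(h)
    return [sum((layer[i] for layer in per_layer), []) for i in range(len(feats))]

rng = random.Random(0)
FEAT_DIM, BOX_DIM, NUM_LAYERS = 4, 7, 2  # hypothetical sizes; box = (x, y, z, l, w, h, yaw)

# Toy proposals: random pooled features and boxes, plus a small directed graph.
feats = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(3)]
boxes = [[rng.uniform(-1, 1) for _ in range(BOX_DIM)] for _ in range(3)]
edges = [(1, 0), (0, 1), (2, 1), (1, 2)]  # (src, dst) pairs

# Random stand-ins for learned weights; each row has 2 * FEAT_DIM + BOX_DIM entries.
weights = [
    [[rng.uniform(-1, 1) for _ in range(2 * FEAT_DIM + BOX_DIM)] for _ in range(FEAT_DIM)]
    for _ in range(NUM_LAYERS)
]

refined = refine(feats, boxes, edges, weights)  # one concatenated vector per proposal
```

Concatenating the per-layer outputs gives the detection head access to both shallow (one-hop) and deeper (multi-hop) context for each proposal, which is the motivation the Figure 1 caption suggests. With two layers of width 4, each refined vector here has 8 entries.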