Edge-Centric Relational Reasoning for 3D Scene Graph Prediction

Yanni Ma, Hao Liu, Yulan Guo, Theo Gevers, Martin R. Oswald

TL;DR

The paper tackles 3D scene graph prediction by identifying the limitations of object-centric message passing and introducing LEO, a three-stage framework that transitions from edge-centric to object-centric reasoning. LEO uses a link-prediction module to softly weight edges, transforms the scene graph into a line graph to perform relation-level reasoning with a LineGNN, and then fuses the enriched relation features back into the original graph for final predictions. This edge-to-object approach is model-agnostic and yields consistent improvements on the 3DSSG benchmark when integrated with strong baselines like KISGP and 3DHetSGP, demonstrating the value of higher-order relational context. The results show gains in PredCls and SGCls tasks, indicating enhanced robustness and coherence of 3D scene graphs in indoor environments.

Abstract

3D scene graph prediction aims to abstract complex 3D environments into structured graphs consisting of objects and their pairwise relationships. Existing approaches typically adopt object-centric graph neural networks, where relation edge features are iteratively updated by aggregating messages from connected object nodes. However, this design inherently restricts relation representations to pairwise object context, making it difficult to capture high-order relational dependencies that are essential for accurate relation prediction. To address this limitation, we propose a Link-guided Edge-centric relational reasoning framework with Object-aware fusion, namely LEO, which enables progressive reasoning from relation-level context to object-level understanding. Specifically, LEO first predicts potential links between object pairs to suppress irrelevant edges, and then transforms the original scene graph into a line graph where each relation is treated as a node. A line graph neural network is applied to perform edge-centric relational reasoning to capture inter-relation context. The enriched relation features are subsequently integrated into the original object-centric graph to enhance object-level reasoning and improve relation prediction. Our framework is model-agnostic and can be integrated with any existing object-centric method. Experiments on the 3DSSG dataset with two competitive baselines show consistent improvements, highlighting the effectiveness of our edge-to-object reasoning paradigm.
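The line graph transformation mentioned in the abstract can be illustrated with a small sketch. It assumes the textbook definition of a line graph (two relations become neighbours when their underlying edges share an object); the exact connectivity rule used by LEO is not stated here, so the function `line_graph` and the toy scene are purely illustrative.

```python
from itertools import combinations


def line_graph(edges):
    """Build the line graph of a scene graph.

    `edges` is a list of (subject, object) index pairs. Each edge becomes
    a node of the line graph; two such nodes are linked when the original
    edges share at least one object (assumed, textbook definition).
    """
    adjacency = {i: set() for i in range(len(edges))}
    for i, j in combinations(range(len(edges)), 2):
        if set(edges[i]) & set(edges[j]):
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Hypothetical toy scene: 0 = chair, 1 = table, 2 = floor, 3 = lamp.
edges = [(0, 1), (1, 2), (0, 2), (3, 1)]
lg = line_graph(edges)
for rel, neighbours in lg.items():
    print(edges[rel], "->", [edges[n] for n in sorted(neighbours)])
```

In this toy scene the relation (chair, floor) and the relation (lamp, table) share no object, so they are not neighbours in the line graph, while every relation touching the table is linked to every other relation touching it.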

Paper Structure

This paper contains 34 sections, 14 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Object-centric message passing vs. Edge-centric message passing.
  • Figure 2: Line Graph Transformation.
  • Figure 3: The overview of our LEO framework. It consists of three stages: (a) Link Prediction assigns soft link weights to object pairs in the original scene graph to modulate relation strengths for subsequent reasoning; (b) Edge-centric Relational Reasoning transforms the weighted original graph into a line graph and applies LineGNN to capture relation-level context and refine relation features; (c) Object-centric Relational Reasoning integrates the refined relations into the original graph for final object and predicate prediction.
  • Figure 4: Qualitative results of our model and the baseline 3DHetSGP under ngcR@20. Red arrows indicate incorrectly predicted relationships. Green arrows indicate relationships that are missed or misclassified by 3DHetSGP but correctly predicted by ours.
  • Figure 5: Qualitative results of our model and the baseline 3DHetSGP on the 3DSSG dataset under ngcR@20. Red arrows indicate incorrectly predicted relationships. Green arrows indicate relationships that are missed or misclassified by 3DHetSGP but correctly predicted by ours.
  • ...and 1 more figure
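
Stages (a) and (b) from the Figure 3 caption can be sketched numerically. This is a minimal illustration under stated assumptions, not the authors' LineGNN: the sigmoid link weights, the weighted mean aggregation, the residual update, and the name `edge_centric_update` are all hypothetical choices. Stage (c), which hands the refined relations to an object-centric baseline, is not modelled.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def edge_centric_update(rel_feats, link_logits, line_adj):
    """One illustrative relation-level message-passing step.

    rel_feats:   (E, D) features, one row per candidate relation.
    link_logits: (E,) raw link-prediction scores (assumed given).
    line_adj:    (E, E) 0/1 adjacency of the line graph.
    """
    # Stage (a): soft link weights in (0, 1) instead of hard pruning.
    weights = sigmoid(link_logits)
    # Stage (b): each relation aggregates its line-graph neighbours,
    # with weakly linked neighbours contributing less.
    weighted_adj = line_adj * weights[None, :]
    norm = weighted_adj.sum(axis=1, keepdims=True) + 1e-8
    context = (weighted_adj / norm) @ rel_feats
    # Residual update keeps the original relation feature (assumption).
    return rel_feats + context


# Three relations that are all neighbours in the line graph;
# relation 1 has a very low link score.
rel_feats = np.eye(3)
link_logits = np.array([10.0, -10.0, 10.0])
line_adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
refined = edge_centric_update(rel_feats, link_logits, line_adj)
print(np.round(refined, 3))
```

With these inputs, relation 0 draws almost all of its context from relation 2 and almost none from the weakly linked relation 1, which is the soft suppression of irrelevant edges that the link-prediction stage is meant to provide.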