ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding
Quang P. M. Pham, Khoi T. N. Nguyen, Lan C. Ngo, Truong Do, Truong Son Hy
TL;DR
The paper tackles robust 3D scene graph generation from point clouds by preserving geometric symmetry with an Equivariant Scene Graph Neural Network (ESGNN). ESGNN combines FAN-GCL attention with message passing based on the Equivariant Graph Convolutional Layer (EGCL) to maintain $E(n)$-equivariance. As a result, learned representations transform consistently under rotations and translations of the input, and the model needs fewer layers and less computation. Built on PointNet-based segment encoders and a neighbor graph, ESGNN converges faster and improves relation prediction on the 3DSSG/3RScan benchmarks, including unseen triplets, while remaining compatible with existing frameworks. The approach is relevant to real-time 3D scene understanding in robotics and computer vision, where perception must be robust yet resource-efficient, and it leaves room for image-guided extensions.
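To make the $E(n)$-equivariance property concrete, here is a minimal NumPy sketch of a single generic EGCL update in the form introduced by Satorras et al. (2021). Messages depend only on invariant features and squared distances, and coordinates are updated along relative position vectors. This is not the ESGNN implementation, and FAN-GCL attention is omitted. Several choices are assumptions made only for illustration: the helper names (`mlp`, `init_mlp`, `egcl`), the two-layer SiLU MLPs with random weights, the mean aggregation of coordinate updates, the residual feature update, and the toy fully connected graph.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

def init_mlp(rng, d_in, d_hidden, d_out):
    # Hypothetical two-layer MLP parameters (random weights, zero biases).
    return (rng.normal(scale=0.1, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=0.1, size=(d_hidden, d_out)), np.zeros(d_out))

def mlp(params, z):
    W1, b1, W2, b2 = params
    return silu(z @ W1 + b1) @ W2 + b2

def egcl(h, x, edges, phi_e, phi_x, phi_h):
    """One EGCL-style update (sketch).
    h: (N, F) invariant node features; x: (N, D) node coordinates;
    edges: directed (i, j) pairs, with messages flowing from j to i."""
    n = h.shape[0]
    m_agg = np.zeros((n, phi_e[2].shape[1]))  # summed messages per node
    dx = np.zeros_like(x)                     # coordinate update per node
    deg = np.zeros(n)
    for i, j in edges:
        diff = x[i] - x[j]                    # rotates with the input
        d2 = np.array([diff @ diff])          # squared distance: invariant
        m_ij = mlp(phi_e, np.concatenate([h[i], h[j], d2]))
        dx[i] += diff * mlp(phi_x, m_ij)[0]   # invariant scalar times relative vector
        m_agg[i] += m_ij
        deg[i] += 1
    x_new = x + dx / np.maximum(deg, 1)[:, None]              # mean-normalized (assumption)
    h_new = h + mlp(phi_h, np.concatenate([h, m_agg], axis=1))  # residual (assumption)
    return h_new, x_new

# Equivariance check on a toy fully connected graph.
rng = np.random.default_rng(0)
N, F, D, M, H = 5, 4, 3, 8, 16
h = rng.normal(size=(N, F))
x = rng.normal(size=(N, D))
edges = [(i, j) for i in range(N) for j in range(N) if i != j]
phi_e = init_mlp(rng, 2 * F + 1, H, M)
phi_x = init_mlp(rng, M, H, 1)
phi_h = init_mlp(rng, F + M, H, F)

Q, _ = np.linalg.qr(rng.normal(size=(D, D)))  # random orthogonal matrix
t = rng.normal(size=D)                        # random translation

h1, x1 = egcl(h, x, edges, phi_e, phi_x, phi_h)
h2, x2 = egcl(h, x @ Q.T + t, edges, phi_e, phi_x, phi_h)
print(np.allclose(h1, h2))             # features are invariant
print(np.allclose(x1 @ Q.T + t, x2))   # coordinates are equivariant
```

Both checks should print `True`. Transforming the input and then applying the layer gives the same result as applying the layer and then transforming the output, up to floating-point error. This is the symmetry-preserving property the paper argues standard scene graph GNNs lack.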
Abstract
Scene graphs have proven useful for various scene understanding tasks because of their compact and explicit nature. However, existing approaches often neglect to maintain the symmetry-preserving property when generating scene graphs from 3D point clouds. This oversight can reduce the accuracy and robustness of the resulting scene graphs, especially when handling noisy, multi-view 3D data. To the best of our knowledge, this work is the first to apply an Equivariant Graph Neural Network to semantic scene graph generation from 3D point clouds for scene understanding. Our proposed method, ESGNN, outperforms existing state-of-the-art approaches, showing a significant improvement in scene estimation with faster convergence. ESGNN requires few computational resources and is easy to implement with existing frameworks, paving the way for real-time applications in robotics and computer vision.
