Geometric Point Attention Transformer for 3D Shape Reassembly
Jiahan Li, Chaoran Cheng, Jianzhu Ma, Ge Liu
TL;DR
The paper tackles 3D shape assembly by predicting absolute $SE(3)$ poses for multiple parts while capturing both global context and local geometric interactions. It introduces the Geometric Point Attention Transformer (GPAT), together with a geometric recycling mechanism that iteratively refines pose predictions while maintaining equivariance to global rigid transformations. On PartNet (semantic assembly) and Breaking Bad (geometric assembly), GPAT achieves state-of-the-art or competitive performance, and ablations confirm the importance of both the geometric attention components and recycling. The approach provides a backbone for 6-DoF pose estimation in complex 3D reassembly tasks and may inform future work on 3D reconstruction and assembly pipelines.
Abstract
Shape assembly, which aims to reassemble separate parts into a complete object, has gained significant interest in recent years. Existing methods primarily rely on networks to predict the poses of individual parts, but often fail to effectively capture the geometric interactions between the parts and their poses. In this paper, we present the Geometric Point Attention Transformer (GPAT), a network specifically designed to address the challenges of reasoning about geometric relationships. In the geometric point attention module, we integrate both global shape information and local pairwise geometric features, along with poses represented as rotation and translation vectors for each part. To enable iterative updates and dynamic reasoning, we introduce a geometric recycling scheme, where each prediction is fed into the next iteration for refinement. We evaluate our model on both the semantic and geometric assembly tasks, showing that it outperforms previous methods in absolute pose estimation, achieving accurate pose predictions and high alignment accuracy.
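The recycling idea described above, where each pose prediction is fed back as input to the next iteration, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: `toy_update` is a hypothetical stand-in for a network pass (it simply moves the current pose halfway toward a known target), rotations are restricted to the z-axis so the residual angle is easy to read off, and the update rule `R <- dR @ R`, `t <- t + dt` is one possible convention among several.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def toy_update(R, t, R_goal, t_goal, step=0.5):
    """Hypothetical stand-in for one network pass.

    Returns a residual pose (dR, dt) that closes a fraction `step` of the
    remaining error. A real model would predict this residual from part
    features and the current poses, without access to the goal.
    """
    rel = R_goal @ R.T                        # remaining rotation (about z here)
    angle = np.arctan2(rel[1, 0], rel[0, 0])  # its signed angle
    return rot_z(step * angle), step * (t_goal - t)

def recycle(R_goal, t_goal, num_iters=8):
    """Refine per-part poses by feeding each prediction into the next pass."""
    num_parts = len(R_goal)
    R = [np.eye(3) for _ in range(num_parts)]    # start from identity rotations
    t = [np.zeros(3) for _ in range(num_parts)]  # and zero translations
    for _ in range(num_iters):
        for i in range(num_parts):
            dR, dt = toy_update(R[i], t[i], R_goal[i], t_goal[i])
            R[i] = dR @ R[i]   # compose the residual rotation
            t[i] = t[i] + dt   # add the residual translation
    return R, t

# Two parts with made-up target poses.
R_goal = [rot_z(np.pi / 2), rot_z(-np.pi / 3)]
t_goal = [np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.5, 0.0])]
R_pred, t_pred = recycle(R_goal, t_goal)
```

In this toy setting the pose error halves on every pass, so a handful of iterations brings each part close to its target. The point is only the control flow: the output of one pass becomes the input of the next.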
