
Geometric Point Attention Transformer for 3D Shape Reassembly

Jiahan Li, Chaoran Cheng, Jianzhu Ma, Ge Liu

TL;DR

The paper tackles 3D shape assembly by predicting absolute $SE(3)$ poses for multiple parts while capturing both global context and local geometric interactions. It introduces the Geometric Point Attention Transformer (GPAT) with a geometric recycling mechanism that iteratively refines pose predictions, maintaining equivariance to global rigid transformations. Empirical results on PartNet (semantic assembly) and Breaking Bad (geometric assembly) show GPAT achieving state-of-the-art or competitive performance, with ablations confirming the critical roles of the geometric attention components and recycling. The approach provides a robust, scalable backbone for 6-DoF pose estimation in complex 3D reassembly tasks and offers guidance for future research in 3D reconstruction and assembly pipelines.
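
The $SE(3)$ pose of each part can be read as a rotation $R_i$ and a translation $t_i$ that map the part's points into the assembled frame, $x \mapsto R_i x + t_i$. The NumPy sketch below assumes that convention. It is not the authors' code, and all function names are hypothetical. It checks the property the TL;DR refers to: if a global rigid transform $G$ is applied to every predicted pose, the assembled shape moves by $G$ (equivariance), while the relative pose $T_i^{-1} T_j$ between two parts stays the same (invariance).

```python
import numpy as np

def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: 3x3 rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])  # cross-product matrix of the axis
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def apply_pose(R, t, points):
    """Map an (N, 3) part point cloud into the assembled frame: x -> R x + t."""
    return points @ R.T + t

def compose(R1, t1, R2, t2):
    """Pose (R1, t1) applied after (R2, t2): x -> R1 (R2 x + t2) + t1."""
    return R1 @ R2, R1 @ t2 + t1

def relative_pose(Ri, ti, Rj, tj):
    """Pose of part j expressed in part i's frame: T_i^{-1} T_j."""
    return Ri.T @ Rj, Ri.T @ (tj - ti)
```

To apply a global transform $G = (R_g, t_g)$ to a predicted pose, use `compose(Rg, tg, Ri, ti)`. Relative poses computed with `relative_pose` are then unchanged, because the $R_g^\top R_g$ terms cancel.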

Abstract

Shape assembly, which aims to reassemble separate parts into a complete object, has gained significant interest in recent years. Existing methods primarily rely on networks to predict the poses of individual parts, but often fail to effectively capture the geometric interactions between the parts and their poses. In this paper, we present the Geometric Point Attention Transformer (GPAT), a network specifically designed to address the challenges of reasoning about geometric relationships. In the geometric point attention module, we integrate both global shape information and local pairwise geometric features, along with poses represented as rotation and translation vectors for each part. To enable iterative updates and dynamic reasoning, we introduce a geometric recycling scheme, where each prediction is fed into the next iteration for refinement. We evaluate our model on both the semantic and geometric assembly tasks, showing that it outperforms previous methods in absolute pose estimation, achieving accurate pose predictions and high alignment accuracy.
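
The abstract says that the geometric point attention module fuses three inputs: per-part (global shape) features, pairwise geometric features, and the current poses. One established way to combine these is the invariant point attention (IPA) used in AlphaFold2. Its attention logits sum three terms: a feature dot product, a pairwise bias, and a penalty on the distance between query and key points placed by each part's pose. The sketch below assumes a simplified form of that scheme. The exact GPAT formulation, weights, and dimensions are not given in this section, and every name here is hypothetical. The point term depends only on distances in the shared frame, so the weights stay the same when all poses undergo the same rigid transform.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def geometric_attention_weights(q, k, pair_bias, Rs, ts, q_pts, k_pts, w_pts=1.0):
    """Simplified IPA-style attention weights over parts (assumed form).

    q, k:         (N, d) per-part query/key features
    pair_bias:    (N, N) scalar bias from pairwise geometric features
    Rs, ts:       (N, 3, 3) rotations and (N, 3) translations of current poses
    q_pts, k_pts: (N, P, 3) query/key points in each part's local frame
    """
    d = q.shape[-1]
    feat_term = q @ k.T / np.sqrt(d)                      # (N, N)
    # Place the local points in the shared frame with each part's pose: R p + t.
    gq = np.einsum('nij,npj->npi', Rs, q_pts) + ts[:, None, :]
    gk = np.einsum('nij,npj->npi', Rs, k_pts) + ts[:, None, :]
    diff = gq[:, None, :, :] - gk[None, :, :, :]          # (N, N, P, 3)
    dist_term = (diff ** 2).sum(axis=(-1, -2))            # (N, N)
    logits = feat_term + pair_bias - 0.5 * w_pts * dist_term
    return softmax(logits, axis=-1)                       # each row sums to 1
```

Given per-part value vectors `v` of shape (N, d_v), the aggregated update for each part would then be `weights @ v`.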

Paper Structure

This paper contains 27 sections, 31 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of our model architecture. Given the point cloud of each part, we first use a feature extractor to generate part features and pairwise features. These features, along with the initial poses, are updated in a stack of geometric point attention modules. The predicted poses and positions are recycled for the next round of predictions in the geometric recycling module.
  • Figure 2: The computation graph of the geometric point attention module. The final attention block fuses the different input features, combining information from parts, part pairs, and poses.
  • Figure 3: Qualitative results of part assembly using predicted poses from GPAT for semantic assembly.
  • Figure 4: Qualitative comparison between GPAT and other baselines for geometric assembly. GPAT outperforms across different object shapes and numbers of fragments.
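
Figure 1's caption describes the recycling loop: predicted poses are fed back as inputs for the next round of predictions. The minimal sketch below makes two assumptions. First, poses start at the identity. Second, each round predicts a local-frame update that is composed onto the current pose ($T \leftarrow T \circ \Delta T$), the convention used in AlphaFold2's structure module; GPAT's actual update rule may differ. `step_fn` is a hypothetical stand-in for the stack of geometric point attention modules. `make_toy_step` is a made-up toy included only to make the loop runnable.

```python
import numpy as np

def compose_update(R, t, dR, dt):
    """Local-frame update T_new = T_old o dT, i.e. x -> R (dR x + dt) + t."""
    return R @ dR, R @ dt + t

def geometric_recycling(step_fn, feats, num_parts, num_recycles=4):
    """Hypothetical recycling loop: each round's poses are inputs to the next."""
    Rs = np.tile(np.eye(3), (num_parts, 1, 1))  # identity rotations, (N, 3, 3)
    ts = np.zeros((num_parts, 3))               # zero translations, (N, 3)
    history = []
    for _ in range(num_recycles):
        dRs, dts = step_fn(feats, Rs, ts)       # the model sees the current poses
        updated = [compose_update(Rs[i], ts[i], dRs[i], dts[i])
                   for i in range(num_parts)]
        Rs = np.stack([R for R, _ in updated])
        ts = np.stack([t for _, t in updated])
        history.append((Rs.copy(), ts.copy()))
    return Rs, ts, history

def make_toy_step(target_ts):
    """Toy network stand-in: keeps rotations fixed and returns a local-frame
    translation update that closes half the gap to a fixed target."""
    def step(feats, Rs, ts):
        n = len(ts)
        dRs = np.tile(np.eye(3), (n, 1, 1))
        gap = 0.5 * (target_ts - ts)
        dts = np.einsum('nji,nj->ni', Rs, gap)  # R^T gap: the gap in each part's frame
        return dRs, dts
    return step

target = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
Rs, ts, history = geometric_recycling(make_toy_step(target), feats=None, num_parts=2)
```

With the toy step, each round closes half of the remaining translation gap, so after four rounds the translations reach 15/16 of the target. The refinement happens only because each prediction is fed back into the next round.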