GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer

Jinpeng Yu, Binbin Huang, Yuxuan Zhang, Huaxia Li, Xu Tang, Shenghua Gao

TL;DR

GeoFormer completes partial point clouds by strengthening both global geometric reasoning and local detail reconstruction. It renders canonical coordinate maps (CCMs) on tri-planes and aligns the CCM-based 2D features with 3D point features through transformer-based fusion to produce a global representation. A multi-scale, inception-inspired upsampler then refines the geometry through cross-attention with prior predictions. A sensitive-aware loss, $L_{\text{arc-CD}}(x,y) = \operatorname{arcosh}(1 + L_{CD}(x,y))$, improves generalization and resilience to outliers. On PCN, ShapeNet-55/34, and KITTI, GeoFormer achieves state-of-the-art results on both synthetic and real-world data, and the implementation is publicly released to support reproducibility.
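To make the loss formula concrete, here is a minimal, dependency-free sketch of $\operatorname{arcosh}(1 + L_{CD})$. The function names `chamfer_distance` and `arc_chamfer_loss` are hypothetical, and the Chamfer variant used (squared L2 distance, averaged per direction and summed) is an assumption; the summary does not state which variant the authors use.

```python
import math

def chamfer_distance(x, y):
    """Symmetric Chamfer distance between two lists of 3D points.

    Assumption: squared-L2 nearest-neighbour distances, averaged in each
    direction and then summed. Other CD variants exist.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def one_way(src, dst):
        # Mean distance from each source point to its nearest target point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(x, y) + one_way(y, x)

def arc_chamfer_loss(x, y):
    """arcosh(1 + CD): zero when CD is zero, grows only logarithmically for large CD."""
    return math.acosh(1.0 + chamfer_distance(x, y))

target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
with_outlier = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]  # one badly misplaced point

cd = chamfer_distance(with_outlier, target)
arc = arc_chamfer_loss(with_outlier, target)
print(f"CD = {cd:.1f}, arc-CD = {arc:.2f}")
```

Since $\operatorname{arcosh}(1 + d) \approx \sqrt{2d}$ for small $d$ and $\approx \ln(2d)$ for large $d$, a single far-off point inflates the plain Chamfer distance far more than it inflates the transformed loss, which is one plausible reading of the outlier resilience claimed above.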

Abstract

Point cloud completion aims to recover accurate global geometry and preserve fine-grained local details from partial point clouds. Conventional methods typically predict unseen points directly from 3D point cloud coordinates or use self-projected multi-view depth maps to ease this task. However, these gray-scale depth maps cannot achieve multi-view consistency, which restricts performance. In this paper, we introduce GeoFormer, which simultaneously enhances the global geometric structure of the points and improves the local details. Specifically, we design a CCM Feature Enhanced Point Generator to integrate image features from multi-view consistent canonical coordinate maps (CCMs) and align them with pure point features, thereby enhancing the global geometry feature. Additionally, we employ the Multi-scale Geometry-aware Upsampler module to progressively enhance local details. This is achieved through cross-attention between the multi-scale features extracted from the partial input and the features derived from previously estimated points. Extensive experiments on the PCN, ShapeNet-55/34, and KITTI benchmarks demonstrate that GeoFormer outperforms recent methods, achieving state-of-the-art performance. Our code is available at https://github.com/Jinpeng-Yu/GeoFormer.

Paper Structure

This paper contains 24 sections, 9 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Illustration of the geometry-consistent tri-plane projection in our GeoFormer. We visualize the details of the canonical coordinate maps (CCMs) obtained from three orthogonal views; the color of each point represents its normalized coordinate. The highlighted area shows that the three-channel CCM itself contains rich geometric information and ensures multi-view geometric consistency.
  • Figure 2: An overview of our pipeline. Given the incomplete point cloud $\mathcal{P}$, we obtain the coarse complete prediction $\mathcal{P}_0$ and extract the global geometric feature $\mathcal{F}$ using the CCM feature enhanced point generator. In the coarse-to-fine generation stage, we use the multi-scale geometry-aware upsampler to learn coordinate offsets based on $\mathcal{P}$, $\mathcal{F}$, and the previously estimated points $\mathcal{P}_i$, and scatter them into specific 3D coordinates to reconstruct the accurate and detailed complete result $\mathcal{P}_2$.
  • Figure 3: The detailed structure of the CCM feature enhanced point generator. We first convert the partial point cloud input $\mathcal{P}$ into the canonical coordinate space and extract the corresponding projection maps according to the views $\mathcal{V}$. Then, we align the 3D point features and the 2D map features through an attention mechanism and obtain the global features $\mathcal{F}$ after further processing. Finally, we use a 3D coordinate decoder to predict the coarse, sparse but complete point cloud $\mathcal{P}_0$.
  • Figure 4: The detailed structure of the Decoder. We feed the main features $\mathcal{F}_i$ through $N$ attention-based networks to obtain enhanced features, and then use a shared MLP to predict 3D coordinates.
  • Figure 5: The detailed structure of the Multi-scale Geometry-aware Upsampler. We design a multi-scale point feature extractor with an inception architecture to obtain local point features $\mathcal{F}_p^{\prime}$ from the partial input $\mathcal{P}$. These are then fused with the previous global feature $\mathcal{F}$ and the prediction result $\mathcal{P}_i$ to obtain $\mathcal{F}_{p_i}$. Finally, we use the decoder to predict the point offset $\Delta$ and obtain the point cloud $\mathcal{P}_{i+1}$. ($^*$CD Emb. is calculated between $\mathcal{P}$ and $\mathcal{P}_i$.)
  • ...and 6 more figures
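
The tri-plane projection described in the Figure 1 and Figure 3 captions can be sketched as follows. This is an illustrative, dependency-free approximation and not the authors' implementation: the function name `render_ccms`, the bounding-box normalization, the grid resolution, the zero-valued background, and the nearest-point (z-buffer) rule are all assumptions. Only the core idea comes from the captions: each pixel stores the normalized 3D coordinate of the point that projects onto it, so the same point carries the same three-channel value in every view.

```python
def render_ccms(points, res=8):
    """Render three canonical coordinate maps (one per orthogonal plane).

    Each map is a res x res grid of (x, y, z) tuples in [0, 1]; empty
    pixels stay at (0, 0, 0) (assumed background value).
    """
    # Assumption: normalize with the bounding box and one uniform scale.
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    scale = max(maxs[i] - mins[i] for i in range(3)) or 1.0
    canon = [tuple((p[i] - mins[i]) / scale for i in range(3)) for p in points]

    # For each view: (column axis, row axis, depth axis).
    views = {"xy": (0, 1, 2), "xz": (0, 2, 1), "yz": (1, 2, 0)}
    maps = {}
    for name, (u, v, d) in views.items():
        img = [[(0.0, 0.0, 0.0)] * res for _ in range(res)]
        depth = [[float("inf")] * res for _ in range(res)]
        for c in canon:
            col = min(int(c[u] * res), res - 1)
            row = min(int(c[v] * res), res - 1)
            # Assumed z-buffer rule: keep the point closest to the plane.
            if c[d] < depth[row][col]:
                depth[row][col] = c[d]
                img[row][col] = c  # pixel "color" = canonical coordinate
        maps[name] = img
    return maps

cloud = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)]
ccms = render_ccms(cloud, res=4)
print(ccms["xy"][0][3], ccms["xz"][0][3])  # same point, same value in both views
```

Unlike a gray-scale depth map, where the stored value depends on the viewing direction, the value stored here is view-independent, which is what the captions refer to as multi-view geometric consistency.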