GridFormer: Point-Grid Transformer for Surface Reconstruction

Shengtao Li, Ge Gao, Yudong Liu, Yu-Shen Liu, Ming Gu

TL;DR

GridFormer introduces a Point-Grid Transformer that treats a regular grid as a transfer point between space and the point cloud in order to learn an implicit occupancy field $o:\mathbb{R}^3\rightarrow[0,1]$. It uses a two-branch attention mechanism with local position encoding and skip connections to fuse grid and point features, together with a multi-resolution decoder and a boundary optimization strategy based on margin binary cross-entropy that sharpens surfaces. The method achieves state-of-the-art or competitive results on object-level reconstruction (ShapeNet) and scene-level reconstruction (Synthetic Rooms, ScanNet-v2), while grid-based feature processing keeps it efficient. It is robust to varying point density and noise, which makes it practical for scalable, high-fidelity 3D surface reconstruction; the code is publicly available.
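To make the "grid as a transfer point" idea concrete, the sketch below contrasts uniform scattering (every point in a grid cell contributes equally) with attention-weighted pooling, where each point's weight depends on its feature and on a local position encoding (its offset from the cell center). This is a hypothetical, single-head simplification on a 2D plane: the fixed score vectors `w_q` and `w_pos` stand in for learned projections, and the paper's second attention branch, skip connections, and multi-resolution decoder are omitted. All function names are illustrative and are not taken from the GridFormer code.

```python
import math
from collections import defaultdict

# Hypothetical sketch of point-to-grid pooling; not the GridFormer layer itself.


def softmax(scores):
    # Numerically stable softmax over a non-empty list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cell_of(p, res):
    # Map a 2D point with coordinates in [0, 1] to its cell on a res x res plane.
    return (min(int(p[0] * res), res - 1), min(int(p[1] * res), res - 1))


def scatter_mean(points, feats, res):
    """Uniform scattering: every point in a cell gets the same weight."""
    buckets = defaultdict(list)
    for p, f in zip(points, feats):
        buckets[cell_of(p, res)].append(f)
    return {c: [sum(col) / len(fs) for col in zip(*fs)] for c, fs in buckets.items()}


def scatter_attention(points, feats, res, w_q, w_pos):
    """Hypothetical attention pooling: weights depend on features and local position."""
    buckets = defaultdict(list)
    for p, f in zip(points, feats):
        buckets[cell_of(p, res)].append((p, f))
    grid = {}
    for c, items in buckets.items():
        center = ((c[0] + 0.5) / res, (c[1] + 0.5) / res)
        scores = []
        for p, f in items:
            # Local position encoding: offset from the cell center, in cell units.
            rel = ((p[0] - center[0]) * res, (p[1] - center[1]) * res)
            feat_score = sum(a * b for a, b in zip(w_q, f))
            pos_score = sum(a * b for a, b in zip(w_pos, rel))
            scores.append(feat_score + pos_score)
        weights = softmax(scores)
        dim = len(items[0][1])
        # Weighted sum of the point features that fell into this cell.
        grid[c] = [sum(w * f[k] for w, (_, f) in zip(weights, items)) for k in range(dim)]
    return grid


points = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)]
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(scatter_mean(points, feats, res=2))
print(scatter_attention(points, feats, res=2, w_q=[1.0, 0.0], w_pos=[0.0, 0.0]))
```

With `w_q` and `w_pos` set to zero, every score in a cell is equal and the attention result reduces to `scatter_mean`; non-zero weights let some points in a cell dominate, which is the kind of learnable point-to-grid weighting Figure 2(d) describes.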

Abstract

Implicit neural networks have emerged as a crucial technology in 3D surface reconstruction. To reconstruct continuous surfaces from discrete point clouds, existing approaches commonly encode the input points into regular grid features (plane or volume). However, these methods typically use the grid as an index for uniformly scattering point features. Compared with irregular point features, regular grid features may sacrifice some reconstruction detail but improve efficiency. To take full advantage of both types of features, we introduce a novel and highly efficient attention mechanism between the grid and point features named Point-Grid Transformer (GridFormer). This mechanism treats the grid as a transfer point connecting the space and the point cloud. Our method maximizes the spatial expressiveness of grid features while maintaining computational efficiency. Furthermore, optimizing predictions over the entire space may result in blurred boundaries. To address this issue, we further propose a boundary optimization strategy incorporating a margin binary cross-entropy loss and boundary sampling. This approach enables a more precise representation of the object structure. Our experiments validate the effectiveness of our method, which outperforms state-of-the-art approaches on widely used benchmarks by producing more precise geometry reconstructions. The code is available at https://github.com/list17/GridFormer.
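The abstract names a margin binary cross-entropy loss and boundary sampling but does not give their formulas. Under one plausible reading, assumed here rather than taken from the paper, points whose predicted occupancy is already within a margin `margin` of their label contribute no loss, so training concentrates on ambiguous points, which tend to lie near the surface. Boundary sampling is likewise sketched as the common practice of jittering surface points with small Gaussian noise, where `sigma` is an assumed parameter. Both functions are hypothetical illustrations, not the paper's equations.

```python
import math
import random


def bce(p, y, eps=1e-7):
    # Binary cross-entropy for one predicted occupancy p and label y in {0, 1}.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def margin_bce(preds, labels, margin=0.1):
    """Hypothetical margin BCE: points already within `margin` of their label add no loss.

    Dividing by len(preds) rather than by the number of kept points keeps the
    loss scale comparable across batches with different numbers of hard points.
    """
    if not preds:
        return 0.0
    kept = [bce(p, y) for p, y in zip(preds, labels) if abs(p - y) > margin]
    return sum(kept) / len(preds)


def boundary_samples(surface_points, n, sigma=0.01, rng=None):
    """Hypothetical boundary sampling: jitter surface points with Gaussian noise."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the example is reproducible
    samples = []
    for _ in range(n):
        x, y, z = rng.choice(surface_points)
        samples.append((x + rng.gauss(0.0, sigma),
                        y + rng.gauss(0.0, sigma),
                        z + rng.gauss(0.0, sigma)))
    return samples


# Confident, correct predictions are ignored; only the ambiguous 0.6 is penalized.
print(margin_bce([0.95, 0.05, 0.6], [1, 0, 1], margin=0.1))
surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
queries = boundary_samples(surface, n=4)
print(queries)
```

In a training loop, queries from `boundary_samples` would be mixed with uniformly sampled points so that the occupancy field is still supervised over the whole space, while the margin term keeps easy, far-from-surface points from dominating the loss.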

Paper Structure (31 sections, 8 equations, 8 figures, 8 tables)

Figures (8)

  • Figure 1: Visualization of complex scene reconstruction results on the Synthetic Rooms dataset [Peng2020ECCV]. Our method produces higher-fidelity reconstructions than the point-based method POCO [Boulch_2022_CVPR] and the grid-based method ALTO [Wang2023CVPR].
  • Figure 2: Comparison between GridFormer and other methods. The colorized arrows in (a), (c), and (d) represent learnable weights for scattering point or grid features. (a) The point-based approach represents a query point's feature by aggregating nearby point features with learnable weights. (b) The grid-based approach learns grid features by uniformly scattering point features; the decoder then aggregates grid features using weights computed by bilinear or trilinear interpolation. (c) The attention-based decoder in ALTO [Wang2023CVPR] makes the weights between query points and grid points learnable. (d) Our point-grid transformer learns the weights between input point features and grid features, which lets our method approximate (a) through grid points while maintaining high efficiency.
  • Figure 3: Overview of our method. (a) The architecture of GridFormer. (b) The 2D plane point-grid transformer layer, in which the colorized arrows represent learnable weights. (c) The detailed structure of the point-grid attention mechanism for point-feature aggregation. 'Pos Enc' denotes position encoding.
  • Figure 4: Illustration of boundary optimization.
  • Figure 5: Object-level reconstruction results on the ShapeNet dataset. All the methods are trained and tested on 3000 noisy points.
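Figure 2(b) notes that grid-based decoders read a query point's feature with fixed bilinear (plane) or trilinear (volume) weights, whereas (c) and (d) make those weights learnable. A minimal 2D sketch of the fixed-weight read-out follows; the `grid[i][j]` layout, the placement of features at cell centers, and the clamping at the border are assumptions, and trilinear interpolation applies the same scheme to the eight corners of a voxel.

```python
import math


def bilinear_weights(q, res):
    """Return [((i, j), weight), ...] for the four grid nodes around query q in [0, 1]^2.

    Assumes node (i, j) sits at the cell center ((i + 0.5) / res, (j + 0.5) / res);
    queries outside the outermost centers are clamped to the border.
    """
    u = min(max(q[0] * res - 0.5, 0.0), res - 1.0)
    v = min(max(q[1] * res - 0.5, 0.0), res - 1.0)
    i0, j0 = int(math.floor(u)), int(math.floor(v))
    i1, j1 = min(i0 + 1, res - 1), min(j0 + 1, res - 1)
    tx, ty = u - i0, v - j0
    return [((i0, j0), (1 - tx) * (1 - ty)), ((i1, j0), tx * (1 - ty)),
            ((i0, j1), (1 - tx) * ty), ((i1, j1), tx * ty)]


def bilinear_sample(grid, q):
    # grid is a res x res nested list of equal-length feature vectors.
    res = len(grid)
    dim = len(grid[0][0])
    out = [0.0] * dim
    for (i, j), w in bilinear_weights(q, res):
        for k in range(dim):
            out[k] += w * grid[i][j][k]
    return out


# A 2 x 2 grid whose scalar feature is i + 10 * j at node (i, j).
grid = [[[0.0], [10.0]], [[1.0], [11.0]]]
print(bilinear_sample(grid, (0.5, 0.5)))    # → [5.5] (between all four nodes)
print(bilinear_sample(grid, (0.25, 0.25)))  # → [0.0] (exactly on node (0, 0))
```

These weights depend only on the query's position relative to the grid, not on the input features; in Figure 2, (c) and (d) replace such fixed weights with learnable ones.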