MBPU: A Plug-and-Play State Space Model for Point Cloud Upsamping with Fast Point Rendering

Jiayi Song, Weidong Yang, Zhijun Li, Wen-Ming Chen, Ben Fei

TL;DR

This paper introduces MBPU, a network built on top of the Mamba architecture that performs well in long-sequence modeling, especially for large-scale point cloud upsampling, and converges quickly.

Abstract

Point cloud upsampling (PCU) aims to generate dense and uniform point clouds from sparse input captured by 3D sensors such as LiDAR. The task has potential real-world applications yet remains challenging. Existing deep learning-based methods have achieved significant progress in this field. However, they still struggle to handle long sequences effectively and to address shrinkage artifacts around the surface of the point cloud. Inspired by the recently proposed Mamba, we introduce MBPU, a network built on top of the Mamba architecture that performs well in long-sequence modeling, especially for large-scale point cloud upsampling, and converges quickly. Moreover, MBPU is an arbitrary-scale upsampling framework that serves as the predictor of point distance in the point refinement phase. We simultaneously predict the 3D position shift and the 1D point-to-point distance as regression quantities to constrain the global features while ensuring the accuracy of local details. We also introduce a fast differentiable renderer to further enhance the fidelity of the upsampled point cloud and reduce artifacts. Thanks to this fast point rendering, MBPU yields high-quality upsampled point clouds by effectively eliminating surface noise. Extensive experiments demonstrate that MBPU outperforms other off-the-shelf methods in point cloud upsampling, especially for large-scale point clouds.

Paper Structure

This paper contains 23 sections, 11 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Pipeline of our framework. Given a low-resolution input point cloud, we first apply midpoint interpolation and farthest point sampling (FPS) to obtain a point cloud with the desired number of points. The interpolated point cloud is then fed into our MBPU network, which progressively refines it through gradient descent.
  • Figure 2: (a) The overall architecture of MBPU, which mainly consists of two modules: a feature extractor and a distance regressor. In the feature extractor, we use three mixer modules in each dense block to extract local features and a transition layer to reduce the number of channels. In the distance regressor, we estimate the 3D position shift and the point-to-point (P2P) distance through two branches. (b) Structure of the Mamba block (liang2024pointmamba), which consists of layer normalization (LN), Selective SSM, depth-wise convolution (DW), and MLPs. (c) Pipeline of our differentiable rendering module, which renders depth images of the temporary upsampled point cloud and of the ground truth. The view loss between these rendered images is back-propagated to update the network parameters. The small triangles in the first cube represent the camera poses.
  • Figure 3: $4\times$ upsampling results on the PU-GAN dataset. Our method produces fewer outliers and more fine-grained details.
  • Figure 4: Visualization results on the PU1K dataset with a $4\times$ upsampling rate. Our method recovers details and global structure more accurately.
  • Figure 5: $4\times$ upsampling results on the KITTI dataset. Compared with other methods, ours produces less noise and reconstructs people and vehicles more precisely.
  • ...and 1 more figures
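
The pre-processing step described in the Figure 1 caption (midpoint interpolation followed by FPS) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the brute-force nearest-neighbour search, and the neighbour count `k` are all assumptions made for illustration, and the MBPU refinement stage is not shown. Only the Python standard library is used.

```python
import math


def midpoint_interpolate(points, k=8):
    """Append the midpoint between each point and each of its k nearest neighbours.

    Brute-force O(N^2 log N) neighbour search; k is a hypothetical parameter.
    Each unordered pair contributes at most one midpoint.
    """
    out = list(points)
    seen_pairs = set()
    for i, p in enumerate(points):
        # Indices of the k nearest neighbours of p, excluding p itself.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        for j in neighbours:
            pair = (min(i, j), max(i, j))
            if pair in seen_pairs:
                continue
            seen_pairs.add(pair)
            out.append(tuple((a + b) / 2 for a, b in zip(p, points[j])))
    return out


def farthest_point_sampling(points, m, start=0):
    """Greedily pick m points, each maximizing its distance to the chosen set."""
    if m > len(points):
        raise ValueError("cannot sample more points than are available")
    chosen = [start]
    # Distance from every point to its closest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for idx, p in enumerate(points):
            min_dist[idx] = min(min_dist[idx], math.dist(p, points[nxt]))
    return [points[i] for i in chosen]


def interpolate_to_size(points, ratio=4, k=8):
    """Densify by midpoint interpolation, then use FPS to keep ratio * N points."""
    target = ratio * len(points)
    dense = midpoint_interpolate(points, k)
    if len(dense) < target:
        raise ValueError("too few interpolated points; increase k")
    return farthest_point_sampling(dense, target)
```

In this sketch, FPS serves only to trim the over-complete interpolated set down to the requested size while keeping the points spread out. The resulting cloud would then be passed to a refinement network such as MBPU.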