FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation

Tianyu Zhang, Guocheng Qian, Jin Xie, Jian Yang

TL;DR

This work presents FastPCI, which introduces a Pyramid Convolution-Transformer architecture for point cloud frame interpolation: the hybrid Convolution-Transformer improves local and long-range feature learning, while the pyramid network provides multilevel features and reduces computation.
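
The computation argument for the pyramid can be made concrete with a small Python/NumPy sketch of a point pyramid. It is a minimal illustration under stated assumptions: farthest point sampling (FPS) with a 4x reduction per level, and four levels. The names `farthest_point_sampling` and `build_pyramid`, the ratio, and the level count are hypothetical choices, not FastPCI's configuration, and the sketch covers only the downsampling, not the convolution or transformer layers. Because dense attention over n points scores n * n pairs, each 4x reduction in points cuts that cost by 16x.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, m: int, seed: int = 0) -> np.ndarray:
    """Indices of m points chosen by farthest point sampling (FPS)."""
    n = points.shape[0]
    rng = np.random.default_rng(seed)
    idx = np.empty(m, dtype=np.int64)
    idx[0] = rng.integers(n)
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[idx[0]], axis=1)
    for i in range(1, m):
        idx[i] = np.argmax(min_dist)  # the point farthest from everything selected so far
        new_dist = np.linalg.norm(points - points[idx[i]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return idx


def build_pyramid(points: np.ndarray, num_levels: int = 4, ratio: int = 4) -> list:
    """Hypothetical point pyramid: each level keeps 1/ratio of the previous level's points."""
    levels = [points]
    for _ in range(num_levels - 1):
        prev = levels[-1]
        keep = max(1, prev.shape[0] // ratio)
        levels.append(prev[farthest_point_sampling(prev, keep)])
    return levels


pc = np.random.default_rng(1).normal(size=(8192, 3))
levels = build_pyramid(pc)
for level, pts in enumerate(levels):
    n = pts.shape[0]
    # Dense self-attention over n points compares n * n point pairs.
    print(f"level {level}: {n:5d} points, {n * n:>12,d} attention pairs")
```

With 8192 input points the levels hold 8192, 2048, 512, and 128 points, so attention at the coarsest level scores 16,384 pairs instead of about 67 million at full resolution.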

Abstract

Point cloud frame interpolation is a challenging task that requires accurate scene flow estimation across frames while preserving geometric structure. Prevailing techniques often rely on pre-trained motion estimators or intensive test-time optimization, resulting in compromised interpolation accuracy or prolonged inference. This work presents FastPCI, which introduces a Pyramid Convolution-Transformer architecture for point cloud frame interpolation. Our hybrid Convolution-Transformer improves local and long-range feature learning, while the pyramid network provides multilevel features and reduces computation. In addition, FastPCI proposes a unique Dual-Direction Motion-Structure block for more accurate scene flow estimation. Our design is motivated by two facts: (1) accurate scene flow preserves 3D structure, and (2) the point cloud at the previous timestep should be reconstructable from the future timestep using the reverse motion. Extensive experiments show that FastPCI significantly outperforms the state-of-the-art PointINet and NeuralPCI with notable gains (e.g., 26.6% and 18.3% reductions in Chamfer Distance on KITTI), while being more than 10x and 600x faster, respectively. Code is available at https://github.com/genuszty/FastPCI
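
The Chamfer Distance behind the reported gains, and the cycle-consistency idea in fact (2), can both be illustrated in a few lines of NumPy. This is a minimal sketch, not the paper's evaluation code: it assumes the mean-of-squared-nearest-neighbour form of Chamfer Distance (benchmarks differ on squaring and on mean vs. sum), and it replaces estimated scene flow with a toy rigid translation. `flow_01` and `flow_10` are hypothetical names for the forward flow (one vector per frame-0 point) and the backward flow (one vector per frame-1 point).

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumed convention: mean squared nearest-neighbour distance from a to b
    plus the same quantity from b to a.
    """
    diff = a[:, None, :] - b[None, :, :]   # (N, M, 3) pairwise offsets
    d2 = (diff ** 2).sum(axis=-1)          # (N, M) pairwise squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


# Toy scene: frame 1 is frame 0 translated by 1 unit along x, with points reordered
# (point clouds are unordered, so there are no given correspondences).
rng = np.random.default_rng(0)
pc0 = rng.uniform(-10.0, 10.0, size=(500, 3))
motion = np.array([1.0, 0.0, 0.0])
pc1 = pc0[rng.permutation(500)] + motion
flow_01 = np.tile(motion, (500, 1))    # hypothetical forward flow on pc0
flow_10 = -np.tile(motion, (500, 1))   # hypothetical backward flow on pc1

forward_err = chamfer_distance(pc0 + flow_01, pc1)  # does the forward warp land on frame 1?
cycle_err = chamfer_distance(pc1 + flow_10, pc0)    # does the reverse motion rebuild frame 0?
static_err = chamfer_distance(pc0, pc1)             # baseline: no motion compensation
print(forward_err, cycle_err, static_err)
```

Both warped errors are essentially zero for this perfect toy flow, while the uncompensated baseline is clearly positive; a real estimator would be judged by how close it gets to the former. The dense (N, M) distance matrix is fine for a few hundred points, but a k-d tree is the usual choice at LiDAR scale.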

Paper Structure

This paper contains 17 sections, 9 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 2: (left) frame $t-1$; (middle) frame $t$; (right) the original frame $t-1$ overlaid with the frame $t-1$ estimated from frame $t$. FastPCI produces structure-aware motion and performs dual-direction motion estimation, motivated by two facts: (1) structure consistency: an accurate motion preserves the structure of objects, e.g., the car in the red box; (2) cycle consistency: the point cloud at frame $t$ (middle) is predicted from frame $t-1$ (left) by the estimated motion, and frame $t-1$ can in turn be reconstructed from frame $t$ (middle) by applying the reverse estimated motion (right).
  • Figure 3: Overview of the FastPCI pipeline. Given two input frames $PC_0\in\mathbb{R}^{N \times 3}$ and $PC_1\in\mathbb{R}^{N \times 3}$, FastPCI estimates both motion and structure using a Pyramid Convolution-Transformer network. The estimated motion is used to warp the input frames to produce interpolated frames. RefineNet then refines the interpolated frames and outputs the final frames by fusing the forward and backward estimates.
  • Figure 4: Illustration of the Motion-Structure Transformer, where $\otimes$ and $\ominus$ denote matrix multiplication and element-wise subtraction, respectively. The Motion-Structure Transformer takes bidirectional point features as input and performs Dual-Direction Cross-Attention across the forward and backward features. The structure and motion features are closely coupled, which allows the network to learn structure-aware motion.
  • Figure 5: Qualitative comparisons with the state of the art on the KITTI odometry, Argoverse 2 sensor, and nuScenes datasets. Columns (a)-(c) show results on the three datasets, respectively, and each row shows a different method. Our FastPCI ($3^{rd}$ row) yields the best qualitative results compared with the state-of-the-art PointINet ($1^{st}$ row) and NeuralPCI ($2^{nd}$ row).
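
The warping step in Figure 3 can be sketched as follows, assuming linear motion between the two input frames, so that at intermediate time t a frame-0 point moves by t times its forward flow and a frame-1 point moves by (1 - t) times its backward flow. The function name `interpolate_frame` and the fusion rule (keep round((1 - t) N) forward-warped points and take the rest from the backward-warped frame) are hypothetical stand-ins; in FastPCI, fusion and refinement are performed by RefineNet, which this sketch does not reproduce.

```python
import numpy as np


def interpolate_frame(pc0, pc1, flow_01, flow_10, t, seed=0):
    """Interpolate a frame at time t in [0, 1] from two input frames.

    pc0, pc1: (N, 3) input point clouds at times 0 and 1 (equal N assumed).
    flow_01:  (N, 3) estimated motion of each pc0 point towards frame 1.
    flow_10:  (N, 3) estimated motion of each pc1 point towards frame 0.
    Linear motion and the sampling-based fusion below are assumptions.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    fwd = pc0 + t * flow_01            # forward estimate: frame 0 warped to time t
    bwd = pc1 + (1.0 - t) * flow_10    # backward estimate: frame 1 warped to time t

    # Hypothetical fusion: draw more points from whichever input frame is closer in time.
    n = pc0.shape[0]
    n_fwd = int(round((1.0 - t) * n))
    rng = np.random.default_rng(seed)
    keep_fwd = rng.permutation(fwd.shape[0])[:n_fwd]
    keep_bwd = rng.permutation(bwd.shape[0])[:n - n_fwd]
    return np.concatenate([fwd[keep_fwd], bwd[keep_bwd]], axis=0)


# Toy check with a rigid translation (no real scene flow estimator involved).
rng = np.random.default_rng(0)
pc0 = rng.uniform(-10.0, 10.0, size=(500, 3))
motion = np.array([2.0, 0.0, 0.0])
pc1 = pc0[rng.permutation(500)] + motion   # frame 1 in arbitrary point order
flow_01 = np.tile(motion, (500, 1))
flow_10 = -np.tile(motion, (500, 1))
mid = interpolate_frame(pc0, pc1, flow_01, flow_10, t=0.5)
print(mid.shape)  # (500, 3)
```

At t = 0.5 every output point lies at its frame-0 position shifted by half the motion, whichever input frame it was drawn from. Warping from both directions matters when each frame sees surfaces the other misses, which is one motivation for fusing forward and backward estimates rather than relying on a single direction.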