Learning Inverse Laplacian Pyramid for Progressive Depth Completion

Kun Wang, Zhiqiang Yan, Junkai Fan, Jun Li, Jian Yang

TL;DR

LP-Net reframes depth completion as a progressive, multi-scale problem based on Laplacian Pyramid decomposition: it captures global scene structure first, then refines local details. It introduces two modules: the Multi-path Feature Pyramid (MFP), which enriches global context, and Selective Depth Filtering (SDF), which adaptively applies smoothness and sharpness filters via deformable depth kernels. Across KITTI DC, NYUv2, and TOFDC, LP-Net achieves state-of-the-art or near-state-of-the-art accuracy while markedly improving computational efficiency, including the fastest inference on the KITTI DC benchmark. This approach offers a practical, scalable solution for fast, accurate depth densification in real-world RGB-D sensing pipelines.

Abstract

Depth completion endeavors to reconstruct a dense depth map from sparse depth measurements, leveraging the information provided by a corresponding color image. Existing approaches mostly hinge on single-scale propagation strategies that iteratively ameliorate initial coarse depth estimates through pixel-level message passing. Despite their commendable outcomes, these techniques are frequently hampered by computational inefficiencies and a limited grasp of scene context. To circumvent these challenges, we introduce LP-Net, an innovative framework that implements a multi-scale, progressive prediction paradigm based on Laplacian Pyramid decomposition. Diverging from propagation-based approaches, LP-Net initiates with a rudimentary, low-resolution depth prediction to encapsulate the global scene context, subsequently refining this through successive upsampling and the reinstatement of high-frequency details at incremental scales. We have developed two novel modules to bolster this strategy: 1) the Multi-path Feature Pyramid module, which segregates feature maps into discrete pathways, employing multi-scale transformations to amalgamate comprehensive spatial information, and 2) the Selective Depth Filtering module, which dynamically learns to apply both smoothness and sharpness filters to judiciously mitigate noise while accentuating intricate details. By integrating these advancements, LP-Net not only secures state-of-the-art (SOTA) performance across both outdoor and indoor benchmarks such as KITTI, NYUv2, and TOFDC, but also demonstrates superior computational efficiency. At the time of submission, LP-Net ranks 1st among all peer-reviewed methods on the official KITTI leaderboard.
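The coarse-to-fine idea in the abstract can be illustrated with a plain Laplacian pyramid. The sketch below is not LP-Net itself: it decomposes a known depth map into a low-resolution base plus per-scale high-frequency residuals, then inverts the decomposition by repeated upsampling and residual addition. In LP-Net the base and the residuals would be predicted by the network rather than computed from a ground-truth map. The function names, the 2x2 average pooling, and the nearest-neighbour upsampling are assumptions chosen to keep the example dependency-free.

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 block (assumes even height and width)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample(img):
    """Double resolution by repeating every pixel in a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_laplacian_pyramid(depth, levels):
    """Return (coarse base, residuals ordered from finest to coarsest)."""
    residuals = []
    current = depth
    for _ in range(levels):
        low = downsample(current)
        up = upsample(low)
        # The residual holds the detail lost by going down one scale.
        residuals.append([[c - u for c, u in zip(row_c, row_u)]
                          for row_c, row_u in zip(current, up)])
        current = low
    return current, residuals


def reconstruct(coarse, residuals):
    """Inverse pyramid: start from the base, upsample, add back detail per scale."""
    current = coarse
    for res in reversed(residuals):  # coarsest residual first
        up = upsample(current)
        current = [[u + r for u, r in zip(row_u, row_r)]
                   for row_u, row_r in zip(up, res)]
    return current
```

Calling `build_laplacian_pyramid` on a 4x4 map with `levels=2` yields a 1x1 base and two residual maps (4x4 and 2x2); passing both to `reconstruct` recovers the original map. The progressive scheme described above follows the same order as `reconstruct`: global structure first, finer detail at each subsequent scale.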

Paper Structure

This paper contains 30 sections, 11 equations, 12 figures, and 6 tables.

Figures (12)

  • Figure 1: Visual comparison with recent state-of-the-art (SOTA) methods on prediction accuracy and computational efficiency. MAE metrics are sourced from the KITTI online leaderboard, while inference time and GPU memory usage are assessed on $1216\times 256$ images using a single RTX 4090 GPU. Our proposed LP-Net demonstrates superior performance over previous SOTA methods in both accuracy and efficiency.
  • Figure 2: Visual comparison between existing single-scale, propagation-based approaches and our multi-scale, Laplacian Pyramid-based prediction scheme. (a) Propagation-based methods iteratively refine an initial coarse depth prediction through recurrent convolution, aggregating depth information from neighboring pixels. (b) Our proposed prediction scheme progressively upsamples the initial low-resolution prediction and recovers high-frequency details at each scale with a novel selective filtering mechanism, facilitating both accurate and efficient depth completion.
  • Figure 3: Overall Framework of LP-Net. MFP, RH and SDF stand for the Multi-path Feature Pyramid module, Regression Head and Selective Depth Filtering module, respectively. $S$ represents the input sparse depth, while $\hat{S}^{(1)}\sim \hat{S}^{(4)}$ are its progressively lower-resolution versions, obtained through a weighted pooling operation. $F_{d}^{0}\sim F_{d}^{4}$ indicate the decoder feature maps. The prediction of the final depth map $\hat{D}$ is structured into five progressive steps, beginning with a direct regression and confidence-based fusion with $\hat{S}^{(4)}$ to produce the low-frequency residual $\hat{D}^{(4)}$. Subsequently, $\hat{D}^{(4)}$ undergoes iterative upsampling, fusion with the corresponding sparse measurements, and refinement via the SDF module to yield more accurate, higher-resolution depth maps.
  • Figure 4: Evolution of depth completion results. We illustrate the progression of our depth completion scheme by showcasing the intermediate results on the NYUv2 dataset. The depth predictions $\hat{D}^{(4)}\sim \hat{D}^{(1)}$ have been upsampled to full resolution for enhanced visualization.
  • Figure 5: Illustration of the Multi-path Feature Pyramid (MFP) Module. This module segments the feature map $F_e$ into $p$ pathways, which are then transformed through multiple convolutional layers with a stride of 2. These pathways are subsequently upsampled and fused to integrate global information across different visual fields, yielding $\hat{F}_e$.
  • ...and 7 more figures
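The Figure 3 caption mentions two operations on the sparse input: a weighted pooling that produces lower-resolution sparse maps, and a confidence-based fusion of those maps with the predicted depth. The caption does not give their exact form, so the sketch below is only one plausible reading. It assumes that zero marks a missing measurement, that pooling averages only the valid samples in each 2x2 block, and that fusion is a per-pixel convex blend weighted by a confidence in [0, 1]. The names `masked_pool` and `fuse` are hypothetical.

```python
def masked_pool(sparse):
    """2x2 pooling that averages only valid (non-zero) depth samples.

    Assumption: a value of 0 means "no measurement", and the map has
    even height and width.
    """
    h, w = len(sparse), len(sparse[0])
    pooled = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            block = [sparse[y + dy][x + dx] for dy in (0, 1) for dx in (0, 1)]
            valid = [v for v in block if v > 0]
            # Blocks without any measurement stay empty (0) at the coarser scale.
            row.append(sum(valid) / len(valid) if valid else 0.0)
        pooled.append(row)
    return pooled


def fuse(pred, sparse, conf):
    """Blend prediction and measurement where a measurement exists.

    Assumption: conf is a per-pixel weight in [0, 1]; pixels without a
    measurement keep the prediction unchanged.
    """
    return [[c * s + (1.0 - c) * p if s > 0 else p
             for p, s, c in zip(row_p, row_s, row_c)]
            for row_p, row_s, row_c in zip(pred, sparse, conf)]
```

Applying `masked_pool` repeatedly to a full-resolution sparse map would give a sequence of progressively coarser maps in the spirit of the pooled sparse inputs in the caption. Averaging only over valid samples avoids diluting the few real measurements with zeros, which a plain average pool would do.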