Scale Propagation Network for Generalizable Depth Completion

Haotian Wang, Meng Yang, Xinhu Zheng, Gang Hua

TL;DR

The paper proposes scale propagation normalization (SP-Norm), a normalization method that propagates scales from input to output while preserving the normalization operator for easy convergence. The resulting model consistently achieves the best accuracy with faster speed and lower memory than state-of-the-art methods.

Abstract

Depth completion, which infers dense depth maps from sparse measurements, is crucial for robust 3D perception. Although deep-learning-based methods have made tremendous progress on this problem, these models do not generalize well to scenes unobserved in training, a fundamental limitation that has yet to be overcome. A careful analysis of existing deep neural network architectures for depth completion, which are largely borrowed from successful backbones for image analysis tasks, reveals that a key design bottleneck resides in the conventional normalization layers. These normalization layers are designed, on the one hand, to make training more stable and, on the other hand, to build more visual invariance across scene scales. However, in depth completion, the scale is exactly what we want to estimate robustly in order to generalize better to unseen scenes. To mitigate this, we propose a novel scale propagation normalization (SP-Norm) method that propagates scales from input to output while preserving the normalization operator for easy convergence. More specifically, we rescale the input using features learned by a single-layer perceptron from the normalized input, rather than directly normalizing the input as conventional normalization layers do. We then develop a new network architecture based on SP-Norm and the ConvNeXt V2 backbone. We explore the composition of various basic blocks and architectures to achieve superior performance and efficient inference for generalizable depth completion. Extensive experiments are conducted on six unseen datasets with various types of sparse depth maps, i.e., randomly sampled 0.1%/1%/10% valid pixels, 4/8/16/32/64-line LiDAR points, and holes from structured-light sensors. Our model consistently achieves the best accuracy with faster speed and lower memory when compared to state-of-the-art methods.
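
The rescaling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the choice of layer normalization over the channel axis, the absence of an activation, and all names (`layer_norm`, `conventional_norm`, `sp_norm`, `weight`, `bias`) are assumptions made here for illustration. It contrasts a conventional normalization layer, whose output ignores the input scale, with a layer that multiplies the input by single-layer-perceptron features of the normalized input.

```python
import numpy as np


def layer_norm(x, eps=1e-12):
    """Normalize each row to zero mean and unit variance (no affine factors)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def conventional_norm(x, alpha, beta):
    """Conventional layer: normalized input followed by learnable affine factors."""
    return alpha * layer_norm(x) + beta


def sp_norm(x, weight, bias):
    """Assumed SP-Norm form: rescale x by SLP features of the normalized x."""
    gate = layer_norm(x) @ weight.T + bias  # single-layer perceptron, no activation
    return x * gate                         # multiplier keeps the input scale


rng = np.random.default_rng(0)
channels = 8
x = rng.uniform(0.5, 5.0, size=(4, channels))    # toy features derived from depth
alpha = rng.normal(size=channels)
beta = rng.normal(size=channels)
weight = rng.normal(size=(channels, channels))   # hypothetical SLP weights
bias = rng.normal(size=channels)                 # hypothetical SLP bias
s = 10.0                                         # global scale change of the scene

# Conventional normalization discards the scale: the output is unchanged.
scale_lost = np.allclose(conventional_norm(s * x, alpha, beta),
                         conventional_norm(x, alpha, beta))
# The SP-Norm sketch propagates the scale: the output is scaled by s as well.
scale_kept = np.allclose(sp_norm(s * x, weight, bias),
                         s * sp_norm(x, weight, bias))
print(scale_lost, scale_kept)
```

Under these assumptions the multiplier is what carries the scale through the layer, while the normalization operator is still applied inside the perceptron branch.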

Paper Structure

This paper contains 42 sections, 17 equations, 11 figures, 11 tables.

Figures (11)

  • Figure 1: Examples of generalizable depth completion across different scenes by our model and a recent SOTA baseline, CompletionFormer (zhang2023completionformer). Our model consistently infers accurate depth values and thereby preserves the structure of objects in the 3D view. In addition, our model is faster (126.6 vs 11.1 images/s) on a 3090 GPU.
  • Figure 2: Illustration of the SP-property. The ambiguous scales of output depth values $z$ or $sz$ can be determined by input sparse points $d$ or $sd$, respectively.
  • Figure 3: Different normalization strategies. (a) conventional normalization layers (e.g., BN, IN, and LN), (b) non-normalization techniques (e.g., ReZero, SkipInit, and Fixup), and (c) our SP-Norm.
  • Figure 4: Illustration of conventional normalization and our SP-Norm. In conventional normalization, multiple inputs $d^1_i$ and $d^2_i$ with different scales are mapped to one output with the same scale. The output scale depends solely on the learnable affine factors $\alpha_i, \beta_i$. In contrast, our SP-Norm carries the varying scales of the inputs through to the outputs. The output scales depend jointly on the scales of the inputs $d^1_i, d^2_i$ and the learnable parameters $\omega_{ij}, b_i$ of the single-layer perceptron (SLP).
  • Figure 5: Three variants of our SP-Norm. (a) removing the normalization operator, (b) replacing the SLP with the affine factors, and (c) replacing the multiplier with an adder.
  • ...and 6 more figures
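
The captions of Figures 2 and 4 can be read as the relations below. This is a sketch rather than the paper's own equations: the symbol $f$ for the network, the operator $\mathrm{Norm}$, the positive scale factor $s$, and the exact placement of $\omega_{ij}$ and $b_i$ are assumptions made here, and the small stabilizing constant inside the normalization is neglected so that $\mathrm{Norm}(s\,d) = \mathrm{Norm}(d)$ for $s > 0$.

```latex
\begin{align}
  \text{SP-property:}\quad
    & f(s\,d) = s\,f(d), \qquad s > 0, \\
  \text{conventional normalization:}\quad
    & y_i(d) = \alpha_i\,\mathrm{Norm}(d)_i + \beta_i
      \;\Rightarrow\; y_i(s\,d) = y_i(d), \\
  \text{SP-Norm:}\quad
    & y_i(d) = d_i \Bigl(\sum_j \omega_{ij}\,\mathrm{Norm}(d)_j + b_i\Bigr)
      \;\Rightarrow\; y_i(s\,d) = s\,y_i(d).
\end{align}
```

In the second line the scale $s$ cancels inside $\mathrm{Norm}$, so the output scale is set only by $\alpha_i, \beta_i$. In the third line the same cancellation happens inside the parentheses, and the factor $s$ survives through the multiplier $d_i$.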