FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth Estimation

Fei Wang, Jun Cheng

TL;DR

This work tackles the trade-off between accuracy and speed in self-supervised monocular depth estimation. It introduces SmallDepth, a sparsity-based lightweight DepthNet, and enhances training with an Equivalent Transformation Module (ETM) and a Pyramid Loss, which improve context perception and robustness without increasing inference cost. A Function Approximation Loss (APX) then transfers knowledge to SmallDepth from a pretrained HQDecv2, a refinement of HQDec that addresses grid artifacts, further boosting accuracy. The combined approach achieves state-of-the-art results on KITTI at more than 500 FPS with approximately 2 million parameters, which makes it practical for real-time 3D understanding in robotics and autonomous systems.

Abstract

Most existing methods rely on complex models to predict scene depth with high accuracy, resulting in slow inference that hinders deployment. To better balance precision and speed, we first design SmallDepth based on sparsity. Second, to enhance the feature representation ability of SmallDepth during training while keeping its inference complexity unchanged, we propose an equivalent transformation module (ETM). Third, to improve the ability of each layer of a fixed SmallDepth to perceive different context information, and to improve the robustness of SmallDepth to left-right direction and illumination changes, we propose a pyramid loss. Fourth, to further improve the accuracy of SmallDepth, we use the proposed function approximation loss (APX) to transfer knowledge from the pretrained HQDecv2 to SmallDepth; HQDecv2 is obtained by optimizing the previous HQDec to address grid artifacts in some regions. Extensive experiments demonstrate that each proposed component improves the precision of SmallDepth without changing its inference complexity, and that the developed approach achieves state-of-the-art results on KITTI at an inference speed of more than 500 frames per second with approximately 2 M parameters. The code and models will be publicly available at https://github.com/fwucas/FA-Depth.
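
The abstract does not spell out how the ETM keeps inference complexity unchanged. One widely used way to gain training-time capacity at equal inference cost is to train several parallel linear branches and then fold them into a single kernel, which is possible because convolution is linear. The sketch below illustrates only that general idea in one dimension and in pure Python; it is an assumption about the kind of equivalence involved, not the authors' ETM, and every function name in it is hypothetical.

```python
# Illustrative only: generic branch merging via linearity of convolution.
# NOT taken from the FA-Depth paper or code; all names are hypothetical.

def conv1d_same(signal, kernel):
    """Cross-correlate `signal` with an odd-length `kernel`, zero-padded to keep length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]

def pad_kernel(kernel, size):
    """Zero-pad an odd-length kernel symmetrically so that it has `size` taps."""
    extra = (size - len(kernel)) // 2
    return [0.0] * extra + list(kernel) + [0.0] * extra

def merge_branches(kernels):
    """Fold parallel linear branches into one kernel by summing the padded kernels."""
    size = max(len(k) for k in kernels)
    padded = [pad_kernel(k, size) for k in kernels]
    return [sum(taps) for taps in zip(*padded)]

# Training-time view: three parallel branches (3-tap, 1-tap, identity shortcut).
k3 = [0.25, 0.5, -0.125]
k1 = [2.0]
identity = [1.0]

x = [1.0, -2.0, 3.0, 0.5, 4.0]
multi_branch = [
    a + b + c
    for a, b, c in zip(conv1d_same(x, k3), conv1d_same(x, k1), conv1d_same(x, identity))
]

# Inference-time view: a single merged 3-tap kernel applied once.
merged = merge_branches([k3, k1, identity])
single_branch = conv1d_same(x, merged)

# The two views should agree up to floating-point rounding.
max_abs_diff = max(abs(a - b) for a, b in zip(multi_branch, single_branch))
```

In this toy setting the merged kernel costs the same as the largest single branch at inference, while training sees three branches. Whether ETM uses this particular construction is not stated in the abstract.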

Paper Structure (26 sections, 46 equations, 7 figures, 11 tables)

Figures (7)

  • Figure 1: Overview of the training architecture.
  • Figure 2: AdaCoeff and DAdaNRSUv2 modules.
  • Figure 3: Global feature maps calculation module.
  • Figure 4: The mask on the KITTI and DDAD datasets.
  • Figure 5: Qualitative comparison ($192 \times 640$) on the KITTI dataset. Ours-A: SmallDepth, Ours-B: HQDecv2.
  • ...and 2 more figures