PuriLight: A Lightweight Shuffle and Purification Framework for Monocular Depth Estimation

Yujie Chen, Li Zhang, Xiaomeng Chu, Tian Zhang

TL;DR

PuriLight targets high-quality monocular depth estimation on edge devices with a lightweight encoder that combines Shuffle-Dilation Convolution (SDC), Rotation-Adaptive Kernel Attention (RAKA), and Deep Frequency Signal Purification (DFSP) to jointly capture local detail and global structure. It reduces the cost of global feature processing via a frequency-domain purification stage and pairs the encoder with a simple, efficient decoder, trained in a self-supervised manner with photometric reprojection and edge-aware smoothness losses. On the KITTI Eigen split, PuriLight delivers state-of-the-art results among lightweight methods with only 2.7M parameters, and cross-dataset results on Make3D suggest strong generalization. In particular, DFSP reduces global-processing complexity from $O(hN^2+Nd)$ to $O(NC)$ and from $O(N^2d)$ to $O(N(C+\log N))$ while preserving essential structure. Overall, PuriLight offers a practical path to accurate monocular depth estimation on resource-constrained hardware such as edge devices and robots.
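The edge-aware smoothness loss mentioned above can be illustrated with a minimal NumPy sketch. It follows the formulation commonly used in self-supervised depth estimation (disparity gradients down-weighted by image gradients); whether PuriLight uses exactly this form is an assumption, and the function name `edge_aware_smoothness` and its arguments are illustrative only.

```python
import numpy as np

def edge_aware_smoothness(disp, img):
    """Illustrative edge-aware smoothness term (assumed formulation).

    disp: (H, W) predicted disparity map.
    img:  (H, W, 3) color image with values in [0, 1].
    """
    # Mean-normalize disparity so the penalty does not depend on its scale.
    norm_disp = disp / (disp.mean() + 1e-7)

    # Absolute first-order disparity differences along x and y.
    grad_disp_x = np.abs(norm_disp[:, :-1] - norm_disp[:, 1:])
    grad_disp_y = np.abs(norm_disp[:-1, :] - norm_disp[1:, :])

    # Absolute image differences, averaged over the color channels.
    grad_img_x = np.abs(img[:, :-1] - img[:, 1:]).mean(axis=-1)
    grad_img_y = np.abs(img[:-1, :] - img[1:, :]).mean(axis=-1)

    # Down-weight disparity changes that coincide with image edges.
    weighted_x = grad_disp_x * np.exp(-grad_img_x)
    weighted_y = grad_disp_y * np.exp(-grad_img_y)
    return weighted_x.mean() + weighted_y.mean()
```

With this weighting, a depth discontinuity that lines up with an image edge is penalized less than the same discontinuity in a textureless region, which is what lets the loss smooth flat areas without blurring object boundaries.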

Abstract

We propose PuriLight, a lightweight and efficient framework for self-supervised monocular depth estimation that addresses the dual challenges of computational efficiency and detail preservation. While recent advances in self-supervised depth estimation have reduced reliance on ground-truth supervision, existing approaches remain constrained either by bulky architectures that compromise practicality or by lightweight models that sacrifice structural precision. These limitations underscore the need for architectures that are both lightweight and structurally precise. Our framework addresses them through a three-stage architecture incorporating three novel modules: the Shuffle-Dilation Convolution (SDC) module for local feature extraction, the Rotation-Adaptive Kernel Attention (RAKA) module for hierarchical feature enhancement, and the Deep Frequency Signal Purification (DFSP) module for global feature purification. Working together, these modules enable PuriLight to extract and process features both cheaply and accurately. Extensive experiments demonstrate that PuriLight achieves state-of-the-art performance with minimal trainable parameters while maintaining exceptional computational efficiency. Code will be available at https://github.com/ishrouder/PuriLight.

Paper Structure

This paper contains 16 sections, 17 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The proposed PuriLight delivers more refined depth details and higher accuracy with minimal parameters compared to other representative lightweight methods.
  • Figure 2: Overall architecture of PuriLight. The proposed architecture comprises three novel modules: SDC, RAKA, and DFSP, with detailed implementations illustrated in Figure 3.
  • Figure 3: Structures of the proposed SDC, RAKA, and DFSP modules.
  • Figure 4: PoseNet is employed to estimate the poses between adjacent frames.
  • Figure 5: Qualitative results on KITTI. The proposed PuriLight demonstrates distinct advantages over other representative methods. Monodepth2 [godard2019digging] and R-MSFM [rmsfm] have limited receptive fields, and Lite-Mono [litemono] struggles to preserve fine details. In contrast, PuriLight learns more intricate details and produces more accurate depth maps.
  • ...and 1 more figure