PuriLight: A Lightweight Shuffle and Purification Framework for Monocular Depth Estimation
Yujie Chen, Li Zhang, Xiaomeng Chu, Tian Zhang
TL;DR
PuriLight targets high-quality monocular depth estimation on edge devices. Its lightweight encoder combines Shuffle-Dilation Convolution (SDC), Rotation-Adaptive Kernel Attention (RAKA), and Deep Frequency Signal Purification (DFSP) to capture local detail and global structure jointly. DFSP processes global features in the frequency domain, reducing global-processing complexity from $O(hN^2+Nd)$ to $O(NC)$ and from $O(N^2d)$ to $O(N(C+\log N))$ while preserving essential structure. A simple, efficient decoder and a self-supervised training regime based on photometric reprojection and edge-aware smoothness losses complete the framework. On the KITTI Eigen split, PuriLight delivers state-of-the-art results among lightweight methods with only 2.7M parameters, and cross-dataset results on Make3D suggest strong generalization. These efficiency gains come without sacrificing depth accuracy, making the approach well suited to edge devices and robotics applications.
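To make the edge-aware smoothness loss mentioned above concrete, here is a minimal NumPy sketch. It follows the formulation commonly used in self-supervised depth estimation (mean-normalized disparity gradients down-weighted by image gradients); the summary does not give PuriLight's exact loss, so the function name `edge_aware_smoothness`, the array shapes, and the normalization constant are assumptions made for illustration only.

```python
import numpy as np


def edge_aware_smoothness(disp, image):
    """Illustrative edge-aware smoothness loss (hypothetical helper).

    disp:  (H, W) predicted disparity map, assumed positive.
    image: (H, W, 3) color image with values in [0, 1].
    """
    # Mean-normalize the disparity so the loss does not favor shrinking it.
    norm_disp = disp / (disp.mean() + 1e-7)

    # Absolute first-order disparity gradients along x and y.
    d_dx = np.abs(norm_disp[:, 1:] - norm_disp[:, :-1])
    d_dy = np.abs(norm_disp[1:, :] - norm_disp[:-1, :])

    # Image gradients, averaged over the color channels.
    i_dx = np.abs(image[:, 1:] - image[:, :-1]).mean(axis=-1)
    i_dy = np.abs(image[1:, :] - image[:-1, :]).mean(axis=-1)

    # Disparity changes are penalized less where the image itself has an edge.
    return (d_dx * np.exp(-i_dx)).mean() + (d_dy * np.exp(-i_dy)).mean()


# Toy inputs: a disparity step that either coincides with an image edge or not.
step_disp = np.ones((4, 6))
step_disp[:, 3:] = 2.0
flat_image = np.zeros((4, 6, 3))
edged_image = np.zeros((4, 6, 3))
edged_image[:, 3:] = 1.0

loss_on_flat = edge_aware_smoothness(step_disp, flat_image)
loss_on_edge = edge_aware_smoothness(step_disp, edged_image)
```

In this toy case the same disparity step costs less when it lines up with an image edge (`loss_on_edge < loss_on_flat`), which is the behavior that lets depth discontinuities survive at object boundaries while flat regions are smoothed.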
Abstract
We propose PuriLight, a lightweight and efficient framework for self-supervised monocular depth estimation that addresses the dual challenges of computational efficiency and detail preservation. While recent advances in self-supervised depth estimation have reduced reliance on ground-truth supervision, existing approaches remain constrained: bulky architectures compromise practicality, and lightweight models sacrifice structural precision. These limitations underscore the need for architectures that are both lightweight and structurally precise. Our framework addresses them through a three-stage architecture built on three novel modules: the Shuffle-Dilation Convolution (SDC) module for local feature extraction, the Rotation-Adaptive Kernel Attention (RAKA) module for hierarchical feature enhancement, and the Deep Frequency Signal Purification (DFSP) module for global feature purification. Working together, these modules keep PuriLight's feature extraction and processing both lightweight and accurate. Extensive experiments demonstrate that PuriLight achieves state-of-the-art performance with a minimal number of trainable parameters while maintaining high computational efficiency. Code will be available at https://github.com/ishrouder/PuriLight.
