Dusk Till Dawn: Self-supervised Nighttime Stereo Depth Estimation using Visual Foundation Models
Madhu Vankadari, Samuel Hodgson, Sangyun Shin, Kaichen Zhou, Andrew Markham, Niki Trigoni
TL;DR
This work tackles nighttime self-supervised stereo depth estimation by leveraging visual foundation models to obtain robust, illumination-invariant features. A feature-level masking strategy and a distance regularizer improve depth accuracy in low-texture, poorly lit regions, while a cross-image transformer-based stereo matcher and RAFT-style upsampling deliver refined disparities. The authors introduce a set of new evaluation metrics based on depth bins to better reflect the nonuniform distribution of ground-truth depth, and demonstrate strong generalization on the Oxford RobotCar and MS2 nighttime sequences. Overall, the method achieves competitive performance against supervised baselines and shows robust depth estimation in challenging night scenes with minimal ground-truth supervision.
Abstract
Self-supervised depth estimation algorithms rely heavily on frame-warping relationships and degrade substantially in challenging conditions such as low-visibility and nighttime scenes with varying illumination. To address this challenge, we introduce an algorithm for accurate self-supervised stereo depth estimation under nighttime conditions. Specifically, we use pretrained visual foundation models to extract generalised features across challenging scenes and present an efficient method for matching and integrating these features from stereo frames. Moreover, to prevent pixels that violate the photometric consistency assumption from negatively affecting the depth predictions, we propose a novel masking approach that filters out such pixels. Lastly, to address weaknesses in the evaluation of current depth estimation algorithms, we present novel evaluation metrics. Our experiments, conducted on challenging datasets including Oxford RobotCar and Multi-Spectral Stereo, demonstrate the robust improvements achieved by our approach. Code is available at: https://github.com/madhubabuv/dtd
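To illustrate why metrics based on depth bins, as mentioned in the TL;DR, can differ from pixel-averaged ones, the following is a minimal sketch and not the authors' implementation. It assumes one plausible formulation: the absolute relative error is averaged within each ground-truth depth bin, and the per-bin means are then averaged so that sparsely populated far ranges count as much as dense near ranges. The function names, bin edges, and sample values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def abs_rel_per_bin(
    pred: Sequence[float],
    gt: Sequence[float],
    bin_edges: Sequence[float],
) -> Dict[Tuple[float, float], float]:
    """Mean absolute relative error inside each ground-truth depth bin."""
    n_bins = len(bin_edges) - 1
    sums: List[float] = [0.0] * n_bins
    counts: List[int] = [0] * n_bins
    for p, g in zip(pred, gt):
        if g <= 0:  # skip pixels without valid ground truth
            continue
        for i in range(n_bins):
            # Half-open bins [lo, hi); depths outside all bins are ignored.
            if bin_edges[i] <= g < bin_edges[i + 1]:
                sums[i] += abs(p - g) / g
                counts[i] += 1
                break
    # Report only the bins that actually contain ground-truth pixels.
    return {
        (bin_edges[i], bin_edges[i + 1]): sums[i] / counts[i]
        for i in range(n_bins)
        if counts[i] > 0
    }


def bin_averaged_abs_rel(
    pred: Sequence[float],
    gt: Sequence[float],
    bin_edges: Sequence[float],
) -> float:
    """Average the per-bin errors so every occupied bin has equal weight."""
    per_bin = abs_rel_per_bin(pred, gt, bin_edges)
    if not per_bin:
        raise ValueError("no valid ground-truth pixels fall inside the bins")
    return sum(per_bin.values()) / len(per_bin)


def pixel_averaged_abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Conventional absolute relative error averaged over valid pixels."""
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical data: four near pixels with small error, one far pixel
    # with large error.
    gt_depth = [2.0, 2.0, 2.0, 2.0, 30.0]
    pred_depth = [2.2, 2.2, 2.2, 2.2, 45.0]
    edges = [0.0, 10.0, 20.0, 40.0]
    print(pixel_averaged_abs_rel(pred_depth, gt_depth))
    print(bin_averaged_abs_rel(pred_depth, gt_depth, edges))
```

In this toy case the pixel-averaged error is dominated by the four near pixels, whereas the bin-averaged error gives the single far pixel equal weight, so the two numbers diverge noticeably. Whether the paper weights bins equally or uses a different aggregation is not stated in the abstract.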
