Learning Depth from Past Selves: Self-Evolution Contrast for Robust Depth Estimation
Jing Cao, Kui Jiang, Shenyi Li, Xiaocheng Feng, Yong Huang
TL;DR
SEC-Depth addresses the fragility of self-supervised monocular depth estimation in adverse weather with a self-evolution contrastive learning framework built on latency models, i.e. models constructed from the network's own intermediate parameters during training. It maintains a dynamic queue of these historical models to generate negative samples, and couples an interval-based depth distribution constraint with a self-evolution loss $L_c$ defined over the distributions $P_A$, $P_P$, and $P_N$ and Jensen-Shannon divergence. The approach is plug-and-play: it is compatible with existing baselines such as MonoViT and PlaneDepth, and it improves zero-shot generalization over both standard baselines and prior robust methods on WeatherKITTI, DrivingStereo, Cityscapes variants, and other benchmarks. SEC-Depth thus offers a practical, dataset-agnostic path to robust depth perception in autonomous systems without architectural changes or annotated data.
Abstract
Self-supervised depth estimation has gained significant attention in autonomous driving and robotics. However, existing methods degrade substantially under adverse weather conditions such as rain and fog, where reduced visibility critically impairs depth prediction. To address this issue, we propose SEC-Depth, a novel self-evolution contrastive learning framework for robust self-supervised depth estimation. Our approach leverages intermediate parameters generated during training to construct temporally evolving latency models. Using these, we design a self-evolution contrastive scheme to mitigate performance loss under challenging conditions. Concretely, we first design a dynamic update strategy for the latency models that captures optimization states across training stages. To leverage the latency models effectively, we introduce a self-evolution contrastive loss (SECL) that treats outputs from historical latency models as negative samples. This mechanism adaptively adjusts learning objectives while implicitly sensing the severity of weather degradation, reducing the need for manual intervention. Experiments show that our method integrates seamlessly into diverse baseline models and significantly enhances robustness in zero-shot evaluations.
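To make the mechanism more concrete, the following is a minimal, illustrative sketch of the three ingredients named above: a bounded queue of historical snapshots, an interval-based depth distribution, and a Jensen-Shannon-based contrastive term. It is not the authors' implementation. The function names, the fixed queue length, the hinge form of the loss, the margin value, and the reading of $P_A$, $P_P$, and $P_N$ as anchor, positive, and negative distributions are all assumptions made for illustration; only the Jensen-Shannon divergence itself follows its standard definition.

```python
import math
from collections import deque


def interval_distribution(depths, edges):
    """Bin depth values into intervals and normalise to a probability vector.

    Assumption: intervals are fixed and given by `edges`; values outside
    the covered range are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for d in depths:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Half-open intervals, except the last one, which includes its right edge.
            if edges[i] <= d < edges[i + 1] or (is_last and d == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        # No value fell inside the range: fall back to a uniform distribution.
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Standard Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def self_evolution_loss(p_a, p_p, negatives, margin=0.1):
    """Hypothetical hinge-style contrastive term (an assumption, not the paper's L_c).

    Pulls the anchor distribution towards the positive one and pushes it
    away from the distributions produced by historical (latency) models.
    """
    pull = js_divergence(p_a, p_p)
    if not negatives:
        return pull
    push = sum(js_divergence(p_a, p_n) for p_n in negatives) / len(negatives)
    return max(0.0, pull - push + margin)


# A bounded queue keeps only the most recent snapshots; strings stand in
# for copies of model parameters saved at different training steps.
latency_queue = deque(maxlen=3)
for step in range(5):
    latency_queue.append(f"snapshot@{step}")

edges = [0, 10, 20, 40, 80]  # hypothetical depth intervals in metres
p_a = interval_distribution([5, 12, 15, 30], edges)   # anchor
p_p = interval_distribution([6, 11, 18, 35], edges)   # positive
p_n = interval_distribution([50, 60, 70, 75], edges)  # negative from an older snapshot
loss = self_evolution_loss(p_a, p_p, [p_n])
```

In this toy setup the anchor and positive fall into the same intervals while the negative occupies a disjoint one, so the hinge term is already satisfied and the loss is zero. In a real training loop the distributions would come from depth maps predicted by the current model and by the queued snapshots.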
