Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
JunDa Cheng, Wei Yin, Kaixuan Wang, Xiaozhi Chen, Shijie Wang, Xin Yang
TL;DR
This work tackles depth estimation for autonomous driving under noisy camera poses by introducing AFNet, a two-branch network that fuses single-view and multi-view depth predictions with an adaptive fusion module. A warping-based confidence M_w, alongside branch confidences M_s and M_m, enables robust, pixel-wise fusion that gracefully handles textureless regions, dynamic objects, and pose errors. Empirical results on KITTI and DDAD show state-of-the-art accuracy and, crucially, superior robustness under pose perturbations, with a pose-correction variant further boosting performance under challenging noise. The approach advances practical depth perception for autonomous systems by balancing accuracy and resilience in real-world conditions.
Abstract
Multi-view depth estimation has achieved impressive performance on various benchmarks. However, almost all current multi-view systems rely on ideal camera poses being given, which are unavailable in many real-world scenarios such as autonomous driving. In this work, we propose a new robustness benchmark to evaluate depth estimation systems under various noisy pose settings. Surprisingly, we find that current multi-view depth estimation methods, as well as single-view and multi-view fusion methods, fail when given noisy poses. To address this challenge, we propose a single-view and multi-view fused depth estimation system that adaptively integrates high-confidence multi-view and single-view results for both robust and accurate depth estimation. The adaptive fusion module performs fusion by dynamically selecting high-confidence regions between the two branches based on a warping confidence map. Thus, the system tends to choose the more reliable branch when facing textureless scenes, inaccurate calibration, dynamic objects, and other degraded or challenging conditions. Our method outperforms state-of-the-art multi-view and fusion methods under robustness testing. Furthermore, we achieve state-of-the-art performance on challenging benchmarks (KITTI and DDAD) when given accurate pose estimates. Project website: https://github.com/Junda24/AFNet/.
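
To make the idea of confidence-based, pixel-wise fusion concrete, the following is a minimal sketch in Python with NumPy. It is not the AFNet fusion module: the abstract does not give the fusion rule, so this sketch assumes a hand-written weighted average in which the multi-view confidence is gated by the warping confidence. The function name `fuse_depth` and all parameter names are hypothetical and chosen for illustration only.

```python
import numpy as np


def fuse_depth(depth_s, depth_m, conf_s, conf_m, conf_w, eps=1e-6):
    """Illustrative pixel-wise fusion of two depth maps (not the AFNet module).

    depth_s, depth_m: single-view and multi-view depth maps, shape (H, W).
    conf_s, conf_m:   per-pixel branch confidences in [0, 1].
    conf_w:           per-pixel warping confidence in [0, 1]; assumed to be low
                      where multi-view warping is unreliable (e.g. noisy pose).
    """
    # Assumption: the multi-view branch is trusted only where both its own
    # confidence and the warping confidence are high.
    weight_m = conf_m * conf_w
    weight_s = conf_s
    # Normalized weighted average; eps avoids division by zero.
    return (weight_s * depth_s + weight_m * depth_m) / (weight_s + weight_m + eps)


# Toy 2x2 example: the bottom row has zero warping confidence,
# so the fused depth there falls back to the single-view prediction.
depth_s = np.full((2, 2), 10.0)
depth_m = np.full((2, 2), 20.0)
conf_s = np.full((2, 2), 0.5)
conf_m = np.full((2, 2), 0.9)
conf_w = np.array([[1.0, 1.0], [0.0, 0.0]])

fused = fuse_depth(depth_s, depth_m, conf_s, conf_m, conf_w)
print(fused)
```

In this toy input, pixels with high warping confidence are pulled toward the multi-view depth, while pixels with zero warping confidence stay at the single-view depth, which mirrors the "choose the more reliable branch" behavior described above in a simplified form.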
