Adaptive Stereo Depth Estimation with Multi-Spectral Images Across All Lighting Conditions

Zihan Qin, Jialei Xu, Wenbo Zhao, Junjun Jiang, Xianming Liu

TL;DR

This work proposes a novel framework incorporating stereo depth estimation to enforce accurate geometric constraints across different spectra, and introduces Degradation Masking, which leverages robust monocular thermal depth estimation in degraded regions.

Abstract

Depth estimation under adverse conditions remains a significant challenge. Recently, multi-spectral depth estimation, which integrates both visible light and thermal images, has shown promise in addressing this issue. However, existing algorithms struggle with precise pixel-level feature matching, limiting their ability to fully exploit geometric constraints across different spectra. To address this, we propose a novel framework incorporating stereo depth estimation to enforce accurate geometric constraints. In particular, we treat the visible light and thermal images as a stereo pair and utilize a Cross-modal Feature Matching (CFM) Module to construct a cost volume for pixel-level matching. To mitigate the effects of poor lighting on stereo matching, we introduce Degradation Masking, which leverages robust monocular thermal depth estimation in degraded regions. Our method achieves state-of-the-art (SOTA) performance on the Multi-Spectral Stereo (MS2) dataset, with qualitative evaluations demonstrating high-quality depth maps under varying lighting conditions.
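To make the cost-volume idea concrete, here is a minimal NumPy sketch of pixel-level matching between a thermal and a visible feature map. It is not the paper's CFM Module: it assumes the two feature maps are already rectified so that matches lie on the same row, it takes the thermal view as the reference, and it scores candidates with plain cosine similarity. The function name `cross_modal_cost_volume`, the disparity convention, and the parameter `max_disp` are all hypothetical choices made for illustration.

```python
import numpy as np

def cross_modal_cost_volume(feat_thr, feat_vis, max_disp):
    """Cosine-similarity cost volume between two rectified feature maps.

    feat_thr, feat_vis: arrays of shape (C, H, W); thermal is the reference view.
    Returns an array of shape (max_disp, H, W); larger values mean better matches.
    """
    # L2-normalise along channels so each score is a cosine similarity in [-1, 1].
    eps = 1e-8
    f_ref = feat_thr / (np.linalg.norm(feat_thr, axis=0, keepdims=True) + eps)
    f_src = feat_vis / (np.linalg.norm(feat_vis, axis=0, keepdims=True) + eps)

    _, h, w = f_ref.shape
    volume = np.zeros((max_disp, h, w), dtype=f_ref.dtype)
    for d in range(max_disp):
        # Assumed convention: reference pixel x matches source pixel x - d.
        # Columns x < d have no candidate and keep a score of 0.
        volume[d, :, d:] = (f_ref[:, :, d:] * f_src[:, :, : w - d]).sum(axis=0)
    return volume

# Toy check: the "thermal" features are the "visible" features shifted by 3 pixels.
rng = np.random.default_rng(0)
feat_vis = rng.standard_normal((32, 4, 16))
feat_thr = np.roll(feat_vis, shift=3, axis=2)
volume = cross_modal_cost_volume(feat_thr, feat_vis, max_disp=8)
best_disp = volume.argmax(axis=0)  # per-pixel best-matching disparity
```

In this toy setup the best-scoring disparity for every column with a valid candidate is the shift that was applied. Real visible and thermal features differ in appearance, which is why the abstract relies on a learned cross-modal matching module rather than raw similarity.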

Paper Structure

This paper contains 16 sections, 10 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Depth from images of different modalities. (a) and (b) show the visible light and thermal images, respectively; (c) is the LiDAR ground truth corresponding to the thermal image; (d) and (e) present the depth maps obtained from monocular methods using the visible light and thermal modalities, respectively; (f) illustrates the depth map estimated by our multi-spectral method.
  • Figure 2: The architecture of our proposed network. The method involves the following steps: First, given the input visible light images $I_{vis}$ and thermal images $I_{thr}$, we use a cross-attention-based feature extractor to generate aligned per-pixel feature vectors; projecting these between the two modalities yields the cost volume for pixel-level matching. Second, we perform monocular depth estimation independently for each modality, producing pixel-wise depth probability distributions. Third, we apply Degradation Masking, derived from the visible image's depth probability distribution, to the cost volume to remove inaccurate matches. Finally, we use the final-layer features of the thermal MDP Module so that the masked cost volume degrades to monocular thermal depth estimation in the masked regions, and the Depth Module produces the final depth map.
  • Figure 3: Qualitative depth comparison on the MS2 dataset. From left to right: visible images, thermal images, depth maps generated by AdaBins (Bhat et al., 2021) using either visible or thermal images, and depth and variance maps produced by our approach. The first two rows show results from the day test set, the middle two from the night test set, and the last two from the rainy test set. The results demonstrate that our method effectively leverages information from different modalities, producing robust and stable results under varying lighting conditions.
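
The Degradation Masking step in the Figure 2 caption can be illustrated with a short NumPy sketch. The caption says only that the mask is derived from the visible image's depth probability distribution; it does not say how. The version below assumes one plausible rule: a pixel counts as degraded when the entropy of its depth distribution is close to that of a uniform distribution. The function names `degradation_mask` and `apply_degradation_mask` and the threshold `max_entropy_ratio` are hypothetical and are not taken from the paper.

```python
import numpy as np

def degradation_mask(prob_vis, max_entropy_ratio=0.8):
    """Flag pixels whose visible-light depth distribution is confident.

    prob_vis: array of shape (D, H, W); each pixel's D values sum to 1.
    Returns a boolean (H, W) array: True = reliable, False = degraded.
    Assumption: "degraded" means near-uniform (high-entropy) depth probabilities.
    """
    num_bins = prob_vis.shape[0]
    eps = 1e-12  # avoids log(0) for empty bins
    entropy = -(prob_vis * np.log(prob_vis + eps)).sum(axis=0)
    return entropy < max_entropy_ratio * np.log(num_bins)

def apply_degradation_mask(cost_volume, mask):
    """Zero the matching scores of degraded pixels across all depth bins."""
    return cost_volume * mask[None, :, :]

# Toy example with 4 depth bins and two pixels:
# pixel (0, 0) is confident (one-hot), pixel (0, 1) is uninformative (uniform).
prob_vis = np.full((4, 1, 2), 0.25)
prob_vis[:, 0, 0] = [1.0, 0.0, 0.0, 0.0]
mask = degradation_mask(prob_vis)
masked_volume = apply_degradation_mask(np.ones((4, 1, 2)), mask)
```

Here the confident pixel keeps its matching scores and the uniform pixel is zeroed out. In the described architecture the zeroed regions are then filled in from the thermal monocular branch; this sketch stops at the masking step and does not model that fusion.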