
All-day Depth Completion via Thermal-LiDAR Fusion

Janghyun Kim, Minseong Kweon, Jinsun Park, Ukcheol Shin

TL;DR

This work addresses robust depth completion under adverse conditions by fusing thermal imagery with LiDAR. It introduces COPS, a framework that uses a depth foundation model both to provide dense pseudo-depth priors for stage-wise pseudo-supervision and to guide a depth-aware contrastive learning objective that sharpens depth boundaries. Extensive benchmarks on MS$^2$ and ViViD show that thermal-based depth completion consistently outperforms RGB-based methods in low-light and rainy scenarios, and COPS improves multiple baseline networks without adding inference cost. The results underscore the practical potential of thermal-LiDAR fusion for reliable all-day perception in outdoor and indoor environments, and the discussion outlines promising directions for adaptive supervision, confidence-aware fusion, and physics-informed modeling.

Abstract

Depth completion, which estimates dense depth from sparse LiDAR and RGB images, has demonstrated outstanding performance in well-lit conditions. However, due to the limitations of RGB sensors, existing methods often fail to perform reliably in harsh environments such as heavy rain and low light. Furthermore, we observe that ground truth depth maps often contain large regions of missing measurements in adverse weather such as heavy rain, leading to insufficient supervision. In contrast, thermal cameras are known to provide clear and reliable visibility in such conditions, yet thermal-LiDAR depth completion remains underexplored. Moreover, the characteristics of thermal images, such as blurriness, low contrast, and noise, lead to unclear depth boundaries. To address these challenges, we first evaluate the feasibility and robustness of thermal-LiDAR depth completion across diverse lighting (e.g., well-lit, low-light), weather (e.g., clear-sky, rainy), and environment (e.g., indoor, outdoor) conditions by conducting extensive benchmarks on the MS$^2$ and ViViD datasets. In addition, we propose COntrastive learning and Pseudo-Supervision (COPS), a framework that enhances depth boundary clarity and improves completion accuracy by leveraging a depth foundation model in two key ways. First, COPS enforces a depth-aware contrastive loss between points at different depths, mining positive and negative samples with a monocular depth foundation model to sharpen depth boundaries. Second, it mitigates incomplete supervision from ground truth depth maps by using foundation model predictions as dense depth priors. We also provide in-depth analyses of the key challenges in thermal-LiDAR depth completion to aid understanding of the task and encourage future research.
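The second use of the foundation model, dense depth priors that fill gaps in the ground truth, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions. First, the foundation model's output is treated as related to metric depth by an affine map, so a least-squares scale and shift is fitted against the sparse LiDAR points. If the model predicts inverse depth, the same fit would be done in that space instead. Second, the pseudo-depth supervises only pixels where ground truth is missing, with a hypothetical weight `lam`. The functions `align_scale_shift` and `pseudo_supervised_loss` are illustrative names, and images are flattened to lists of pixels for brevity.

```python
def align_scale_shift(pred, sparse, valid):
    """Fit s, t by least squares so that s * pred + t ~= sparse on valid pixels.

    Assumption: the foundation-model prediction is affine-related to metric depth.
    """
    xs = [p for p, v in zip(pred, valid) if v]
    ys = [d for d, v in zip(sparse, valid) if v]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two valid LiDAR points to fit scale and shift")
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s = cov / var if var > 0 else 1.0  # degenerate case: constant prediction
    t = my - s * mx
    return [s * p + t for p in pred]


def pseudo_supervised_loss(output, gt, gt_valid, pseudo, lam=0.5):
    """L1 on pixels with ground truth plus lam * L1 to pseudo-depth elsewhere.

    Assumption (hypothetical): pseudo-depth is used only where ground truth is missing.
    """
    sup = [abs(o - g) for o, g, v in zip(output, gt, gt_valid) if v]
    pse = [abs(o - p) for o, p, v in zip(output, pseudo, gt_valid) if not v]
    l_gt = sum(sup) / len(sup) if sup else 0.0
    l_ps = sum(pse) / len(pse) if pse else 0.0
    return l_gt + lam * l_ps


# Usage: a 4-pixel "image" with LiDAR and ground truth at pixels 0 and 2 only.
relative = [1.0, 2.0, 3.0, 4.0]  # foundation-model output (relative scale)
lidar = [2.0, 0.0, 6.0, 0.0]
mask = [True, False, True, False]
pseudo = align_scale_shift(relative, lidar, mask)
print(pseudo)  # [2.0, 4.0, 6.0, 8.0]
print(pseudo_supervised_loss([2.0, 4.5, 6.0, 8.0], [2.0, 0.0, 6.1, 0.0], mask, pseudo))
```

Because the pseudo-depth target is needed only to compute the training loss, a scheme of this shape adds no cost at inference time, which is consistent with the claim in the TL;DR.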

Paper Structure

This paper contains 32 sections, 10 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Overview of the proposed method and depth completion result comparison between RGB and thermal modalities. The proposed contrastive learning method (a) aims to mitigate blurry depth boundaries and insufficient supervision issues caused by thermal images and adverse weather. The qualitative results (b) highlight the significant advantages of thermal-LiDAR fusion in low-light conditions.
  • Figure 2: Missing LiDAR measurements and blurry thermal images in the $\text{MS}^2$ dataset shin2023deep.
  • Figure 3: Overall framework of our depth completion method. Our encoder-decoder network takes a thermal image and LiDAR points as input, while the pseudo-depth generation module uses only the thermal image. The network is directly supervised with the pseudo-depth map, which also serves as a contrastive learning criterion through depth slicing.
  • Figure 4: Depth map comparisons between the RGB and thermal modalities on NLSPN park2020non and GuideNet tang2020learning. The first and second rows show nighttime scenarios, while the third and fourth rows show rainy scenarios.
  • Figure 5: Depth map comparisons on the $\text{MS}^2$ dataset shin2023deep. The sparse depth map is dilated for visibility.
  • ...and 2 more figures
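Figure 3 describes using the pseudo-depth map as a contrastive learning criterion through depth slicing. The sketch below is one plausible reading of that idea, not the authors' implementation. It assumes uniform depth slices over a fixed range, treats pixels that fall in the same slice as positives and pixels in other slices as negatives, and scores them with a supervised-contrastive (SupCon-style) loss with temperature `tau`. The names `depth_slices` and `depth_contrastive_loss` and the parameters `num_slices` and `tau` are hypothetical.

```python
import math


def depth_slices(pseudo_depth, num_slices, d_min, d_max):
    """Assign each pixel to one of num_slices uniform bins over [d_min, d_max].

    Assumption: uniform slicing; the paper may slice differently.
    """
    if d_max <= d_min or num_slices < 1:
        raise ValueError("need d_max > d_min and num_slices >= 1")
    width = (d_max - d_min) / num_slices
    labels = []
    for d in pseudo_depth:
        k = int((d - d_min) / width)
        labels.append(min(max(k, 0), num_slices - 1))  # clamp out-of-range depths
    return labels


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def depth_contrastive_loss(features, labels, tau=0.1):
    """SupCon-style loss: same depth slice = positive, other slices = negatives."""
    z = [_normalize(f) for f in features]
    n = len(z)
    total, count = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # anchor has no positive in this sample set
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / tau for j in range(n)]
        # Stable log-sum-exp over all non-anchor pixels (the denominator).
        m = max(sims[j] for j in others)
        log_denom = m + math.log(sum(math.exp(sims[j] - m) for j in others))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


# Usage: two near pixels and two far pixels, split into 2 slices over [0, 10] m.
labels = depth_slices([1.0, 1.5, 9.0, 9.5], num_slices=2, d_min=0.0, d_max=10.0)
good = depth_contrastive_loss([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], labels)
bad = depth_contrastive_loss([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], labels)
print(labels, good < bad)  # [0, 0, 1, 1] True
```

Features that cluster by depth slice yield a low loss, while features that mix near and far pixels are penalized. Such a loss would push the network toward distinct representations on either side of a depth boundary. The pairwise loop is O(N²) in the number of pixels, so a practical version would likely sample a subset of pixels per image.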