Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges

Ukcheol Shin, Jinsun Park

TL;DR

This paper tackles the lack of large-scale, multi-spectral driving datasets for robust depth perception under adverse weather and lighting. It introduces the MS$^2$ dataset with synchronized stereo RGB, NIR, and thermal data, plus LiDAR and GNSS/IMU, delivering 162K data pairs and semi-dense ground-truth depth across diverse scenes. The authors perform comprehensive monocular and stereo depth benchmarks across modalities, revealing the notable robustness of thermal depth, especially in low-visibility conditions, and analyze domain shifts and fusion strategies. They also discuss key challenges and future directions, including foundation-model adaptation for non-conventional sensors and adaptive sensor fusion, with the dataset and code publicly available to spur further research.

Abstract

Achieving robust and accurate spatial perception under adverse weather and lighting conditions is crucial for the high-level autonomy of self-driving vehicles and robots. However, existing perception algorithms relying on the visible spectrum are highly affected by weather and lighting conditions. A long-wave infrared camera (i.e., thermal imaging camera) can be a potential solution to achieve high-level robustness. However, the absence of large-scale datasets and standardized benchmarks remains a significant bottleneck to progress in research on robust visual perception from thermal images. To this end, this manuscript first provides a large-scale Multi-Spectral Stereo (MS$^2$) dataset that consists of stereo RGB, stereo NIR, stereo thermal, stereo LiDAR data, and GNSS/IMU information along with semi-dense depth ground truth. The MS$^2$ dataset includes 162K synchronized multi-modal data pairs captured across diverse locations (e.g., urban city, residential area, campus, and highway) at different times (e.g., morning, daytime, and nighttime) and under various weather conditions (e.g., clear-sky, cloudy, and rainy). Secondly, we conduct a thorough evaluation of monocular and stereo depth estimation networks across RGB, NIR, and thermal modalities to establish standardized benchmark results on the MS$^2$ depth test sets (e.g., day, night, and rainy). Lastly, we provide in-depth analyses and discuss the challenges revealed by the benchmark results, such as the performance variability of each modality under adverse conditions, the domain shift between different sensor modalities, and potential research directions for thermal perception. Our dataset and source code are publicly available at https://sites.google.com/view/multi-spectral-stereo-dataset and https://github.com/UkcheolShin/SupDepth4Thermal.
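
As a rough illustration of what evaluating a depth network against semi-dense ground truth involves, the sketch below scores a prediction only at pixels that have a ground-truth value. It is a minimal example under stated assumptions, not the authors' evaluation code: the metric choice (absolute relative error, RMSE, and the δ < 1.25 accuracy commonly used in the depth-estimation literature), the convention that a ground-truth value of 0 marks a pixel without a LiDAR return, the 80 m depth cap, and the function name `depth_metrics` are all hypothetical and not taken from this page.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Score predicted depths against semi-dense ground truth.

    pred, gt: flat sequences of depths in meters, one entry per pixel.
    Assumption: gt == 0 means "no ground truth at this pixel", and
    depths beyond max_depth are ignored (80 m is an illustrative cap).
    """
    pairs = []
    for p, g in zip(pred, gt):
        # Keep only pixels whose ground truth lies in the valid range.
        if min_depth < g < max_depth:
            # Clamp the prediction so ratios below never divide by zero.
            p = min(max(p, min_depth), max_depth)
            pairs.append((p, g))
    if not pairs:
        raise ValueError("no valid ground-truth pixels")

    n = len(pairs)
    # Mean absolute error relative to the true depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error in meters.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Fraction of pixels whose prediction is within a factor of 1.25.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


if __name__ == "__main__":
    # Four pixels; the second has no ground truth and is skipped.
    print(depth_metrics([10.0, 20.0, 6.0, 33.0], [10.0, 0.0, 4.0, 30.0]))
```

Masking matters here because semi-dense ground truth covers only part of the image; averaging over all pixels would count the missing ones as errors.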

Paper Structure

This paper contains 26 sections, 3 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Overview of the Multi-Spectral Stereo (MS$^2$) dataset and depth maps from RGB, NIR, and thermal images in low-visibility conditions. The MS$^2$ dataset provides a multi-modal stereo data stream, including stereo RGB, stereo NIR, stereo thermal, stereo LiDAR data, and GNSS/IMU information along with semi-dense depth ground truth, captured across diverse locations (e.g., urban city, residential area, campus, and highway) at different times (e.g., morning, daytime, and nighttime) and under various weather conditions (e.g., clear-sky, cloudy, and rainy). Furthermore, depth estimation results from thermal images show high-level reliability and robustness under low-light and rainy conditions.
  • Figure 2: Overview of vehicular data collection platform for MS$^2$ dataset. We designed a data collection platform consisting of RGB, NIR, thermal, and LiDAR stereo systems and a GPS/IMU module. Stereo RGB, NIR, and IMU modules are installed inside the vehicle to ensure reliable operation under adverse weather conditions. Stereo thermal cameras and LiDARs covered with water-proof housing are built on the vehicle's rooftop.
  • Figure 3: Data examples of the Multi-Spectral Stereo (MS$^2$) outdoor driving dataset. The collected dataset provides about 162K synchronized data pairs captured at campus, city, residential area, road, and suburban locations across various time slots (morning, day, and night) and weather conditions (clear-sky, cloudy, and rainy). For each block, the three rows show RGB, NIR, and thermal images, respectively. Depending on the surrounding conditions, each spectrum sensor shows different aspects, advantages, and disadvantages induced by its sensor characteristics.
  • Figure 4: Pattern boards for multi-sensor calibration. We utilize three pattern boards (i.e., 6x6 AprilTag, 2x2 AprilTag, and copper-coated line boards) for multi-sensor calibration.
  • Figure 5: Image pre-processing for MS$^2$ dataset. We rectified and cropped the original RGB, NIR, and thermal images with intrinsic and distortion parameters to make valid training data.
  • ...and 7 more figures