Cross-spectral Gated-RGB Stereo Depth Estimation
Samuel Brucker, Stefanie Walz, Mario Bijelic, Felix Heide
TL;DR
The paper tackles the challenge of producing high-resolution metric depth maps at long range with cost-effective sensors, a setting in which gated depth methods have lagged behind RGB imaging in resolution. It introduces a cross-spectral stereo network that fuses multi-view RCCB (high-resolution visible HDR) and gated NIR data using a cross-spectral fusion module, PoseNet-based pose refinement, and attention-based feature fusion, trained with both self-supervised and LiDAR-supervised losses. Key contributions include the cross-spectral depth estimation framework, a cross-modal stereo network, a training scheme that leverages both modalities, and a dataset of synchronized RCCB and gated views with ground truth up to 220 m. The approach substantially reduces MAE at long range, outperforming the next best method by 39% for 100 to 220 m, and enables new applications such as long-distance small-object detection in autonomous driving, using economical CMOS sensors and active illumination.
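The long-range gains are reported as MAE restricted to a distance interval. A minimal sketch of such a range-binned metric in NumPy follows. It assumes a dense predicted depth map, a sparse LiDAR ground-truth map in metres with 0 marking missing returns, and inclusive bin edges. `binned_mae` and `relative_reduction` are hypothetical helpers, not the authors' evaluation code, and reading "outperforming by 39%" as a relative MAE reduction is likewise an assumption. The arrays below are toy values, not results from the paper.

```python
import numpy as np

def binned_mae(pred, gt, d_min, d_max):
    """MAE in metres over pixels whose ground-truth depth lies in [d_min, d_max].

    Pixels with gt <= 0 are treated as missing LiDAR returns (assumption).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Evaluate only valid ground-truth pixels that fall inside the range bin.
    mask = (gt > 0) & (gt >= d_min) & (gt <= d_max)
    if not mask.any():
        return float("nan")
    return float(np.mean(np.abs(pred[mask] - gt[mask])))

def relative_reduction(mae_baseline, mae_method):
    """Fraction by which mae_method improves on mae_baseline (hypothetical helper)."""
    return 1.0 - mae_method / mae_baseline

# Toy 2x2 depth maps in metres; the bottom-right pixel has no LiDAR return.
pred = np.array([[10.0, 150.0], [205.0, 0.0]])
gt = np.array([[12.0, 140.0], [200.0, 0.0]])
print(binned_mae(pred, gt, 100.0, 220.0))  # 7.5
```

Only the two pixels with ground truth between 100 m and 220 m contribute (errors of 10 m and 5 m); the 12 m pixel is outside the bin and the zero pixel is masked as missing.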
Abstract
Gated cameras flood-illuminate a scene and capture its time-gated impulse response. By employing nanosecond-scale gates, existing sensors can capture megapixel gated images, delivering dense depth that improves on today's LiDAR sensors in spatial resolution and depth precision. Although gated depth estimation methods deliver a million depth estimates per frame, their resolution is still an order of magnitude below that of existing RGB imaging methods. In this work, we combine high-resolution stereo HDR RCCB cameras with gated imaging, allowing us to exploit depth cues from active gating, multi-view RGB and multi-view NIR sensing: multi-view and gated cues across the entire spectrum. The resulting capture system consists only of low-cost CMOS sensors and flood illumination. We propose a novel stereo depth estimation method that exploits these multi-modal, multi-view depth cues, including the active illumination measured by the RCCB camera when its IR-cut filter is removed. The proposed method achieves accurate depth at long range, outperforming the next best existing method by 39% in MAE for ranges of 100 to 220 m on accumulated LiDAR ground truth. Our code, models and datasets are available at https://light.princeton.edu/gatedrccbstereo/ .
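To make the role of nanosecond-scale gates concrete, the sketch below converts a gate's delay and width into the distance interval it observes, using the round-trip relation d = c t / 2. It assumes an idealised, instantaneous laser pulse and a rectangular gate; real gated cameras have finite pulse widths and smooth range-intensity profiles. `gate_range_window` is a hypothetical helper, not part of the authors' code.

```python
C = 299_792_458.0  # speed of light in m/s

def gate_range_window(delay_ns, width_ns):
    """Distance interval (metres) whose round-trip return falls inside a gate.

    The gate opens delay_ns after an idealised instantaneous pulse and stays
    open for width_ns (simplifying assumption: no pulse width, boxcar gate).
    """
    # Round-trip time t = 2 d / c, so d = c t / 2.
    d_near = C * delay_ns * 1e-9 / 2.0
    d_far = C * (delay_ns + width_ns) * 1e-9 / 2.0
    return d_near, d_far

# Each nanosecond of gate delay shifts the window by roughly 0.15 m.
near, far = gate_range_window(delay_ns=100, width_ns=200)
print(f"{near:.1f} m to {far:.1f} m")  # 15.0 m to 45.0 m
```

Under the same idealised model, returns from 220 m arrive about 1.47 µs after the pulse is emitted.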
