Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions
Jiacong Xu, Mingqian Liao, K Ram Prabhakar, Vishal M. Patel
TL;DR
Low-light NeRF methods struggle with sensor noise and ISP clipping, which hinder geometry and texture recovery. The paper introduces Thermal-NeRF, which fuses thermal and short-exposure visible raw images with Retinex3D and Multi-Exposure to synthesize visible and thermal views simultaneously in dark scenes, and it presents MVTV, the first aligned multi-view thermal-visible dataset. It also shows that the two modalities benefit each other and that the approach extends to explicit, high-speed 3DGS-based rendering, with improved detail preservation and noise smoothing. The work has practical implications for rescue, surveillance, and multimodal 3D reconstruction in challenging lighting conditions.
Abstract
Neural Radiance Fields (NeRF) achieves photo-realistic novel view synthesis by learning an implicit volumetric representation of a scene from multi-view images that faithfully convey its colorimetric information. However, sensor noise contaminates low-value pixel signals, and in extremely dark scenes the lossy camera image signal processor (ISP) further removes near-zero intensities, degrading synthesis performance. Existing approaches reconstruct low-light scenes from raw images but struggle to recover texture and boundary details in dark regions. They are also unsuitable for high-speed models that rely on explicit representations. To address these issues, we present Thermal-NeRF, which takes thermal and visible raw images as inputs and synthesizes visible and thermal views simultaneously. The design rests on two observations: thermal cameras are robust to illumination variation, and raw images preserve whatever cues remain in the dark. We also establish the first multi-view thermal and visible dataset (MVTV) to support research on multimodal NeRF. Thermal-NeRF achieves the best trade-off between detail preservation and noise smoothing and provides better synthesis performance than previous work. Finally, we demonstrate that the two modalities benefit each other in 3D reconstruction.
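To make the point about near-zero intensities concrete, the following toy sketch quantizes the same dark signal at two bit depths. It is an illustration under stated assumptions, not the paper's pipeline: the 14-bit raw depth, the 8-bit output depth, the linear (gamma-free) mapping, the sample intensities, and the helper name `quantize` are all hypothetical choices made for this example.

```python
def quantize(values, bits):
    """Clip linear intensities to [0, 1] and round to an integer code of the given bit depth."""
    max_code = (1 << bits) - 1
    return [round(min(max(v, 0.0), 1.0) * max_code) for v in values]


# Hypothetical linear intensities of a very dark region (fractions of full scale).
dark_signal = [0.0004, 0.0008, 0.0012, 0.0016]

# Assumed 14-bit raw sensor output: the four intensities map to different codes.
raw_codes = quantize(dark_signal, bits=14)

# Assumed 8-bit processed output with a plain linear mapping (no gamma curve):
# every intensity rounds to the same code, so the variation is lost.
isp_codes = quantize(dark_signal, bits=8)

distinct_raw = len(set(raw_codes))
distinct_isp = len(set(isp_codes))
print("raw codes:", raw_codes)
print("8-bit codes:", isp_codes)
```

In this toy setting the higher-bit-depth codes still distinguish the four intensities, while the 8-bit codes collapse to a single value. This is the sense in which raw images can retain cues that a lossy processed image discards. A real ISP also applies black-level subtraction, denoising, and tone curves, which this sketch deliberately omits.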
