
ThermoNeRF: Joint RGB and Thermal Novel View Synthesis for Building Facades using Multimodal Neural Radiance Fields

Mariam Hassan, Florent Forest, Olga Fink, Malcolm Mielle

TL;DR

ThermoNeRF introduces a multimodal Neural Radiance Field framework that jointly renders RGB and thermal views while preserving temperature fidelity. It shares a density MLP for geometry and uses separate heads for RGB and temperature. This keeps RGB information from leaking into the rendered temperatures and yields more accurate temperature reconstruction than a baseline trained on concatenated RGB+thermal inputs. The authors also present ThermoScenes, a dataset of paired RGB+thermal images covering 16 scenes (8 building facades and 8 everyday objects), which supports evaluation of both temperature accuracy and novel-view synthesis. Empirical results show average temperature MAEs of $1.13\,^\circ\text{C}$ (buildings) and $0.41\,^\circ\text{C}$ (other scenes). This is an improvement of over 50% compared to a standard NeRF trained on concatenated RGB+thermal inputs, and it points to practical uses in building retrofit, energy analysis, and infrastructure inspection. The work unifies geometry and temperature in a single NeRF framework and releases the dataset and code publicly.
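The split between a shared density network and two modality-specific heads can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The layer sizes, activations, random weights, and the names `dense`, `forward`, `W1`, `Wc`, and `Wt` are hypothetical, and the real model's input encodings are omitted. The sketch only shows the data flow. Density and intermediate features $f$ come from one MLP. The RGB head also receives the view direction and an appearance embedding A. The thermal head receives only $f$.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Randomly initialized linear layer (weights, bias); illustrative only."""
    w = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
    return w, np.zeros(out_dim)


# Hypothetical sizes: 3-D position, 3-D view direction, 8-D appearance code A
POS_DIM, DIR_DIM, APP_DIM, FEAT_DIM, HIDDEN = 3, 3, 8, 16, 32

W1, b1 = dense(POS_DIM, HIDDEN)                  # shared density MLP, layer 1
W2, b2 = dense(HIDDEN, 1 + FEAT_DIM)             # -> density + features f
Wc, bc = dense(FEAT_DIM + DIR_DIM + APP_DIM, 3)  # RGB head
Wt, bt = dense(FEAT_DIM, 1)                      # thermal head (MLP_th)


def forward(x, d, a):
    """x: (N,3) positions, d: (N,3) view directions, a: (N,APP_DIM) appearance codes."""
    h = np.maximum(x @ W1 + b1, 0.0)             # ReLU
    out = h @ W2 + b2
    sigma = np.logaddexp(0.0, out[:, :1])        # softplus keeps density >= 0
    f = out[:, 1:]                               # intermediate features f
    # RGB depends on f, the view direction, and the appearance embedding
    rgb = 1.0 / (1.0 + np.exp(-(np.concatenate([f, d, a], axis=1) @ Wc + bc)))
    # Temperature depends on f only, so it cannot vary with viewpoint
    temp = f @ Wt + bt
    return sigma, rgb, temp


x = rng.normal(size=(4, POS_DIM))
a = np.zeros((4, APP_DIM))
s1, rgb1, t1 = forward(x, np.tile([0.0, 0.0, 1.0], (4, 1)), a)
s2, rgb2, t2 = forward(x, np.tile([1.0, 0.0, 0.0], (4, 1)), a)
print(np.allclose(t1, t2))  # True: changing the viewpoint leaves temperature unchanged
```

The thermal head never sees the view direction, so its output is view-independent by construction. This is consistent with the observation that view-dependent reflections appear in RGB images but not in thermal images.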

Abstract

Thermal scene reconstruction holds great potential for various applications, such as analyzing building energy consumption and performing non-destructive infrastructure testing. However, existing methods typically require dense scene measurements and often rely on RGB images for 3D geometry reconstruction, projecting thermal information onto the geometry after reconstruction. This can make the reconstructed geometry and temperature data inconsistent with their actual values. To address this challenge, we propose ThermoNeRF, a novel multimodal approach based on Neural Radiance Fields that jointly renders new RGB and thermal views of a scene. We also propose ThermoScenes, a dataset of paired RGB+thermal images comprising 8 scenes of building facades and 8 scenes of everyday objects. To address the lack of texture in thermal images, ThermoNeRF uses paired RGB and thermal images to learn scene density, while separate networks estimate color and temperature. Unlike comparable studies, we focus on temperature reconstruction. Experimental results demonstrate that ThermoNeRF achieves an average mean absolute error of 1.13 °C and 0.41 °C for temperature estimation in buildings and other scenes, respectively, an improvement of over 50% compared to using concatenated RGB+thermal data as input to a standard NeRF. Code and dataset are available online.
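Learning a single scene density from both modalities means that, in a standard NeRF renderer, RGB and temperature would be composited with the same per-sample weights along each ray. The sketch below uses standard NeRF alpha compositing. It assumes that temperature is composited exactly like a color channel, which is an illustrative assumption rather than a detail taken from the paper. The function name `render_ray` is also hypothetical.

```python
import numpy as np


def render_ray(sigmas, deltas, rgbs, temps):
    """Composite RGB and temperature along one ray using shared weights.

    sigmas: (N,) densities predicted by the shared density MLP
    deltas: (N,) distances between consecutive samples
    rgbs:   (N, 3) colors from the RGB head
    temps:  (N,) temperatures from the thermal head (assumed scalar per sample)
    """
    sigmas = np.asarray(sigmas, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    alpha = 1.0 - np.exp(-sigmas * deltas)  # opacity of each sample
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha  # identical weights for both modalities
    rgb = (weights[:, None] * np.asarray(rgbs, dtype=float)).sum(axis=0)
    temp = float((weights * np.asarray(temps, dtype=float)).sum())
    return rgb, temp, weights


rgb, temp, w = render_ray(
    sigmas=[0.0, 100.0, 0.0],
    deltas=[1.0, 1.0, 1.0],
    rgbs=[[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    temps=[10.0, 25.0, 40.0],
)
print(rgb, temp)
```

With densities `[0, 100, 0]` and unit spacing, almost all of the weight falls on the middle sample. The rendered color and temperature are therefore essentially that sample's values. Any error in the shared density shifts both modalities together, which is why geometry learned from RGB can help the texture-poor thermal channel.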

Paper Structure

This paper contains 17 sections, 4 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Overview of the capabilities of the proposed ThermoNeRF, a multimodal NeRF-based approach using paired thermal and RGB images. ThermoNeRF estimates geometry and thermal information better than thermal-only models, which cannot recover the geometry, and better than RGB+Thermal NeRF models, in which information from the RGB image leaks into the temperature of the final rendered view.
  • Figure 2: ThermoNeRF architecture. Parts of the network related to the generation of thermal images are shown in red, and parts related to the generation of RGB images are shown in blue. The $\text{MLP}_{\text{th}}$ takes only the intermediate features $f$ as input. A denotes the appearance embeddings, which account for differences in exposure between RGB images.
  • Figure 3: Non-Lambertian effects (i.e., light reflections) present in the RGB images (left) depend on the viewing angle and are absent from the thermal images (right). Furthermore, textures and edges in the thermal images are soft due to the ghosting effect, in contrast to the sharp RGB image and its background.
  • Figure 4: Comparison of thermal and RGB renderings of unseen poses for five scenes (rows, top to bottom): Building (Spring), Building (Winter), Exhibition Building, Trees, and Raspberry Pi. ThermoNeRF is closest to the ground-truth thermal image while preserving RGB quality.
  • Figure 5: Per-pixel absolute errors in temperature estimation for renderings of unseen poses. The left column shows three outdoor scenes (top to bottom: Building (Spring), Exhibition Building, and Trees), and the right column shows three indoor scenes (top to bottom: Double Robot, Raspberry Pi, and Heated Water Kettle). ThermoNeRF shows fewer errors than the baselines. N$_\text{rgb+th}$ stands for Nerfacto$_\text{rgb+th}$.
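The per-pixel error maps in Figure 5 and the MAE values quoted in the abstract reduce to simple arithmetic once the rendered and ground-truth thermal images are expressed in °C. The following is a minimal sketch that assumes both images are already calibrated temperature arrays of the same shape. The function names and the optional `mask` argument (e.g., to exclude invalid pixels) are hypothetical, and the paper's exact evaluation protocol may differ. `relative_improvement` assumes that "improvement" means the relative reduction in MAE.

```python
import numpy as np


def temperature_errors(pred_c, gt_c, mask=None):
    """Return the per-pixel absolute error map (°C) and the mean absolute error."""
    pred_c = np.asarray(pred_c, dtype=float)
    gt_c = np.asarray(gt_c, dtype=float)
    err = np.abs(pred_c - gt_c)  # what an error heat map would visualize
    if mask is None:
        mask = np.ones(err.shape, dtype=bool)  # evaluate every pixel by default
    mae = float(err[mask].mean())
    return err, mae


def relative_improvement(baseline_mae, model_mae):
    """Fractional MAE reduction of a model relative to a baseline."""
    return (baseline_mae - model_mae) / baseline_mae


# Toy 2x2 thermal images in °C (illustrative values, not from the paper)
pred = np.array([[20.0, 21.0], [22.0, 30.0]])
gt = np.array([[21.0, 21.0], [20.0, 30.0]])
err, mae = temperature_errors(pred, gt)
print(err)
print(mae)  # 0.75
```

Under this reading, an improvement of over 50% means the model's MAE is less than half of the baseline's. For example, `relative_improvement(2.0, 0.9)` gives about 0.55.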