
ThermalGaussian: Thermal 3D Gaussian Splatting

Rongfeng Lu, Hangyu Chen, Zunjie Zhu, Yuhang Qin, Ming Lu, Le Zhang, Chenggang Yan, Anke Xue

TL;DR

ThermalGaussian extends the fast, explicit 3D Gaussian Splatting (3DGS) representation to jointly model the RGB and thermal modalities. It introduces multimodal initialization, three thermal Gaussian designs (MFTG, MSMG, OMMG), a dynamic multimodal regularization scheme that balances learning across the two modalities, and a thermal-specific loss with a smoothing term. The authors also release RGBT-Scenes, a real-world RGB–thermal dataset. Experiments show that ThermalGaussian improves both thermal and RGB rendering quality and, through multimodal regularization, reduces model storage by about 90% relative to single-modality baselines, while retaining the fast training and real-time rendering of 3DGS. By combining explicit 3D Gaussians with RGB–thermal calibration and cross-modal optimization, the work advances practical multimodal 3D reconstruction for surveillance and related applications.
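
The TL;DR's dynamic multimodal regularization rebalances the two modalities' losses according to how many Gaussians each one holds. The summary gives no formula, so the following is only a minimal sketch under an assumed rule: each modality's loss weight equals the other modality's share of the total Gaussian count, so the modality whose Gaussian set has grown larger is down-weighted. The names `modality_weights` and `joint_loss` are hypothetical and are not the authors' code.

```python
from typing import Tuple


def modality_weights(n_rgb: int, n_thermal: int) -> Tuple[float, float]:
    """Hypothetical balancing rule (an assumption, not the paper's formula):
    each modality's loss weight is the *other* modality's share of all
    Gaussians, so the larger Gaussian set receives the smaller weight."""
    total = n_rgb + n_thermal
    if total == 0:
        # No Gaussians yet: fall back to equal weighting.
        return 0.5, 0.5
    return n_thermal / total, n_rgb / total


def joint_loss(loss_rgb: float, loss_thermal: float,
               n_rgb: int, n_thermal: int) -> float:
    """Combine both modalities' losses so they jointly drive optimization."""
    w_rgb, w_thermal = modality_weights(n_rgb, n_thermal)
    return w_rgb * loss_rgb + w_thermal * loss_thermal


# Example: the RGB Gaussian set has grown three times larger than the thermal one.
print(modality_weights(300_000, 100_000))  # (0.25, 0.75)
```

Because Gaussian counts change during densification and pruning, such weights would be recomputed as training proceeds. Inverse-share weighting is just one plausible choice; the paper may use a different schedule.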

Abstract

Thermography is especially valuable for the military and other users of surveillance cameras. Several recent methods based on Neural Radiance Fields (NeRF) have been proposed to reconstruct thermal scenes in 3D from a set of thermal and RGB images. However, 3D Gaussian Splatting (3DGS) has become more prevalent than NeRF owing to its rapid training and real-time rendering. In this work, we propose ThermalGaussian, the first thermal 3DGS approach capable of rendering high-quality images in both the RGB and thermal modalities. We first calibrate the RGB camera and the thermal camera to ensure that the two modalities are accurately aligned. We then use the registered images to learn multimodal 3D Gaussians. To prevent overfitting to any single modality, we introduce several multimodal regularization constraints. We also develop smoothing constraints tailored to the physical characteristics of the thermal modality. In addition, we contribute a real-world dataset named RGBT-Scenes, captured with a handheld thermal-infrared camera, to facilitate future research on thermal scene reconstruction. Comprehensive experiments show that ThermalGaussian achieves photorealistic rendering of thermal images and improves the rendering quality of RGB images. The proposed multimodal regularization constraints also reduce the model's storage cost by 90%. Our project page is at https://thermalgaussian.github.io/.
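
The abstract mentions smoothing constraints tailored to the physical characteristics of the thermal modality. The premise assumed here is that thermal images vary more smoothly than RGB images. As a minimal sketch, suppose the smoothing term is an anisotropic total-variation penalty on the rendered thermal image, added to an L1 reconstruction loss. The exact form of the term, the helper names, and the weight `lambda_smooth = 0.1` are all assumptions and are not taken from the paper.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # single-channel image as rows of floats


def l1_loss(rendered: Image, target: Image) -> float:
    """Mean absolute error between two equally sized, non-empty images."""
    diffs = [abs(a - b) for row_r, row_t in zip(rendered, target)
             for a, b in zip(row_r, row_t)]
    return sum(diffs) / len(diffs)


def tv_smoothness(img: Image) -> float:
    """Anisotropic total variation: mean |difference| between each pixel and
    its right and lower neighbours (assumed form of the smoothing term)."""
    if not img or not img[0]:
        return 0.0
    h, w = len(img), len(img[0])
    diffs: List[float] = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                diffs.append(abs(img[y][x + 1] - img[y][x]))
            if y + 1 < h:
                diffs.append(abs(img[y + 1][x] - img[y][x]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def thermal_loss(rendered: Image, target: Image,
                 lambda_smooth: float = 0.1) -> float:
    """Reconstruction term plus a weighted smoothness penalty on the render."""
    return l1_loss(rendered, target) + lambda_smooth * tv_smoothness(rendered)


render = [[0.0, 1.0], [0.0, 1.0]]
print(tv_smoothness(render))  # 0.5
```

The penalty is zero for a uniform image and grows with sharp temperature edges, which is why its weight must stay small enough not to blur genuine thermal boundaries.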

Paper Structure

This paper contains 16 sections, 12 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Compared with NeRF-based methods (hassan2024thermonerf) and with methods that train 3DGS directly on thermal images, our approach not only improves thermal image rendering quality but also significantly reduces the model's storage through multimodal regularization (MR).
  • Figure 2: Top: camera poses and point cloud generated by SfM. Bottom: input images for SfM.
  • Figure 3: ThermalGaussian Overview. We simultaneously construct Gaussians for RGB and thermal modalities using the point cloud obtained from multimodal initialization. Each modality's Gaussians are used to render images in their respective modality. However, the losses from different modalities are combined to jointly constrain the optimization of both sets of Gaussians. Additionally, we establish a multimodal regularization based on the number of Gaussians in each modality, which dynamically adjusts the training coefficients for both modalities.
  • Figure 4: Different calibration boards for thermal cameras.
  • Figure 5: Qualitative comparison of thermal images rendered by our method and by previous approaches (hassan2024thermonerf, kerbl20233d) against the corresponding ground-truth images from test views. We also show results of training on MSX images, which are easier to apply.
  • ...and 2 more figures
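
Figure 4 and the abstract concern calibrating the thermal camera against the RGB camera so that the two modalities are aligned. One simple way to register the two images, assumed here purely for illustration, is a 3×3 planar homography `H` that maps thermal pixel coordinates into the RGB image. The paper's calibration pipeline may instead estimate full intrinsics and extrinsics. `apply_homography` is a hypothetical helper, and `H` would in practice be estimated from matched calibration-board corners. Note that a homography is exact only for planar scenes or for cameras sharing an optical center, so for a real RGB–thermal rig it is an approximation.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]  # 3x3 homography, row-major


def apply_homography(H: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map a thermal-image pixel (x, y) into RGB-image coordinates using a
    pre-estimated 3x3 homography H (hypothetical helper for illustration)."""
    # Lift (x, y) to homogeneous coordinates (x, y, 1) and multiply by H.
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under H")
    # Divide by the homogeneous coordinate to return to the image plane.
    return u / w, v / w


# Example: thermal image at half the RGB resolution, offset by (10, 5) pixels.
H_example = [[2.0, 0.0, 10.0], [0.0, 2.0, 5.0], [0.0, 0.0, 1.0]]
print(apply_homography(H_example, 3.0, 4.0))  # (16.0, 13.0)
```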