Veta-GS: View-dependent deformable 3D Gaussian Splatting for thermal infrared Novel-view Synthesis

Myeongseok Nam, Wongi Park, Minsol Kim, Hyejin Hur, Soomok Lee

TL;DR

Veta-GS targets robust thermal infrared (TIR) novel-view synthesis. It introduces a view-dependent deformation field that uses camera position $x$ and viewing direction $v$ to modulate 3D Gaussian primitives, together with a Thermal Feature Extractor (TFE) and a MonoSSIM loss that capture appearance, edge, and frequency information. A frustum-based masking strategy restricts deformation to the Gaussians inside the camera frustum, which accelerates training. On the TI-NSD benchmark, Veta-GS outperforms state-of-the-art methods across indoor, outdoor, and UAV scenes, improving PSNR, SSIM, and LPIPS while reducing artifacts such as floaters and blur. The approach combines explicit 3D Gaussian splatting with view-conditioned deformation and multi-branch perceptual losses, keeping thermal novel-view synthesis robust and compatible with real-time rendering. Suggested future work includes extending the method to dynamic TIR scenes and reducing the computational cost of the Thermal Feature Extractor.

Abstract

Recently, 3D Gaussian Splatting (3D-GS) based on Thermal Infrared (TIR) imaging has gained attention in novel-view synthesis for its real-time rendering. However, novel-view synthesis with thermal infrared images suffers from transmission effects, emissivity, and low resolution, leading to floaters and blur in rendered images. To address these problems, we introduce Veta-GS, which leverages a view-dependent deformation field and a Thermal Feature Extractor (TFE) to precisely capture subtle thermal variations and maintain robustness. Specifically, we design a view-dependent deformation field that leverages camera position and viewing direction to capture thermal variations. Furthermore, we introduce the Thermal Feature Extractor (TFE) and the MonoSSIM loss, which consider appearance, edge, and frequency to maintain robustness. Extensive experiments on the TI-NSD benchmark show that our method achieves better performance than existing methods.

Paper Structure

This paper contains 8 sections, 9 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Overview of the Veta-GS pipeline. We use the camera position $x$, viewing direction $v$, time $t$, and the 3D Gaussian's position $\mu$, all with positional encoding, as input to deform the 3D Gaussians by predicting the offsets $(\delta \mu,\;\delta r,\;\delta s)$. We further introduce the Thermal Feature Extractor (TFE) and the MonoSSIM loss, which focus on appearance, edge, and frequency to achieve robust rendering.
  • Figure 2: Frustum-based masking. (a) The existing method deforms all 3D Gaussians. In contrast, (b) we identify the 3D Gaussians bounded by the frustum and apply deformation only to those Gaussians, thereby accelerating training.
  • Figure 3: Visualization of experiments on the TI-NSD dataset, including indoor and outdoor scenes. The results show that Veta-GS produces high-quality 3D rendering from small-scale to large-scale scenes, while 3D-GS (Kerbl et al., 2023) and Thermal3D-GS (Chen et al., 2024) struggle with floater artifacts and thermal variations.
  • Figure 4: (a) Comparison of Mutual Information (MI) between the view-dependent embedding and the time embedding on several scenes. (b) Comparison of training convergence curves between the modified and original formulations of $\mathcal{L}_{\text{Mono}}$.
  • Figure 5: Failure cases on textures with significant variations.
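
The frustum-based masking in Figure 2 can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `frustum_mask` and `apply_masked_offsets`, the field-of-view parameterization, the near/far planes, and the convention that the camera looks along +z are all assumptions made for illustration. The stand-in `offset_fn` replaces the learned deformation field and only produces a position offset $\delta \mu$; the rotation and scale offsets are omitted.

```python
import numpy as np


def frustum_mask(means, world_to_cam, fov_x, fov_y, near=0.01, far=100.0):
    """Boolean mask of Gaussian centers that lie inside the camera frustum.

    means:        (N, 3) world-space Gaussian centers
    world_to_cam: (4, 4) world-to-camera transform (assumed: camera looks along +z)
    fov_x, fov_y: full horizontal / vertical field of view in radians
    """
    homo = np.hstack([means, np.ones((len(means), 1))])  # (N, 4) homogeneous
    cam = (homo @ world_to_cam.T)[:, :3]                 # camera-space points
    x, y, z = cam[:, 0], cam[:, 1], cam[:, 2]
    in_depth = (z > near) & (z < far)
    in_x = np.abs(x) <= z * np.tan(fov_x / 2)
    in_y = np.abs(y) <= z * np.tan(fov_y / 2)
    return in_depth & in_x & in_y


def apply_masked_offsets(means, mask, offset_fn):
    """Apply position offsets only to the Gaussians selected by `mask`."""
    deformed = means.copy()
    # Only the masked subset is passed to the (hypothetical) deformation function.
    deformed[mask] = means[mask] + offset_fn(means[mask])
    return deformed


# Toy example: one point in front of the camera, one behind it, one far off-axis.
means = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, -5.0], [10.0, 0.0, 1.0]])
mask = frustum_mask(means, np.eye(4), np.deg2rad(90), np.deg2rad(90))
deformed = apply_masked_offsets(means, mask, lambda p: np.full_like(p, 0.1))
```

In this toy setup only the first point falls inside the frustum, so it is the only one passed to the deformation function. Skipping the other Gaussians is what the caption credits for faster training.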