SpectralSplat: Appearance-Disentangled Feed-Forward Gaussian Splatting for Driving Scenes

Quentin Herau, Tianshuo Xu, Depu Meng, Jiezhi Yang, Chensheng Peng, Spencer Sherk, Yihan Hu, Wei Zhan

Abstract

Feed-forward 3D Gaussian Splatting methods have achieved impressive reconstruction quality for autonomous driving scenes, yet they entangle scene geometry with transient appearance properties such as lighting, weather, and time of day. This coupling prevents relighting, appearance transfer, and consistent rendering across multi-traversal data captured under varying environmental conditions. We present SpectralSplat, a method that disentangles appearance from geometry within a feed-forward Gaussian Splatting framework. Our key insight is to factor color prediction into an appearance-agnostic base stream and an appearance-conditioned adapted stream, both produced by a shared MLP conditioned on a global appearance embedding derived from DINOv2 features. To enforce disentanglement, we train with paired observations generated by a hybrid relighting pipeline that combines physics-based intrinsic decomposition with diffusion-based generative refinement, and supervise with complementary consistency, reconstruction, cross-appearance, and base color losses. We further introduce an appearance-adaptable temporal history that stores appearance-agnostic features, enabling accumulated Gaussians to be re-rendered under arbitrary target appearances. Experiments demonstrate that SpectralSplat preserves the reconstruction quality of the underlying backbone while enabling controllable appearance transfer and temporally consistent relighting across driving sequences.
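
As a concrete illustration of the factored color prediction described above, the following is a minimal PyTorch-style sketch of a shared MLP producing an appearance-agnostic base color and an appearance-conditioned adapted color per Gaussian. The module names, dimensions, and conditioning scheme are assumptions made for exposition; the paper's actual architecture may differ.

```python
# Minimal sketch of the two-stream color head described in the abstract.
# Dimensions, layer counts, and the conditioning scheme are illustrative
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class TwoStreamColorHead(nn.Module):
    def __init__(self, feat_dim: int = 128, app_dim: int = 64, hidden: int = 256):
        super().__init__()
        # Shared trunk over per-Gaussian features.
        self.trunk = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Base stream: ignores the appearance embedding entirely.
        self.base_head = nn.Linear(hidden, 3)
        # Adapted stream: conditioned on a global appearance embedding
        # (e.g. pooled DINOv2 features projected to app_dim).
        self.adapt_head = nn.Sequential(
            nn.Linear(hidden + app_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, gaussian_feats: torch.Tensor, app_embed: torch.Tensor):
        # gaussian_feats: (N, feat_dim) per-Gaussian features
        # app_embed:      (app_dim,) global appearance embedding of the target frame
        h = self.trunk(gaussian_feats)
        base_color = torch.sigmoid(self.base_head(h))
        app = app_embed.unsqueeze(0).expand(h.shape[0], -1)
        adapted_color = torch.sigmoid(self.adapt_head(torch.cat([h, app], dim=-1)))
        return base_color, adapted_color
```

In this reading, swapping `app_embed` for the embedding extracted from a different traversal re-colors the same Gaussians under the target condition, which is the appearance-transfer behavior illustrated in Figure 1.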

Paper Structure

This paper contains 18 sections, 6 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Appearance-disentangled Gaussian reconstruction. Given input frames (row 1) relighted under progressively varying conditions (row 2), SpectralSplat produces Gaussians whose base colors remain consistent regardless of the input appearance (row 3), while the adapted colors faithfully capture each target lighting condition (row 4). The outputs in row 5, rendered from our predicted Gaussians with swapped appearance embeddings, demonstrate the appearance transfer capability.
  • Figure 2: Training pipeline. Original and augmented images share geometry but produce separate features and appearance embeddings. Four losses enforce disentanglement: $\mathcal{L}_{\mathrm{inv}}$ (base invariance), $\mathcal{L}_{\mathrm{aug}}$ (augmented reconstruction), $\mathcal{L}_{\mathrm{swap}}$ (cross-appearance), and $\mathcal{L}_{\mathrm{base}}$ (base color alignment); see the loss sketch after this figure list.
  • Figure 3: Relighting pipeline. MVInverse + physics rendering is 3D-consistent but flat; IC-Light alone is photorealistic but inconsistent; our hybrid pipeline achieves both.
  • Figure 4: Cross-appearance results on Waymo. Rows 1--3: source ground truth, adapted render, and base color. Rows 4--6: same for the augmented condition. Base colors from both are nearly identical, confirming appearance invariance. Rows 7--8: swapping the appearance embedding transfers appearance while preserving geometry.
  • Figure 5: t-SNE of appearance embeddings. Embeddings cluster by illumination type, confirming the encoder captures meaningful appearance information.
  • ...and 4 more figures
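
The Figure 2 caption lists four complementary losses. One plausible way to read them as a single training objective is sketched below; the weighting terms $\lambda_i$ and the exact arguments of each loss are assumptions made for illustration, not taken from the paper:

$$
\mathcal{L} \;=\; \lambda_{\mathrm{inv}}\,\mathcal{L}_{\mathrm{inv}} \;+\; \lambda_{\mathrm{aug}}\,\mathcal{L}_{\mathrm{aug}} \;+\; \lambda_{\mathrm{swap}}\,\mathcal{L}_{\mathrm{swap}} \;+\; \lambda_{\mathrm{base}}\,\mathcal{L}_{\mathrm{base}},
$$

where, under this reading, $\mathcal{L}_{\mathrm{inv}}$ penalizes discrepancies between the base colors predicted from the original and relit views of the same geometry, $\mathcal{L}_{\mathrm{aug}}$ is a photometric reconstruction loss on renders of the augmented (relit) images, $\mathcal{L}_{\mathrm{swap}}$ supervises renders produced with exchanged appearance embeddings against the corresponding paired observations, and $\mathcal{L}_{\mathrm{base}}$ aligns the base colors with an appearance-neutral target.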