4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting

Wanlin Liang, Hongbin Xu, Weitao Chen, Feng Xiao, Wenxiong Kang

TL;DR

4DStyleGaussian is a novel 4D style transfer framework that achieves real-time stylization with arbitrary style references while maintaining reasonable content affinity, multi-view consistency, and temporal coherence.

Abstract

3D neural style transfer has gained significant attention for its potential to provide user-friendly stylization with spatial consistency. However, existing 3D style transfer methods often fall short in inference efficiency and generalization ability, and they struggle to handle dynamic scenes with temporal consistency. In this paper, we introduce 4DStyleGaussian, a novel 4D style transfer framework designed to achieve real-time stylization of arbitrary style references while maintaining reasonable content affinity, multi-view consistency, and temporal coherence. Our approach leverages an embedded 4D Gaussian Splatting technique, which is trained using a reversible neural network to reduce content loss in the feature distillation process. Using the embedded 4D Gaussians, we predict a 4D style transformation matrix that enables spatially and temporally consistent style transfer with Gaussian Splatting. Experiments demonstrate that our method achieves high-quality, zero-shot stylization of 4D scenes with improved efficiency and spatial-temporal consistency.
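The abstract says the embedded 4D Gaussians are trained with a reversible neural network to reduce content loss during feature distillation. The property that matters is invertibility: a map that can be undone exactly discards no information about its input. As a minimal sketch of that property only, assuming a NICE-style additive coupling layer (a generic reversible block, not necessarily the one the authors use), the NumPy code below checks that features survive a forward and inverse pass. The class name, layer sizes, and feature shapes are hypothetical.

```python
import numpy as np


class AdditiveCoupling:
    """NICE-style additive coupling layer: an exactly invertible feature map.

    Hypothetical illustration of a reversible block; NOT the 4DStyleGaussian
    architecture, whose layers are not described in this summary.
    """

    def __init__(self, dim: int, hidden: int = 32, seed: int = 0):
        if dim % 2 != 0:
            raise ValueError("feature dimension must be even to split into halves")
        rng = np.random.default_rng(seed)
        half = dim // 2
        # Weights of a small MLP m(.) applied to the first half.
        # m itself does not need to be invertible for the layer to be invertible.
        self.w1 = rng.normal(scale=0.1, size=(half, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, half))

    def _m(self, x1: np.ndarray) -> np.ndarray:
        return np.tanh(x1 @ self.w1) @ self.w2

    def forward(self, x: np.ndarray) -> np.ndarray:
        x1, x2 = np.split(x, 2, axis=-1)
        # y1 = x1, y2 = x2 + m(x1)
        return np.concatenate([x1, x2 + self._m(x1)], axis=-1)

    def inverse(self, y: np.ndarray) -> np.ndarray:
        y1, y2 = np.split(y, 2, axis=-1)
        # x1 = y1, x2 = y2 - m(y1): the forward pass is undone exactly
        return np.concatenate([y1, y2 - self._m(y1)], axis=-1)


if __name__ == "__main__":
    # Hypothetical per-Gaussian feature embeddings: 1000 Gaussians x 16 channels.
    feats = np.random.default_rng(1).normal(size=(1000, 16))
    layer = AdditiveCoupling(dim=16)
    recon = layer.inverse(layer.forward(feats))
    print(np.allclose(recon, feats))  # True
```

Stacking such layers, with the two halves swapped between layers, gives a deeper network that stays invertible; how 4DStyleGaussian actually configures its reversible network is not stated here.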

Paper Structure

This paper contains 27 sections, 20 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Zero-shot 4D Style Gaussian. 4DStyleGaussian can transfer the reference style to the 4D scene in a zero-shot manner and maintain multi-view and cross-time consistency.
  • Figure 2: Overview of our method. We propose 4DStyleGaussian, a zero-shot style transfer method with 4D Gaussian Splatting. Our pipeline comprises two training stages. First, we train embedded Gaussians with a learnable reversible neural network that preserves content affinity and clear details. Second, we train a linear 4D style transformation matrix using the embedded Gaussians optimized in the first stage to perform spatially and temporally consistent style transfer on 4D dynamic scenes.
  • Figure 3: Qualitative results of stylized novel views at different times and styles. Our method generates high-quality stylized views and generalizes well to various style images while maintaining spatial-temporal consistency.
  • Figure 4: Comparisons with baselines of stylized novel views at different times and styles. Our method outperforms the baselines, producing better visual quality that stays faithful to the style input while preserving spatial-temporal consistency.
  • Figure 5: Comparisons of stylized novel views at different times with video style transfer methods. CAP-VSTNet and ReReVST show inconsistencies between different views of different video frames (shown in the blue boxes), while our method maintains better consistency across multiple views and over time.
  • ...and 4 more figures
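Figure 2 describes a second stage that learns a linear 4D style transformation matrix and applies it to the embedded Gaussians. The paper learns this matrix; purely to illustrate what a single linear transform on per-Gaussian features does, the sketch below substitutes a closed-form whitening-coloring transform (WCT), which maps content features so that their mean and covariance match those of a set of style features. The function names, the feature dimension, and the source of the style features are assumptions. Because one matrix is shared by every Gaussian, and the Gaussians are themselves shared across viewpoints and timestamps, every rendered view reads the same stylized features; this is one plausible reading of how a linear transform supports the reported consistency, not the authors' implementation.

```python
import numpy as np


def _sym_power(cov: np.ndarray, power: float, eps: float) -> np.ndarray:
    """Matrix power of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, eps, None)  # guard against tiny or negative eigenvalues
    return (vecs * vals**power) @ vecs.T


def wct_matrix(content: np.ndarray, style: np.ndarray, eps: float = 1e-5):
    """Closed-form stand-in for a learned linear style matrix (hypothetical).

    content: (N, d) per-Gaussian feature embeddings
    style:   (M, d) style features, e.g. encoder activations of the style image
    Returns (T, mu_c, mu_s) such that (f - mu_c) @ T.T + mu_s has
    approximately the style mean and covariance.
    """
    d = content.shape[1]
    mu_c, mu_s = content.mean(axis=0), style.mean(axis=0)
    cov_c = np.cov(content, rowvar=False) + eps * np.eye(d)
    cov_s = np.cov(style, rowvar=False) + eps * np.eye(d)
    whiten = _sym_power(cov_c, -0.5, eps)  # cov_c^{-1/2}
    color = _sym_power(cov_s, 0.5, eps)    # cov_s^{1/2}
    return color @ whiten, mu_c, mu_s


def apply_linear_style(features, T, mu_c, mu_s):
    """Apply one shared d x d matrix to every Gaussian's feature vector."""
    return (features - mu_c) @ T.T + mu_s


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaussian_feats = rng.normal(size=(5000, 8))  # hypothetical embeddings
    style_feats = rng.normal(size=(4096, 8)) @ rng.normal(size=(8, 8)) + 2.0
    T, mu_c, mu_s = wct_matrix(gaussian_feats, style_feats)
    stylized = apply_linear_style(gaussian_feats, T, mu_c, mu_s)
    print(stylized.shape)  # (5000, 8)
```

A closed-form WCT needs a matrix square root per style, whereas a learned predictor can output the matrix in a single forward pass; which trade-off the paper makes is described only at the level of Figure 2 here.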