StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields

Hongbin Xu, Weitao Chen, Feng Xiao, Baigui Sun, Wenxiong Kang

TL;DR

StyleDyRF is a zero-shot 4D style transfer method that represents the 4D feature space by deforming a canonical feature volume and learns a linear style transformation matrix on that volume in a data-driven fashion. It outperforms existing methods in visual quality and consistency.

Abstract

4D style transfer aims to transfer arbitrary visual styles to synthesized novel views of a dynamic 4D scene with varying viewpoints and times. Existing efforts on 3D style transfer can effectively combine the visual features of style images and neural radiance fields (NeRF) but, limited by their static-scene assumption, fail to handle dynamic 4D scenes. We therefore address the new and challenging problem of 4D style transfer for the first time, which additionally requires stylized results to remain consistent on dynamic objects. In this paper, we introduce StyleDyRF, a method that represents the 4D feature space by deforming a canonical feature volume and learns a linear style transformation matrix on the feature volume in a data-driven fashion. To obtain the canonical feature volume, the rays at each time step are deformed with the geometric prior of a pre-trained dynamic NeRF to render the feature map under the supervision of pre-trained visual encoders. With the content and style cues in the canonical feature volume and the style image, we can learn the style transformation matrix from their covariance matrices with lightweight neural networks. The learned style transformation matrix reflects a direct matching of feature covariance from the content volume to the given style pattern, in analogy with the optimization of the Gram matrix in traditional 2D neural style transfer. The experimental results show that our method not only renders photorealistic 4D style transfer results in a zero-shot manner but also outperforms existing methods in terms of visual quality and consistency.
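
The covariance matching described in the abstract can be illustrated with a small sketch. The abstract says the style transformation matrix is learned from covariance matrices by lightweight networks; the code below does not reproduce that. It swaps in a closed-form Cholesky-based matrix as a stand-in, only to show what "matching feature covariance with a linear map" means. The function names, the (channels, samples) feature layout, and the random toy features are all assumptions made for illustration.

```python
import numpy as np


def covariance(features):
    """Return the per-channel mean and the covariance of a (C, N) feature matrix."""
    mean = features.mean(axis=1, keepdims=True)
    centered = features - mean
    cov = centered @ centered.T / (features.shape[1] - 1)
    return mean, cov


def closed_form_style_matrix(content, style, eps=1e-6):
    """Closed-form stand-in for a learned matrix T with T @ cov_c @ T.T ~= cov_s.

    Hypothetical helper: the paper predicts its matrix with neural networks.
    """
    channels = content.shape[0]
    _, cov_c = covariance(content)
    _, cov_s = covariance(style)
    # A small ridge keeps both factorizations numerically well defined.
    l_c = np.linalg.cholesky(cov_c + eps * np.eye(channels))
    l_s = np.linalg.cholesky(cov_s + eps * np.eye(channels))
    # T = L_s @ inv(L_c), computed with a linear solve instead of an explicit inverse.
    return np.linalg.solve(l_c.T, l_s.T).T


def apply_style(content, style, matrix):
    """Center the content features, apply the linear map, add the style mean."""
    mean_c, _ = covariance(content)
    mean_s, _ = covariance(style)
    return matrix @ (content - mean_c) + mean_s


# Toy data: 4-channel features standing in for content-volume and style-image features.
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 500))
mixing = np.array([[2.0, 0.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0, 0.0],
                   [0.0, 0.3, 1.5, 0.0],
                   [0.0, 0.0, 0.2, 0.5]])
style = mixing @ rng.normal(size=(4, 300)) + 1.0

T = closed_form_style_matrix(content, style)
stylized = apply_style(content, style, T)
```

After the transform, the covariance of `stylized` agrees with that of `style` up to the small ridge term, and the channel means match. Because the map is linear, applying it to every feature gives the same result as applying it to any weighted sum of those features, which is one reason a linear transform is convenient when features are later aggregated along rays.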

Paper Structure (15 sections, 25 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Zero-shot 4D Style Transfer. Given a casually captured video containing dynamic objects, StyleDyRF can transfer the reference style to the 4D scene in a zero-shot manner. Going one step beyond 3D multi-view consistency in style transfer, our model renders novel views that are also temporally consistent in 4D scenes.
  • Figure 2: Overview of the StyleDyRF framework.
  • Figure 3: Comparison of our StyleDyRF with other methods on the Nvidia dataset. Our StyleDyRF produces better stylized novel views with temporal and multi-view consistency in the provided samples of dynamic scenes.
  • Figure 4: Qualitative results of our StyleDyRF on the DAVIS dataset. Our StyleDyRF can render stylized novel views with realistic quality and 4D consistency at different times.
  • Figure 5: Ablation results of our proposed canonical feature volume (CFV). The CFV can effectively model dynamic objects in 4D scenes, stylizing them coherently with the rest of the scene.
  • ...and 3 more figures