S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes

Xingyi Li, Zhiguo Cao, Yizheng Wu, Kewei Wang, Ke Xian, Zhe Wang, Guosheng Lin

TL;DR

S-DyRF tackles the problem of stylizing dynamic 3D scenes from a limited number of stylized references by introducing temporal pseudo-references and a two-stage spatio-temporal style transfer on dynamic neural radiance fields. The method builds on a pre-trained dynamic field $F_{\theta}$, renders a reference view at time $k$ and stylizes it in 2D to obtain a stylized reference $S_R^k$, then generates temporal pseudo-references to propagate the style across time. This is followed by a coarse, feature-level transfer and a fine, temporally aware refinement via Temporal Reference Ray Registration. The optimization combines coarse and fine stylization losses with a temporal total-variation regularizer and operates on a 4D scene representation (space and time), so that stylized novel views and times remain semantically aligned with the reference. Experiments on synthetic and real data show improved perceptual similarity and temporal and spatial consistency over the baselines (ARF*, Ref-NPR*, and Texler et al.), along with a strong user preference for S-DyRF. Overall, S-DyRF enables flexible, temporally coherent stylization of dynamic 3D scenes from minimal reference input.
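To make the structure of the objective concrete, here is a minimal sketch of a weighted sum of a coarse loss, a fine loss, and a temporal total-variation term. Everything specific in it is an assumption for illustration: the function names `temporal_tv` and `total_loss`, the weights, the squared-difference form of the regularizer, and the layout of the grid (time on the first axis) are not taken from the paper.

```python
import numpy as np


def temporal_tv(grid: np.ndarray) -> float:
    """Mean squared difference between adjacent time slices.

    Assumes `grid` has shape (T, ...), with time on the first axis
    (a hypothetical layout chosen for this sketch).
    """
    if grid.shape[0] < 2:
        return 0.0  # a single time step has no temporal variation
    diff = grid[1:] - grid[:-1]
    return float(np.mean(diff ** 2))


def total_loss(l_coarse: float, l_fine: float, grid: np.ndarray,
               w_coarse: float = 1.0, w_fine: float = 1.0,
               w_tv: float = 0.1) -> float:
    """Weighted sum of the three terms; the weights are placeholders."""
    return w_coarse * l_coarse + w_fine * l_fine + w_tv * temporal_tv(grid)
```

A grid that is constant over time contributes nothing to the regularizer, so the term only penalizes changes between neighbouring time steps.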

Abstract

Current 3D stylization methods often assume static scenes, which violates the dynamic nature of our real world. To address this limitation, we present S-DyRF, a reference-based spatio-temporal stylization method for dynamic neural radiance fields. However, stylizing dynamic 3D scenes is inherently challenging due to the limited availability of stylized reference images along the temporal axis. Our key insight lies in introducing additional temporal cues besides the provided reference. To this end, we generate temporal pseudo-references from the given stylized reference. These pseudo-references facilitate the propagation of style information from the reference to the entire dynamic 3D scene. For coarse style transfer, we enforce novel views and times to mimic the style details present in pseudo-references at the feature level. To preserve high-frequency details, we create a collection of stylized temporal pseudo-rays from temporal pseudo-references. These pseudo-rays serve as detailed and explicit stylization guidance for achieving fine style transfer. Experiments on both synthetic and real-world datasets demonstrate that our method yields plausible stylized results of space-time view synthesis on dynamic 3D scenes.
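The abstract does not spell out how feature-level mimicry is enforced. One common way to express "rendered features should resemble pseudo-reference features" is a nearest-neighbour match under cosine distance, sketched below. The function name `nn_feature_loss`, the choice of cosine distance, and the flattened `(N, C)` feature layout are assumptions made for illustration, not the paper's stated loss.

```python
import numpy as np


def nn_feature_loss(rendered: np.ndarray, reference: np.ndarray,
                    eps: float = 1e-8) -> float:
    """Mean cosine distance from each rendered feature to its nearest
    pseudo-reference feature.

    rendered:  (N, C) features from a rendered novel view/time.
    reference: (M, C) features from a temporal pseudo-reference.
    """
    # Normalize rows so that the dot product equals cosine similarity.
    r = rendered / (np.linalg.norm(rendered, axis=1, keepdims=True) + eps)
    s = reference / (np.linalg.norm(reference, axis=1, keepdims=True) + eps)
    cos_dist = 1.0 - r @ s.T  # (N, M) pairwise cosine distances
    # For every rendered feature, keep only its closest reference feature.
    return float(cos_dist.min(axis=1).mean())
```

In practice the features would come from a pre-trained image encoder and the loss would be minimized with respect to the radiance field parameters. Here plain arrays stand in for both.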

Paper Structure

This paper contains 14 sections, 8 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Our method can stylize dynamic 3D scenes, maintaining semantic consistency with the given reference image across both spatial and temporal dimensions. Besides novel time synthesis (the leftmost one), our stylized dynamic radiance field can synthesize novel views (the second one) or perform space-time view synthesis (the third one) [li20233d, shen2023make, li2021neural]. Furthermore, our method can also stylize synthetic objects (the rightmost one). We encourage readers to experience the animations by viewing them with Adobe Acrobat or KDE Okular.
  • Figure 2: An overview of our method. Given a pre-trained photorealistic dynamic radiance field, we first render a reference view at time $k$ from a specific reference camera. The reference view then undergoes 2D style transfer using an appropriate method, e.g., manual editing, NNST [kolkin2022neural], or ControlNet [zhang2023adding], to produce a stylized reference image. To propagate the style information from the stylized reference to other timestamps, we generate temporal pseudo-references and apply spatio-temporal style transfer to optimize our dynamic radiance field. Once stylization is done, the field yields plausible stylized results of space-time view synthesis on dynamic 3D scenes.
  • Figure 3: Qualitative comparisons on real-world and synthetic datasets. We compare our method with ARF* [zhang2022arf], Ref-NPR* [zhang2023ref], and Texler et al. [texler2020interactive]. In each case, the upper left image represents the reference view, generated from the photorealistic dynamic radiance field, while the lower left image depicts its corresponding stylized version.
  • Figure 4: Multi-references. Our method is versatile and can also accept multiple references. We show that including additional references in spatial or temporal dimensions enriches the details and enhances the overall quality of the results.
  • Figure 5: Controllable stylization. Our method inherently facilitates controllable stylization. (a) Besides neural style transfer methods such as NNST [kolkin2022neural], we can leverage ControlNet [zhang2023adding] to generate or edit the reference image, and subsequently apply this modified reference to shape our dynamic 3D scenes. (b) Furthermore, our method enables localized edits to the reference image, making it possible to fine-tune or alter specific aspects of the scenes.
  • ...and 1 more figure