Multi-level Dynamic Style Transfer for NeRFs

Zesheng Li, Shuaibo Li, Wei Ma, Jianwei Guo, Hongbin Zha

TL;DR

The paper addresses the challenge of transferring artistic styles to 3D scenes represented by NeRFs while preserving multi-scale spatial structure. It introduces MDS-NeRF, a zero-shot framework built on a redesigned NeRF pipeline: a multi-level feature adaptor (MLFA) produces a multi-level feature grid, a dynamic style injection (DSI) module injects style characteristics into the content features, and a multi-level cascade decoder (MLCD) decodes the result into the stylized view. Training occurs in two stages: Stage1 reconstructs the multi-level feature grid; Stage2 performs stylization by learning the learnable instance normalization (LIN) and DSI, with the overall loss $L_g = L_f + L_r$. The method supports 2D and 3D style references, enables 3D-to-3D omni-view stylization and style mixing, and demonstrates superior content preservation and stylization quality compared to prior work. Two limitations remain: results depend on the quality of the 3D style reference, and shape-changing tasks are not addressed. These results suggest that zero-shot, view-consistent 3D style transfer is feasible for NeRF-based scenes, with potential applications in AR/VR and design workflows.

Abstract

As the application of neural radiance fields (NeRFs) in various 3D vision tasks continues to expand, numerous NeRF-based style transfer techniques have been developed. However, existing methods typically integrate style statistics into the original NeRF pipeline, often leading to suboptimal results in both content preservation and artistic stylization. In this paper, we present multi-level dynamic style transfer for NeRFs (MDS-NeRF), a novel approach that reengineers the NeRF pipeline specifically for stylization and incorporates an innovative dynamic style injection module. Particularly, we propose a multi-level feature adaptor that helps generate a multi-level feature grid representation from the content radiance field, effectively capturing the multi-scale spatial structure of the scene. In addition, we present a dynamic style injection module that learns to extract relevant style features and adaptively integrates them into the content patterns. The stylized multi-level features are then transformed into the final stylized view through our proposed multi-level cascade decoder. Furthermore, we extend our 3D style transfer method to support omni-view style transfer using 3D style references. Extensive experiments demonstrate that MDS-NeRF achieves outstanding performance for 3D style transfer, preserving multi-scale spatial structures while effectively transferring stylistic characteristics.

Paper Structure

This paper contains 26 sections, 11 equations, 12 figures.

Figures (12)

  • Figure 1: Overview of MDS-NeRF. MDS-NeRF features a multi-level feature grid representation along with a dynamic style injection (DSI) module. For a given view to be rendered, we map the basis point features $\{\mathcal{P}_i\}$ sampled along each ray into multi-level pixel features using the multi-level feature adaptor (MLFA) and volume rendering, obtaining the multi-level content features, denoted $F^{\ell}$, $\ell \in \{\ell_l, \ell_m, \ell_h\}$. Given a style reference, which could be in either 2D or 3D form, the proposed DSI selects the preferred style characteristics and injects them into the content features. Finally, the altered multi-level content features are decoded by the multi-level cascade decoder (MLCD) to produce a stylized RGB view. Stage1 and Stage2 refer to the two training stages: the multi-level feature grid reconstruction stage and the stylization stage, respectively.
  • Figure 2: MLFA and MLCD. The multi-level feature adaptor (MLFA) transforms basic point features into multi-level point features and employs learnable instance normalization (LIN) to suppress the styles in the content features. LIN (indicated with *) is applied during stylization. The multi-level cascade decoder (MLCD) decodes the multi-level content features in the first training stage, or the stylized content features during stylization, into an RGB image in a cascaded manner, progressively integrating information from different feature levels.
  • Figure 3: NeRF rendered features and VGG extracted features. The rendered features exhibit discrepancies compared to the VGG extracted features.
  • Figure 4: Dynamic style injection (DSI). DSI consists of two parts: (i) generating a set of weights and biases from the extracted style features, and (ii) applying group convolution to the rendered features using the generated parameters. The structure of the weight and bias generator is shown at the bottom right, where the upper branch extracts spatial information from the style features, and the lower branch extracts channel information. C denotes a 2D convolution layer, while AP refers to an adaptive average pooling layer.
  • Figure 5: Qualitative comparison. We present 2D-to-3D style transfer results obtained by LSNV, StyleRF, StylizedNeRF and our proposed MDS-NeRF. Using the proposed multi-level feature grid and dynamic style injection module, MDS-NeRF produces results that preserve multi-level scene structures while effectively incorporating rich style characteristics.
  • ...and 7 more figures
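
The Figure 1 caption describes turning point features sampled along a ray into pixel features through volume rendering. A minimal sketch of that compositing step is given below, using the standard NeRF quadrature weights rather than anything confirmed from the paper's implementation. The function name `render_pixel_feature`, the level names, and all numeric values are hypothetical and chosen only for illustration; the MLFA that would produce the per-level point features is not modelled here.

```python
import math

def render_pixel_feature(densities, deltas, point_features):
    """Composite per-sample feature vectors along one ray into one pixel feature.

    Uses the standard NeRF volume rendering weights:
    w_i = T_i * (1 - exp(-sigma_i * delta_i)), with T_i the accumulated
    transmittance before sample i.
    """
    n_channels = len(point_features[0])
    pixel = [0.0] * n_channels
    weights = []
    transmittance = 1.0  # fraction of light that reaches the current sample
    for sigma, delta, feat in zip(densities, deltas, point_features):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        w = transmittance * alpha
        weights.append(w)
        for c in range(n_channels):
            pixel[c] += w * feat[c]
        transmittance *= 1.0 - alpha
    return pixel, weights

# Hypothetical ray with three samples: empty space, thin matter, a dense surface.
densities = [0.0, 2.0, 50.0]
deltas = [0.1, 0.1, 0.1]

# Assumed stand-ins for multi-level point features (one list of vectors per level).
level_point_features = {
    "low": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "mid": [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]],
    "high": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
}

# The same densities weight every level, so all levels share one set of weights.
level_pixel_features = {}
for level, feats in level_point_features.items():
    level_pixel_features[level], ray_weights = render_pixel_feature(
        densities, deltas, feats
    )
```

Because the weights depend only on density and sample spacing, each feature level is composited with the same per-sample weights, which is one plausible reason the rendered levels stay spatially aligned; the sample in empty space contributes nothing and the dense sample dominates.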