G3DST: Generalizing 3D Style Transfer with Neural Radiance Fields across Scenes and Styles

Adil Meric, Umut Kocasari, Matthias Nießner, Barbara Roessle

TL;DR

This work addresses the bottleneck of per-scene optimization in NeRF-based 3D style transfer by introducing a generalizable NeRF Transformer augmented with a hypernetwork conditioned on a style latent $z_s$ from a Style-VAE. The model renders view-consistent stylized novel views for unseen scenes and styles at inference time. It is trained with a loss that combines $L_{content}$, $L_{style}$, and a novel multi-view consistency term $L_{consistency}$ based on optical flow between views: $L_{total} = L_{content} + w_s L_{style} + w_c L_{consistency}$, where $w_s$ and $w_c$ weight the style and consistency terms. Key contributions include the first generalizable 3D style transfer across scenes and styles, a flow-based consistency loss that keeps stylized views consistent with one another, and an efficient, on-the-fly stylization pipeline that reaches visual quality comparable to per-scene methods at significantly higher efficiency. The results demonstrate high-quality stylizations with strong multi-view consistency, enabling practical, scene-agnostic 3D style transfer without scene-specific retraining, with potential impact on real-time 3D content creation and editing.
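To make the loss combination concrete, the following is a minimal NumPy sketch of a flow-based consistency term and the weighted total. It is not the authors' implementation: the nearest-neighbour backward warp, the squared-error penalty, the absence of occlusion handling, and all function names (`warp_with_flow`, `consistency_loss`, `total_loss`) are assumptions made for illustration. Only the form $L_{total} = L_{content} + w_s L_{style} + w_c L_{consistency}$ comes from the summary above.

```python
import numpy as np

def warp_with_flow(image, flow):
    """Backward-warp `image` (H, W, C) with `flow` (H, W, 2), nearest neighbour.

    Assumption: flow[y, x] = (dx, dy) points from pixel (x, y) in the
    reference view to its corresponding pixel in `image`.
    Returns the warped image and a mask of pixels that stayed in bounds.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    return image[src_y, src_x], valid

def consistency_loss(stylized_a, stylized_b, flow_a_to_b):
    """Mean squared difference between corresponding pixels of two stylized views."""
    warped_b, valid = warp_with_flow(stylized_b, flow_a_to_b)
    if not valid.any():
        return 0.0
    diff = (stylized_a - warped_b) ** 2
    return float(diff[valid].mean())

def total_loss(l_content, l_style, l_consistency, w_s=1.0, w_c=1.0):
    """L_total = L_content + w_s * L_style + w_c * L_consistency."""
    return l_content + w_s * l_style + w_c * l_consistency
```

In this sketch, two stylized views that agree exactly under the flow give a consistency term of zero, so the penalty only grows where corresponding pixels were stylized differently.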

Abstract

Neural Radiance Fields (NeRF) have emerged as a powerful tool for creating highly detailed and photorealistic scenes. Existing methods for NeRF-based 3D style transfer need extensive per-scene optimization for single or multiple styles, limiting the applicability and efficiency of 3D style transfer. In this work, we overcome the limitations of existing methods by rendering stylized novel views from a NeRF without the need for per-scene or per-style optimization. To this end, we take advantage of a generalizable NeRF model to facilitate style transfer in 3D, thereby enabling the use of a single learned model across various scenes. By incorporating a hypernetwork into a generalizable NeRF, our approach enables on-the-fly generation of stylized novel views. Moreover, we introduce a novel flow-based multi-view consistency loss to preserve consistency across multiple views. We evaluate our method across various scenes and artistic styles and show its performance in generating high-quality and multi-view consistent stylized images without the need for a scene-specific implicit model. Our findings demonstrate that this approach not only achieves a good visual quality comparable to that of per-scene methods but also significantly enhances efficiency and applicability, marking a notable advancement in the field of 3D style transfer.

Paper Structure (29 sections, 4 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Generalizable 3D Style Transfer. Given a set of source views and a style image, our method renders view-consistent, stylized novel views without any per-scene or per-style optimization.
  • Figure 2: Framework. We utilize a hypernetwork to apply a style transformation to the features of a generalizable transformer-based NeRF. The hypernetwork takes a style latent vector $z_s$ as input and outputs the weights and biases of an intermediate MLP, which stylizes the aggregated ray features. This operation is repeated for each ray in the image to produce a high-quality stylized image. We calculate the optical flow between source views and minimize the difference between corresponding pixels in the stylized images.
  • Figure 3: Results. Our method captures the style and preserves view-consistency. The output images produced from different viewpoints are geometrically consistent and capture the stylistic details of the given style image.
  • Figure 4: Visual Comparison with Other Methods. Hyper (Chiang et al., 2022) produces blurry results with artifacts, while StyleRF (Liu et al., CVPR 2023) captures the style and preserves the geometry more consistently. Our method successfully captures the style of a given style image while preserving the geometric details.
  • Figure 5: Comparison of our method with StyleRF from different views. Our method captures the style and preserves the geometry in different novel views. StyleRF cannot capture local geometric details and therefore struggles to preserve multi-view consistency.
  • ...and 3 more figures
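
The hypernetwork mechanism described in the Figure 2 caption can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the paper's architecture: the single generated linear layer, the ReLU with a residual connection, the random initialization, and the names `StyleHyperNetwork` and `stylize_ray_features` are all hypothetical. The caption only states that a style latent $z_s$ is mapped to the weights and biases of an intermediate MLP that is applied to the aggregated ray features.

```python
import numpy as np

class StyleHyperNetwork:
    """Maps a style latent z_s to the weight and bias of one linear layer (assumed form)."""

    def __init__(self, latent_dim, feat_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.feat_dim = feat_dim
        n_out = feat_dim * feat_dim + feat_dim  # weight entries plus bias entries
        self.proj = rng.normal(0.0, 0.01, size=(latent_dim, n_out))

    def __call__(self, z_s):
        params = z_s @ self.proj
        d = self.feat_dim
        weight = params[: d * d].reshape(d, d)
        bias = params[d * d:]
        return weight, bias

def stylize_ray_features(ray_feats, weight, bias):
    """Apply the generated layer to ray features (N, D) with ReLU and a residual path."""
    return ray_feats + np.maximum(ray_feats @ weight + bias, 0.0)
```

Because the layer parameters are produced from $z_s$ at run time, switching to a new style in this sketch means generating a new weight and bias, with no optimization over the scene representation.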