FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance Fields

GeonU Kim, Kim Youwang, Tae-Hyun Oh

TL;DR

FPRF addresses the inefficiency of prior 3D photorealistic style transfer (PST) methods, which require per-style or per-scene optimization, with a feed-forward, single-stage stylization pipeline for large-scale neural radiance fields. It introduces a stylizable radiance field composed of a scene content field and a scene semantic field, and uses AdaIN for fast style transfer from arbitrary references. A compact style dictionary built by semantic clustering, together with semantic-aware local AdaIN, enables multi-reference stylization while maintaining multi-view consistency. Experiments on the large-scale San Francisco Mission Bay scene and on LLFF scenes demonstrate photorealistic stylization that stays consistent under viewpoint changes, highlighting FPRF's practical potential for XR content creation and data augmentation.

Abstract

We present FPRF, a feed-forward photorealistic style transfer method for large-scale 3D neural radiance fields. FPRF stylizes large-scale 3D scenes with arbitrary, multiple style reference images without additional optimization while preserving multi-view appearance consistency. Prior methods required tedious per-style or per-scene optimization and were limited to small-scale 3D scenes. FPRF efficiently stylizes large-scale 3D scenes by introducing a style-decomposed 3D neural radiance field, which inherits AdaIN's feed-forward stylization machinery and therefore supports arbitrary style reference images. Furthermore, FPRF supports multi-reference stylization through semantic correspondence matching and local AdaIN, which gives users diverse control over 3D scene styles. FPRF also preserves multi-view consistency by applying the semantic matching and style transfer processes directly to queried features in 3D space. In experiments, we demonstrate that FPRF achieves favorable photorealistic stylization quality for large-scale 3D scenes with diverse reference images. Project page: https://kim-geonu.github.io/FPRF/
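
The AdaIN operation that the abstract refers to can be illustrated with a minimal NumPy sketch. It assumes features are stored as flat (points, channels) arrays; the function name `adain` and the `eps` parameter are illustrative and not taken from the FPRF code. The sketch shows only the statistic-matching step (normalize the content features per channel, then rescale and shift them with the style statistics), not the radiance field or the decoder.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Align per-channel mean and std of `content` to those of `style`.

    content: (N, C) array of content features (hypothetical layout).
    style:   (M, C) array of style features.
    Returns an (N, C) array whose channel statistics match `style`.
    """
    c_mean = content.mean(axis=0, keepdims=True)
    c_std = content.std(axis=0, keepdims=True) + eps  # eps avoids division by zero
    s_mean = style.mean(axis=0, keepdims=True)
    s_std = style.std(axis=0, keepdims=True)
    # Whiten the content statistics, then apply the style statistics.
    return (content - c_mean) / c_std * s_std + s_mean
```

Because the operation depends only on feature statistics, a new style image needs just one forward pass to extract its mean and standard deviation, which is what makes this kind of stylization feed-forward.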

Paper Structure (20 sections, 6 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: FPRF training stage. Given a set of scene images and the corresponding VGG and DINO features, FPRF learns the stylizable radiance field. The stylizable radiance field embeds the geometry, radiance field, and semantic features of the scene. Note that FPRF needs only the original scene images during training, not stylized images, yet it can take arbitrary style images in the stylization stage.
  • Figure 2: FPRF stylization stage. Given the optimized stylizable radiance field and a set of arbitrary reference images, we stylize the large-scale 3D scene via our novel semantic-aware local AdaIN. We compose a style dictionary consisting of pairs of local semantic codes and local style codes extracted from the clustered reference images. Using semantic features from the stylizable radiance field as queries, we find the corresponding local semantic codes and retrieve the paired local style codes. Using the retrieved semantic-style code pairs, we perform semantic matching and local AdaIN, then finally render the stylized colors.
  • Figure 3: Effects of guided filtering on semantic features. [Top] Given the trained 3D scene and the reference image (left), we visualize the learned stylizable radiance field without (middle) and with (right) guided filtering. The learned semantic features are much sharper when guided filtering is applied. [Bottom] A stylizable radiance field learned without guided filtering yields degraded stylization, e.g., blurry boundaries (left), whereas one learned with guided filtering yields higher stylization quality (right).
  • Figure 4: Multi-view appearance consistency on the San Francisco Mission Bay dataset (tancik2022blocknerf). FPRF preserves multi-view appearance consistency even under extreme viewpoint changes, while 2D PST methods (wu2022ccpl; chiu2022photowct2) produce inconsistent colors for the same building as the viewpoint changes.
  • Figure 6: Multi-reference style transfer. FPRF stylizes the 3D radiance field with multiple reference images. Each heatmap shows the similarity between the semantic features of a highlighted patch and the reference image. Our model comprehends the semantic relationship of a large-scale 3D scene and matches the scene with the reference images.
  • ...and 1 more figure
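
The style dictionary lookup and local AdaIN described in the Figure 2 caption can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the authors' implementation: cluster labels for the reference pixels are taken as given (for example from K-means), each query is matched to a single dictionary entry by cosine similarity, and content statistics are computed per matched group. The function names `build_style_dictionary` and `local_adain` are hypothetical.

```python
import numpy as np

def build_style_dictionary(ref_semantic, ref_style, labels):
    """Build (semantic code, style code) pairs, one per cluster.

    ref_semantic: (M, D) semantic features of reference pixels.
    ref_style:    (M, C) style features of the same pixels.
    labels:       (M,) cluster index per pixel (assumed precomputed).
    """
    keys, means, stds = [], [], []
    for k in np.unique(labels):
        mask = labels == k
        keys.append(ref_semantic[mask].mean(axis=0))  # local semantic code
        means.append(ref_style[mask].mean(axis=0))    # local style code: mean
        stds.append(ref_style[mask].std(axis=0))      # local style code: std
    return np.stack(keys), np.stack(means), np.stack(stds)

def local_adain(content, query_semantic, dictionary, eps=1e-5):
    """Stylize content features with the style code of the best-matching cluster.

    content:        (N, C) float content features queried from the scene.
    query_semantic: (N, D) semantic features queried at the same points.
    Returns the stylized (N, C) features and the matched cluster index per point.
    """
    keys, means, stds = dictionary
    # Cosine similarity between each query and each local semantic code.
    q = query_semantic / (np.linalg.norm(query_semantic, axis=1, keepdims=True) + eps)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + eps)
    match = (q @ k.T).argmax(axis=1)  # hard assignment (an assumption)
    out = np.empty_like(content)
    for j in np.unique(match):
        sel = match == j
        feats = content[sel]
        normed = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)
        out[sel] = normed * stds[j] + means[j]
    return out, match
```

Since both the matching and the statistic transfer operate on features queried at 3D points rather than on rendered 2D images, the same point receives the same style from every viewpoint, which is the intuition behind the multi-view consistency claim.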