Instant Photorealistic Neural Radiance Fields Stylization

Shaoxu Li, Ye Pan

TL;DR

Instant Neural Radiance Fields Stylization addresses fast, coherent 3D scene stylization for NeRF representations by splitting the hash-encoded position embedding into content and style branches and applying AdaIN on voxel-grid features at inference. The method trains two scenes in under 10 minutes and supports style transfer not only from single images but between image sets of scenes, enabling consistent stylization across viewing angles. It leverages the Instant-NGP hash encoder for speed, shared color decoding, and two-branch encoders to facilitate set-to-set style transfer without extra training. Experiments on NeRF-Synthetic and LLFF demonstrate improved geometric fidelity and stylization consistency, highlighting the approach as a practical, scalable solution for 3D scene stylization.

Abstract

We present Instant Neural Radiance Fields Stylization, a novel approach to multi-view image stylization of 3D scenes. Our approach models a neural radiance field based on neural graphics primitives, which use a hash-table-based position encoder for position embedding. We split the position encoder into two parts, the content and style sub-branches, and train the network for ordinary novel-view image synthesis with the content and style targets. In the inference stage, we apply AdaIN to the output features of the position encoder, using the content and style voxel-grid features as reference. The adjusted features are then used to render stylized novel-view images. Our method extends the style target from style images to image sets of scenes and does not require additional network training for stylization. Given a set of images of 3D scenes and a style target (a style image or another set of 3D scenes), our method can generate stylized novel views with a consistent appearance at various view angles in less than 10 minutes on modern GPU hardware. Extensive experimental results demonstrate the validity and superiority of our method.
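
The inference-stage AdaIN step described in the abstract can be illustrated with a small sketch. This is the standard AdaIN operation (shift and scale features so their per-channel mean and standard deviation match those of a style reference), written in plain Python for readability; it is not the authors' implementation. The function names `channel_stats` and `adain`, the list-of-vectors feature layout, and the `eps` value are assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev


def channel_stats(features, eps=1e-5):
    """Per-channel mean and std over a list of equal-length feature vectors.

    `eps` is added to the variance to avoid division by zero (assumed value).
    """
    channels = list(zip(*features))  # transpose: one tuple per channel
    means = [fmean(c) for c in channels]
    stds = [math.sqrt(pstdev(c) ** 2 + eps) for c in channels]
    return means, stds


def adain(features, content_ref, style_ref, eps=1e-5):
    """Re-normalize `features` from content statistics to style statistics.

    `content_ref` and `style_ref` stand in for the content and style
    voxel-grid features that serve as reference (hypothetical layout:
    one feature vector per voxel).
    """
    c_mean, c_std = channel_stats(content_ref, eps)
    s_mean, s_std = channel_stats(style_ref, eps)
    return [
        [s_std[k] * (x - c_mean[k]) / c_std[k] + s_mean[k]
         for k, x in enumerate(vec)]
        for vec in features
    ]
```

In this sketch the statistics are computed once from the reference grids and then reused for any number of queried feature vectors, which is consistent with the abstract's statement that stylization needs no additional network training.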

Paper Structure (11 sections, 4 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: Stylization results of our method. Given a set of images of 3D scenes (a) and a style target (b) (a style image or another set of images of 3D scenes), our method is capable of generating stylized novel views (c) with a consistent appearance at various view angles.
  • Figure 2: The architecture of our method. Our approach uses a hash encoder for position embedding. During training, we use two sub-position encoders, one for the content images and one for the style images. Our style target can be a style image or another set of images of 3D scenes. A style image is treated as if it were placed at the centre of 3D space. The two sub-position encoders share a single $MLP^{RGB}$ for color calculation. The hash encoder ensures that the output of $MLP^{Density}$ can serve as features, and the shared $MLP^{RGB}$ ensures that these features can be mixed. In the inference stage, we calculate the content and style features at the voxel grid positions. AdaIN is then executed for stylization, using these features to obtain the content and style mean and std parameters.
  • Figure 3: 2D style image and corresponding 3D colored voxels with different views.
  • Figure 4: Qualitative comparisons with artistic style images. We compare the stylization results of 3 scenes on the NeRF-Synthetic dataset. Our method stylizes scenes with more precise geometry and competitive stylization quality.
  • Figure 5: Qualitative comparisons with photorealistic style images. We compare the stylization results of 3 scenes on the LLFF dataset. Our method stylizes scenes with more precise geometry and competitive stylization quality.
  • ...and 6 more figures