GAS-NeRF: Geometry-Aware Stylization of Dynamic Radiance Fields

Nhat Phuong Anh Vu, Abhishek Saroha, Or Litany, Daniel Cremers

TL;DR

GAS-NeRF tackles the gap in 3D scene stylization by jointly optimizing geometry and appearance for dynamic radiance fields. It first transfers geometry from a style image using depth maps, then applies appearance stylization, leveraging a two-stage optimization, NNFM-based losses, and a linear gradient-scaling strategy to suppress near-camera artifacts. The approach builds on a Hexplane dynamic RF and uses depth maps from style images (via ZoeDepth) along with a VGG16-based feature extractor for NNFM losses, achieving improved depth and RGB fidelity and temporal coherence. Experimental results on real and synthetic datasets, plus a user study, demonstrate superior stylization quality and coherence, highlighting the practical potential for dynamic scene editing in applications like games and AR/VR.
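The NNFM (nearest-neighbor feature matching) loss mentioned above can be illustrated with a small sketch. It assumes the standard formulation in which each rendered feature vector is matched to its closest style feature vector under cosine distance, and the matched distances are averaged. The function name `nnfm_loss` and the flattened `(N, C)` feature layout are illustrative choices, not taken from the paper. NumPy stands in for a deep-learning framework, and the VGG16 feature extraction step is omitted.

```python
import numpy as np


def nnfm_loss(render_feats, style_feats, eps=1e-8):
    """Mean cosine distance from each rendered feature to its nearest style feature.

    render_feats: (N, C) array of feature vectors from the rendered image.
    style_feats:  (M, C) array of feature vectors from the style image.
    Both names and shapes are hypothetical, chosen for this sketch.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    r = render_feats / (np.linalg.norm(render_feats, axis=1, keepdims=True) + eps)
    s = style_feats / (np.linalg.norm(style_feats, axis=1, keepdims=True) + eps)
    # Pairwise cosine distance, shape (N, M).
    cos_dist = 1.0 - r @ s.T
    # For each rendered feature keep only its nearest style feature, then average.
    return float(cos_dist.min(axis=1).mean())


# Usage with random stand-in features (real features would come from a CNN).
rng = np.random.default_rng(0)
loss = nnfm_loss(rng.normal(size=(16, 8)), rng.normal(size=(32, 8)))
```

Because only the nearest style feature contributes for each rendered feature, the loss rewards local matches to the style rather than matching global feature statistics.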

Abstract

Current 3D stylization techniques primarily focus on static scenes, while our world is inherently dynamic, filled with moving objects and changing environments. Existing style transfer methods primarily target appearance -- such as color and texture transformation -- but often neglect the geometric characteristics of the style image, which are crucial for achieving a complete and coherent stylization effect. To overcome these shortcomings, we propose GAS-NeRF, a novel approach for joint appearance and geometry stylization in dynamic Radiance Fields. Our method leverages depth maps to extract and transfer geometric details into the radiance field, followed by appearance transfer. Experimental results on synthetic and real-world datasets demonstrate that our approach significantly enhances the stylization quality while maintaining temporal coherence in dynamic scenes.

Paper Structure

This paper contains 23 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Comparison of scene geometry before and after stylization. We show how different style images affect the geometry of the underlying scene. The original geometry (left) is adapted to match the depth of each input style image. The style images are geometrically different from each other, which gives a more comprehensive overview of what our method can do.
  • Figure 2: Method Overview. We start by training a photorealistic radiance field from the given multi-view input video using Hexplane \cite{cao2023hexplane}. Given a style image $S_{rgb}$ and its corresponding depth map $S_{depth}$, we first freeze the appearance branch and modify the geometry (density) of the scene using $S_{depth}$. We then freeze the density and modify the appearance of the scene to obtain the stylized radiance field.
  • Figure 3: Qualitative comparisons. We compare our method qualitatively against the baselines described in \ref{sec:datasetsbaselines}, namely ARF* \cite{arf}, Ref-NPR* \cite{zhang2023ref}, and S-DyRF \cite{li2024sdyrf}. In this example from the Nv3D dataset, our geometry consists of circular blobs, which is expected because the style image is made of a large number of pearls. With ARF*, the walls and the background appear circular, but the geometry of the background regions is flat and rectangular, giving rise to a mismatch. Our colors are similar to those of ARF* because the color transfer backbone of both methods is similar. Likewise, in the lego scene of the D-NeRF dataset, our method modifies the underlying geometry to resemble the skull pattern. Ref-NPR* recreates the colors in the lego scene but fails to maintain details in the Nv3D case. S-DyRF in particular suffers from temporal inconsistency, i.e., visible flickering when navigating the scene along the temporal domain. Accompanying videos in the supplementary material give a more comprehensive comparison.
  • Figure 4: Effect of gradient scaling. We demonstrate the effectiveness of linear gradient scaling, as explained in \ref{sec:gradient_scaling_method}. Gradient scaling reduces the floater artifacts that are induced when transferring the geometry from the style image onto our pretrained radiance field.
  • Figure 5: Ablation of $\mathcal{L}_{tv}$ and $\mathcal{L}_{content}$. We show the effect of the two loss functions, whose purpose is to preserve small details after the geometry is transferred to the stylized dynamic scene. Without the two loss terms, the hand of the human in the scene breaks into two parts. With them (right), the hand remains a single unit while still showing the desired geometry effects.
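
The linear gradient scaling referred to in Figure 4 can be sketched as follows. The exact scaling rule is not given in this summary, so the sketch assumes that the gradient at each ray sample is multiplied by a factor that grows linearly with the sample's distance from the camera and saturates at 1. The reference distance `d_ref` and both function names are hypothetical, introduced only for illustration.

```python
import numpy as np


def linear_grad_scale(distances, d_ref=1.0):
    """Per-sample scale in [0, 1], growing linearly with distance to the camera.

    d_ref is an assumed reference distance beyond which gradients pass unchanged.
    """
    if d_ref <= 0:
        raise ValueError("d_ref must be positive")
    return np.clip(np.asarray(distances, dtype=float) / d_ref, 0.0, 1.0)


def scale_gradients(grads, distances, d_ref=1.0):
    """Damp gradients of samples close to the camera.

    grads:     (N, C) gradients with respect to per-sample outputs (e.g. density, color).
    distances: (N,) distances of the same samples from the camera origin.
    """
    scale = linear_grad_scale(distances, d_ref)
    # Broadcast the per-sample scale over the channel dimension.
    return np.asarray(grads, dtype=float) * scale[:, None]


# Usage: the sample at distance 0.1 receives only 10% of its gradient.
grads = np.ones((3, 4))
scaled = scale_gradients(grads, np.array([0.1, 0.5, 3.0]), d_ref=1.0)
```

In an automatic-differentiation framework, a scale like this would be applied in the backward pass only, leaving the forward rendering unchanged. Samples near the camera cover many pixels and therefore accumulate large gradients, which is one plausible source of floaters.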