FPGS: Feed-Forward Semantic-aware Photorealistic Style Transfer of Large-Scale Gaussian Splatting

GeonU Kim, Kim Youwang, Lee Hyoseok, Tae-Hyun Oh

TL;DR

FPGS tackles photorealistic style transfer for large-scale radiance fields represented by Gaussian Splatting, enabling stylization from multiple arbitrary references while preserving multi-view consistency and real-time rendering. It introduces a feed-forward, AdaIN-inspired style decomposition on a 3D feature field, with semantic correspondence and local AdaIN to support multi-reference stylization. Key components include a generalizable MLP color decoder and a scene-specific semantic feature autoencoder, plus scribble-based interactive style transfer via semantic-aware weighting of styles. Experiments demonstrate favorable photorealistic stylization of large-scale static and dynamic scenes with diverse references, while maintaining cross-view consistency and remaining efficient relative to state-of-the-art baselines.

Abstract

We present FPGS, a feed-forward photorealistic style transfer method for large-scale radiance fields represented by Gaussian Splatting. FPGS stylizes large-scale 3D scenes with arbitrary, multiple style reference images without additional optimization while preserving the multi-view consistency and real-time rendering speed of 3D Gaussians. Prior methods required tedious per-style optimization or a time-consuming per-scene training stage and were limited to small-scale 3D scenes. FPGS efficiently stylizes large-scale 3D scenes by introducing a style-decomposed 3D feature field, which inherits AdaIN's feed-forward stylization machinery and thus supports arbitrary style reference images. Furthermore, FPGS supports multi-reference stylization through semantic correspondence matching and local AdaIN, which gives users diverse control over 3D scene styles. FPGS also preserves multi-view consistency by applying the semantic matching and style transfer processes directly to queried features in 3D space. In experiments, we demonstrate that FPGS achieves favorable photorealistic stylization quality for large-scale static and dynamic 3D scenes with diverse reference images. Project page: https://kim-geonu.github.io/FPGS/
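
The abstract builds on AdaIN, which restyles content features by matching their per-channel mean and standard deviation to those of style features, and on a "local" variant driven by semantic matching. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. The function names (`adain`, `local_adain`), the array shapes, the one-descriptor-per-reference representation, and the cosine-similarity assignment rule are all assumptions made for illustration.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Standard AdaIN on flat feature sets.

    content: (N, C) features, e.g. queried at N 3D points (assumed layout).
    style:   (M, C) features extracted from a style reference.
    Returns content features whose per-channel mean/std match the style's.
    """
    c_mean, c_std = content.mean(axis=0), content.std(axis=0) + eps
    s_mean, s_std = style.mean(axis=0), style.std(axis=0) + eps
    return (content - c_mean) / c_std * s_std + s_mean

def local_adain(content, content_sem, styles, style_sems, eps=1e-5):
    """Hypothetical multi-reference variant.

    content:     (N, C) content features.
    content_sem: (N, D) semantic feature per content point.
    styles:      list of K arrays, each (M_k, C), one per style reference.
    style_sems:  (K, D) one semantic descriptor per reference (assumption).
    Each point is assigned to the reference with the highest cosine
    similarity, and AdaIN is applied separately within each group.
    """
    a = content_sem / (np.linalg.norm(content_sem, axis=1, keepdims=True) + eps)
    b = style_sems / (np.linalg.norm(style_sems, axis=1, keepdims=True) + eps)
    assign = (a @ b.T).argmax(axis=1)          # (N,) index of best reference
    out = np.array(content, dtype=float)       # float copy of the input
    for k, style in enumerate(styles):
        mask = assign == k
        if mask.sum() > 1:                     # statistics need >= 2 points
            out[mask] = adain(content[mask], style, eps)
    return out, assign
```

Because both functions operate on per-point features rather than rendered pixels, the same stylized features are seen from every viewpoint, which is the intuition behind applying the transfer in 3D space. How FPGS actually decomposes the feature field and decodes colors is not specified in this summary and is not modeled here.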

Paper Structure

This paper contains 9 sections, 5 figures.

Figures (5)

  • Figure S1: Stylization results with correct and incorrect semantic matchings. [Left] Original 3D scene. [Mid] Stylization results with correct semantic matchings. [Right] Failure cases of our method with incorrect semantic matchings in extreme cases.
  • Figure S2: Training pipeline of the semantic feature autoencoder. $\textbf{F}_\text{DINO}(\textbf{I})$ denotes the feature maps extracted from the training images $\textbf{I}$ with DINO, and $\mathcal{L}_\text{AE}$ denotes the reconstruction loss (see Eq. (12)).
  • Figure S3: Additional qualitative results on the LLFF dataset [mildenhall2019local]. Compared to UPST-NeRF [chen2024upst], our method accurately reflects the diverse colors of the reference image.
  • Figure S4: Additional qualitative results on unbounded scenes. Style transfer results on the San Francisco Mission Bay dataset [tancik2022blocknerf] (top 2), on the Tanks and Temples dataset [knapitsch2017tanks] (mid 2), and on the Mip-NeRF 360 dataset [barron2022mip] (bottom 2).
  • Figure S5: Additional qualitative results on 4D scenes. Style transfer results on the DyNeRF dataset [li2022neural] (top) and on the KITTI-360 dataset [liao2022kitti] (bottom).