Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering

Dafei Qin; Hongyang Lin; Qixuan Zhang; Kaichun Qiao; Longwen Zhang; Zijun Zhao; Jun Saito; Jingyi Yu; Lan Xu; Taku Komura

TL;DR

This work introduces GauFace, a structured Gaussian Splatting representation that integrates CG-ready facial assets with a 3D Gaussian basis to enable efficient, relightable, and interactive rendering. It couples GauFace with TransGS, a patch-based diffusion-transformer translator that converts PBR facial assets into GauFace in seconds, guided by UV positional encoding and rich conditioning from textures, geometry, and lighting. The approach enables real-time rendering on mobile devices and across platforms, while preserving CG-like editing capabilities through explicit geometry and shading control. Extensive experiments, ablations, and user studies indicate that TransGS delivers rendering quality close to that of traditional offline renderers and outperforms contemporary neural-volume methods under practical time budgets, with GauFace providing robust, editable, and deformation-aware representations for facial avatars.

Abstract

We propose GauFace, a novel Gaussian Splatting representation tailored for efficient animation and rendering of physically-based facial assets. Leveraging strong geometric priors and constrained optimization, GauFace ensures a neat and structured Gaussian representation, delivering high-fidelity, real-time facial interaction at 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. We then introduce TransGS, a diffusion transformer that instantly translates physically-based facial assets into the corresponding GauFace representations. Specifically, we adopt a patch-based pipeline to handle the vast number of Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme with UV positional encoding to ensure the throughput and rendering quality of GauFace assets generated by TransGS. Once trained, TransGS can instantly translate facial assets with lighting conditions into the GauFace representation. With its rich conditioning modalities, it also enables editing and animation capabilities reminiscent of traditional CG pipelines. We conduct extensive evaluations and user studies against traditional offline and online renderers, as well as recent neural rendering methods, which demonstrate the superior performance of our approach for facial asset rendering. We also showcase diverse immersive applications of facial assets using our TransGS approach and GauFace representation across various platforms, including PCs, phones, and even VR headsets.
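
The patch-based, pixel-aligned sampling with UV positional encoding described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the map and patch sizes, and the sinusoidal form of the encoding are all assumptions made for illustration. The sketch places one sample at each pixel centre of a UV-map patch and encodes its global UV position, so that a patch carries information about where it sits on the face.

```python
import math
import random


def patch_uv_coords(offset, patch_size, map_size):
    """Return (u, v) pixel-centre coordinates in (0, 1) for a square patch.

    offset is the (row, col) of the patch's top-left pixel in the full UV map.
    """
    row0, col0 = offset
    coords = []
    for r in range(row0, row0 + patch_size):
        for c in range(col0, col0 + patch_size):
            # A pixel centre sits at (index + 0.5) / map_size.
            coords.append(((c + 0.5) / map_size, (r + 0.5) / map_size))
    return coords


def uv_positional_encoding(uv, num_freqs=4):
    """Sinusoidal encoding of one UV coordinate (an assumed form)."""
    u, v = uv
    features = []
    for k in range(num_freqs):
        freq = (2 ** k) * math.pi
        for x in (u, v):
            features.append(math.sin(freq * x))
            features.append(math.cos(freq * x))
    return features  # length 4 * num_freqs


def sample_patch(map_size=256, patch_size=32, rng=random):
    """Pick a random global offset and encode every pixel of that patch."""
    offset = (
        rng.randrange(map_size - patch_size + 1),
        rng.randrange(map_size - patch_size + 1),
    )
    coords = patch_uv_coords(offset, patch_size, map_size)
    return offset, [uv_positional_encoding(uv) for uv in coords]
```

Because the encoding uses global UV coordinates rather than patch-local ones, two patches drawn at different offsets receive different encodings even when their contents look alike.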

Paper Structure (50 sections, 15 equations, 13 figures, 5 tables)

Introduction
Related Works
Face Rendering
Traditional Rendering Techniques
Volume Rendering
3DGS Variants
Face Generation
PBR Facial Asset Generation
GAN / NeRF-Based Generation
GS-Based Generation
Post-Editing
NeRF Editing
3DGS Editing
Preliminary
Physically-Based Rendering Facial Assets
...and 35 more sections

Figures (13)

Figure 1: Overview. We present two methods for obtaining relightable dynamic Gaussian facial assets. The first method (Sec. \ref{sec:gs_face}) renders high-quality multi-view images and optimizes the GauFace representation. The second method (Sec. \ref{sec:gs_gen}), which we introduce as TransGS, directly generates GauFace assets from textures and models in approximately 5 seconds.
Figure 2: PBR facial assets and GauFace representation. Left: We collect 143 facial assets under 134 lighting conditions, with a total of 1,023 combinations. Middle: For each combination, we render 1,071 frames under 153 different expressions with random camera positions. Right: Our GauFace asset defines the center of Gaussians on the UV map consistent across different identities and introduces dynamic shadow vectors to disentangle the deformation-dependent and deformation-agnostic shading effects.
Figure 3: Deferred Pruning. Upper: (a) GauFace and its UV-space base color visualization before and after pruning. (b) Without pruning, 22k points. (c) Pruned with opacity $\sigma \le 0.1$, 11k points. (d) Difference between (b) and (c). Lower: PSNR vs. number of Gaussians. PSNR is calculated on 306 testing images with different expressions and camera positions. Gaussian points are pruned at different opacity thresholds.
Figure 4: TransGS architecture. We condition TransGS on the image textures $I$, geometry code $\boldsymbol{v}_G$, and HDRI map $L$ to generate the GauFace asset $A$ in a patch-based manner. Left: during training, a random global offset $\boldsymbol{q}$ is sampled, and the corresponding image patch $I_q$ and GauFace patch $A_q$ are fed to the diffusion transformer. Right: at inference, the full GauFace asset can be synthesized in a single pass.
Figure 5: Difference between the attributes of two GauFace assets optimized from the same training images, visualized by plotting each attribute at the UV sampling positions of the Gaussian points. $\Delta_{\boldsymbol{c}}, \Delta_\delta, \Delta_{\boldsymbol{s}}$ are the differences in SH base color, opacity, and specular intensity between the two runs, respectively.
...and 8 more figures
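
The opacity-based pruning and the PSNR measurement described in the caption of Figure 3 can be sketched as follows. This is a minimal illustration, not the authors' code: the dictionary layout of a Gaussian, the function names, and the flat list of pixel values are assumptions. Only the threshold of 0.1 and the use of PSNR as the quality metric come from the caption.

```python
import math


def prune_by_opacity(gaussians, threshold=0.1):
    """Drop Gaussians whose opacity is at or below the threshold.

    Each Gaussian is assumed to be a dict with an "opacity" entry.
    """
    return [g for g in gaussians if g["opacity"] > threshold]


def psnr(reference, rendered, max_value=1.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Sweeping `threshold` over several values, re-rendering the test views from the pruned asset, and recording `psnr` for each point count would produce a curve of the kind shown in the lower panel of Figure 3. The rendering step itself is outside the scope of this sketch.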
