DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis
Kaijun Deng, Dezhi Zheng, Jindong Xie, Jinbao Wang, Weicheng Xie, Linlin Shen, Siyang Song
TL;DR
DEGSTalk tackles hair-preserving talking-face synthesis by combining Deformable Pre-Embedding Gaussian Fields, built on 3D Gaussian Splatting (3DGS), with a Dynamic Hair-Preserving Portrait Rendering pipeline. A tri-plane hash encoder and an MLP predict per-Gaussian deformations from audio features and implicit 3DMM coefficients, and the deformed Gaussian parameters are then rendered. A hair-aware fusion strategy preserves long-hair dynamics while maintaining facial realism. The model is trained with a three-stage optimization that jointly enforces geometric fidelity and perceptual quality. On six portrait videos, DEGSTalk achieves state-of-the-art fidelity at near real-time rendering speed while preserving long hair; removing the noisy primitives that remain is left as future work.
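The deformation step summarized above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`triplane_features`, `init_mlp`, `deform_gaussians`), the feature sizes, the two-layer MLP, and the choice of position, scale, and rotation offsets as outputs are all assumptions made for illustration. In particular, the fixed sinusoidal encoding below is only a stand-in for a learned tri-plane hash encoder.

```python
import numpy as np

def triplane_features(xyz, freqs=(1.0, 2.0, 4.0)):
    """Stand-in for a tri-plane hash encoder (assumption): encode the
    xy, yz and xz projections of each Gaussian centre with sinusoids."""
    planes = [xyz[:, [0, 1]], xyz[:, [1, 2]], xyz[:, [0, 2]]]
    feats = []
    for plane in planes:
        for f in freqs:
            feats.append(np.sin(f * plane))
            feats.append(np.cos(f * plane))
    # 3 planes * 3 frequencies * (sin, cos) * 2 coordinates = 36 features
    return np.concatenate(feats, axis=1)

def init_mlp(rng, in_dim, hidden_dim, out_dim):
    """Randomly initialised two-layer MLP (hypothetical architecture)."""
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def deform_gaussians(xyz, scale, rot, audio_feat, expr_coeff, mlp):
    """Predict per-Gaussian offsets from spatial features plus a shared
    per-frame condition (audio features and expression coefficients)."""
    n = xyz.shape[0]
    cond = np.concatenate([audio_feat, expr_coeff])
    cond = np.broadcast_to(cond, (n, cond.shape[0]))  # same condition for every Gaussian
    x = np.concatenate([triplane_features(xyz), cond], axis=1)
    hidden = np.maximum(x @ mlp["w1"] + mlp["b1"], 0.0)  # ReLU
    delta = hidden @ mlp["w2"] + mlp["b2"]               # (N, 10)
    d_xyz, d_scale, d_rot = delta[:, :3], delta[:, 3:6], delta[:, 6:10]
    new_rot = rot + d_rot
    new_rot = new_rot / np.linalg.norm(new_rot, axis=1, keepdims=True)  # keep unit quaternions
    return xyz + d_xyz, scale + d_scale, new_rot

# Toy usage with made-up sizes: 5 Gaussians, 32-D audio, 16-D expression.
rng = np.random.default_rng(0)
n = 5
xyz = rng.normal(size=(n, 3))
scale = np.full((n, 3), 0.01)
rot = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))  # identity quaternions
audio_feat = rng.normal(size=32)
expr_coeff = rng.normal(size=16)
mlp = init_mlp(rng, in_dim=36 + 32 + 16, hidden_dim=64, out_dim=10)
new_xyz, new_scale, new_rot = deform_gaussians(xyz, scale, rot, audio_feat, expr_coeff, mlp)
print(new_xyz.shape, new_scale.shape, new_rot.shape)
```

In this sketch the canonical Gaussians stay fixed and only the predicted offsets change per frame, so the same primitives can be re-rendered for each audio window. Whether DEGSTalk deforms exactly these attributes is not stated in the summary.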
Abstract
Accurately synthesizing talking face videos and capturing fine facial features for individuals with long hair remain significant challenges. To address these limitations of existing methods, we propose decomposed per-embedding Gaussian fields (DEGSTalk), a 3D Gaussian Splatting (3DGS)-based talking face synthesis method for generating realistic talking faces with long hair. DEGSTalk employs Deformable Pre-Embedding Gaussian Fields, which dynamically adjust pre-embedding Gaussian primitives using implicit expression coefficients. This enables precise capture of dynamic facial regions and subtle expressions. Additionally, we propose a Dynamic Hair-Preserving Portrait Rendering technique to enhance the realism of long hair motion in the synthesized videos. Results show that DEGSTalk achieves improved realism and synthesis quality compared to existing approaches, particularly in handling complex facial dynamics and hair preservation. Our code will be publicly available at https://github.com/CVI-SZU/DEGSTalk.
