Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation

Xukun Zhou; Fengxin Li; Ziqiao Peng; Kejian Wu; Jun He; Biao Qin; Zhaoxin Fan; Hongyan Liu

TL;DR

MetaFace adapts audio-driven 3D talking face animation to varied speaking styles from minimal data. It introduces a meta-learning architecture with three components: a Robust Meta Initialization Stage, a Dynamic Relation Mining Neural Process, and a Low-rank Matrix Memory Reduction Approach, which together enable fast, memory-efficient personalization. On VOCASET and BIWI, MetaFace reports state-of-the-art results in both facial motion and lip synchronization, and ablation studies show that each component contributes. The work offers a practical path toward personalized, high-fidelity avatars for live streaming and AR applications, reducing the data and computation needed for on-device adaptation.

Abstract

Audio-driven 3D face animation is increasingly vital in live streaming and augmented reality applications. While remarkable progress has been observed, most existing approaches are designed for specific individuals with predefined speaking styles, thus neglecting the adaptability to varied speaking styles. To address this limitation, this paper introduces MetaFace, a novel methodology meticulously crafted for speaking style adaptation. Grounded in the novel concept of meta-learning, MetaFace is composed of several key components: the Robust Meta Initialization Stage (RMIS) for fundamental speaking style adaptation, the Dynamic Relation Mining Neural Process (DRMN) for forging connections between observed and unobserved speaking styles, and the Low-rank Matrix Memory Reduction Approach to enhance the efficiency of model optimization as well as learning style details. Leveraging these novel designs, MetaFace not only significantly outperforms robust existing baselines but also establishes a new state-of-the-art, as substantiated by our experimental results.
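To make the memory argument behind a low-rank approach concrete, the sketch below compares trainable parameter counts for a full weight update against a factorized one. It is a generic low-rank illustration, not the paper's formulation, which is not given on this page. The function names, the layer size of 1024 x 1024, and the rank of 8 are all assumptions chosen for the example.

```python
# Generic low-rank update sketch (hypothetical names; not the paper's exact method).

def param_counts(d_out, d_in, rank):
    """Trainable parameters for a full update vs. a rank-`rank` factorized update."""
    full = d_out * d_in                # every entry of the d_out x d_in update is trained
    low_rank = rank * (d_out + d_in)   # only A (d_out x rank) and B (rank x d_in) are trained
    return full, low_rank

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def adapted_weight(w, a, b):
    """Return W + A @ B, where W stays frozen and only A and B are adapted."""
    delta = matmul(a, b)
    return [[wij + dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

full, low = param_counts(1024, 1024, 8)
print(full, low, full // low)  # 1048576 16384 64
```

Under these assumed sizes, the factorized update trains 64 times fewer parameters per layer, which is the kind of saving that makes per-speaker adaptation cheaper in memory.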

Paper Structure (20 sections, 14 equations, 3 figures, 4 tables, 1 algorithm)

Abstract
Introduction
Related Work
Lip Synchronization in 3D Face Animation
Speaking Style Adaptation in 3D Face Animation
Meta-learning and Parameter Adaptation
Method
Overview
Robust Meta Initialization Stage
Low-rank Matrix Memory Reduction Approach
Dynamic Relation Mining Neural Process
Loss Function
Experiments
Experimental details
Quantitative Evaluation
...and 5 more sections

Figures (3)

Figure 1: Difference between speaking style adaptation mode of existing methods and MetaFace.
Figure 2: Overall Framework of MetaFace.
Figure 3: Visual comparisons of facial movement by different methods on VOCA-Test (left) and BIWI-Test-A (right).
