Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation
Taekyung Ki, Dongchan Min, Gyeongsu Chae
TL;DR
This work tackles 3D-aware portrait animation with cross-identity expression transfer by addressing the entanglement of appearance and expression. It introduces Export3D, which uses Contrastive Learned Basis Scaling (CLeBS) to extract appearance-free expressions and a Hybrid Tri-plane Generator with Expression Adaptive Layer Normalization (EAdaLN) to inject the driving expression into a 3D-aware tri-plane, followed by differentiable volume rendering and super-resolution. The key contributions are the CLeBS framework for appearance-free expression, the end-to-end tri-plane-based generator, and extensive experiments showing reduced appearance swap and improved 3D-view consistency in both same-identity and cross-identity settings. The approach enables one-shot, high-fidelity, 3D-aware portrait animation driven by 3DMM parameters, with practical use in realistic avatar animation and video synthesis; the authors acknowledge limitations in background separation and eye-gaze control.
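To make the expression conditioning named above more concrete, the following is a minimal sketch of an adaptive layer normalization step, in which a scale and shift predicted from an expression vector modulate normalized features. It is not the authors' implementation: the function names, the `(1 + scale)` parameterization, and the plain linear projections are all assumptions chosen for illustration, and the exact form of EAdaLN in Export3D may differ.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(weights, bias, vec):
    """Apply y = W vec + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def expression_adaptive_layer_norm(features, expression,
                                   w_scale, b_scale, w_shift, b_shift):
    """Hypothetical EAdaLN-style step (assumed form, not taken from the paper):
    the expression vector predicts a per-channel scale and shift that
    modulate the layer-normalized features."""
    scale = linear(w_scale, b_scale, expression)
    shift = linear(w_shift, b_shift, expression)
    normed = layer_norm(features)
    # (1 + scale) keeps the layer close to plain layer norm when scale is near zero.
    return [(1.0 + s) * n + t for s, n, t in zip(scale, normed, shift)]
```

With all projection weights and biases set to zero, the function reduces to plain layer normalization, which is a convenient sanity check when wiring such a layer into a generator.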
Abstract
In this paper, we present Export3D, a one-shot 3D-aware portrait animation method that can control the facial expression and camera view of a given portrait image. To achieve this, we introduce a tri-plane generator with an effective expression conditioning method, which directly generates a tri-plane as a 3D prior by transferring the expression parameters of a 3DMM into the source image. The tri-plane is then decoded into an image from a different view through differentiable volume rendering. Existing portrait animation methods rely heavily on image warping to transfer the expression in the motion space, which makes it challenging to disentangle appearance and expression. In contrast, we propose a contrastive pre-training framework for appearance-free expression parameters, eliminating the undesirable appearance swap that occurs when transferring a cross-identity expression. Extensive experiments show that our pre-training framework can learn the appearance-free expression representation hidden in the 3DMM, and that our model can generate 3D-aware, expression-controllable portrait images without appearance swap in the cross-identity setting.
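The decoding step described in the abstract can be illustrated with a toy renderer: a 3D point is projected onto three axis-aligned feature planes, the features are combined, and samples along a camera ray are composited with the standard emission-absorption rule. This is a sketch under stated assumptions, not the Export3D renderer: the nearest-neighbor lookup (real systems typically interpolate bilinearly), the summation of plane features, and the toy decoder that treats channel 0 as density and the remaining channels as color are all hypothetical simplifications.

```python
import math

def sample_triplane(planes, point):
    """Project a point in [-1, 1]^3 onto the XY, XZ, and YZ planes and sum
    the three feature vectors (nearest-neighbor lookup for brevity)."""
    x, y, z = point
    total = None
    for plane, (u, v) in zip(planes, [(x, y), (x, z), (y, z)]):
        res = len(plane)
        i = min(res - 1, max(0, int((u + 1.0) / 2.0 * res)))
        j = min(res - 1, max(0, int((v + 1.0) / 2.0 * res)))
        feat = plane[i][j]
        total = list(feat) if total is None else [a + b for a, b in zip(total, feat)]
    return total

def composite_ray(densities, colors, deltas):
    """Emission-absorption quadrature along one ray.
    Returns the accumulated color and the remaining transmittance."""
    transmittance = 1.0
    pixel = [0.0] * len(colors[0])
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

def render_ray(planes, origin, direction, near, far, n_samples):
    """Sample the tri-plane at evenly spaced midpoints along a ray and composite."""
    delta = (far - near) / n_samples
    densities, colors = [], []
    for k in range(n_samples):
        t = near + (k + 0.5) * delta
        point = [o + t * d for o, d in zip(origin, direction)]
        feat = sample_triplane(planes, point)
        densities.append(max(0.0, feat[0]))  # toy decoder: channel 0 is density
        colors.append(feat[1:])              # toy decoder: remaining channels are color
    return composite_ray(densities, colors, [delta] * n_samples)
```

Every operation here is a composition of differentiable arithmetic (apart from the nearest-neighbor indexing used for brevity), which is why a renderer of this kind can be trained end to end together with the generator that produces the planes.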
