
An Implicit Physical Face Model Driven by Expression and Style

Lingchen Yang, Gaspard Zoss, Prashanth Chandran, Paulo Gotardo, Markus Gross, Barbara Solenthaler, Eftychios Sifakis, Derek Bradley

TL;DR

This work presents a new face model, based on a data-driven implicit neural physics model, that can be driven by expression and style separately and can synthesize physical effects such as collision handling, which sets it apart from conventional approaches.
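The TL;DR singles out collision handling as a physical effect the model can synthesize. This summary does not describe how the paper's contact-aware simulation resolves contacts, so the sketch below uses a quadratic penalty, one standard choice, to illustrate the basic idea. A hypothetical spherical collider stands in for whatever geometry the face collides with, and the stiffness and step size are arbitrary. Each vertex inside the sphere contributes 0.5 * k * d^2 to the energy (d is the signed distance, negative inside), and gradient descent pushes the vertex back to the surface. In a full simulator, this term would be minimized jointly with the elastic energy rather than on its own.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sphere_sdf(p: Sequence[float], center: Sequence[float], radius: float) -> float:
    """Signed distance from p to a sphere surface (negative inside)."""
    return math.dist(p, center) - radius


def contact_penalty(points, center, radius, stiffness):
    """Quadratic penalty energy and its gradient for points inside a (hypothetical) sphere collider.

    E = 0.5 * k * sum_i min(0, d_i)^2, with d_i the signed distance of point i.
    """
    energy = 0.0
    grads: List[Vec3] = []
    for p in points:
        d = sphere_sdf(p, center, radius)
        if d < 0.0:
            r = math.dist(p, center)
            # Outward surface normal; pick an arbitrary direction at the exact center.
            n = tuple((pi - ci) / r for pi, ci in zip(p, center)) if r > 0.0 else (0.0, 0.0, 1.0)
            energy += 0.5 * stiffness * d * d
            # dE/dp = k * d * n; since d < 0 this points inward, so descent moves p outward.
            grads.append(tuple(stiffness * d * ni for ni in n))
        else:
            grads.append((0.0, 0.0, 0.0))
    return energy, grads


def resolve_contacts(points, center, radius, stiffness=1.0, step=0.5, iters=50):
    """Gradient descent on the penalty energy alone (no elastic term).

    With 0 < step * stiffness <= 1, each iteration scales the penetration depth by
    (1 - step * stiffness), so penetrating points converge to the surface.
    """
    pts = [tuple(p) for p in points]
    for _ in range(iters):
        _, grads = contact_penalty(pts, center, radius, stiffness)
        pts = [tuple(pi - step * gi for pi, gi in zip(p, g)) for p, g in zip(pts, grads)]
    return pts


pts = resolve_contacts([(0.0, 0.0, 0.5), (0.0, 0.0, 2.0)], center=(0.0, 0.0, 0.0), radius=1.0)
# pts[0] is pushed out to z close to 1.0 (the sphere surface); pts[1] lies outside and is unchanged.
```

A penalty keeps the energy smooth and differentiable, which suits a differentiable simulator, at the cost of allowing small penetrations that depend on the chosen stiffness.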

Abstract

3D facial animation is often produced by manipulating facial deformation models (or rigs) that are traditionally parameterized by expression controls. A key component that is usually overlooked is expression 'style', that is, how a particular expression is performed. Although it is common to define a semantic basis of expressions that characters can perform, most characters perform each expression in their own style. To date, style is usually entangled with the expression, so the style of one character cannot be transferred to another in facial animation. We present a new face model, based on a data-driven implicit neural physics model, that can be driven by expression and style separately. At its core is a framework for learning implicit physics-based actuations for multiple subjects simultaneously, trained on a few arbitrary performance capture sequences from a small set of identities. Once trained, our method enables generalized physics-based facial animation for any of the trained identities, extending to unseen performances. Furthermore, it grants control over the animation style, enabling style transfer from one character to another or blending of styles across characters. Lastly, as a physics-based model, it can synthesize physical effects such as collision handling, which sets our method apart from conventional approaches.
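The abstract describes a model driven by expression and style as separate inputs, which is what makes style transfer and style blending possible. The sketch below illustrates only that interface, under stated assumptions. A hypothetical ModulationNet maps an expression vector (for example, blendweights) and an assumed per-identity style code to a modulation code m, the kind of quantity plotted in Figure 3. The layer sizes, random stand-in weights, GELU activation, and linear interpolation in blend_styles are illustrative choices, not the paper's architecture. In the actual method, the network output drives implicit actuations in a physics simulation, which is omitted here.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def gelu(x: float) -> float:
    """Exact GELU activation: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def init_layer(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights standing in for trained parameters (illustration only)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ModulationNet:
    """Hypothetical two-layer MLP: (expression, style) -> modulation code m."""

    def __init__(self, n_expr: int, n_style: int, n_hidden: int, n_code: int, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = init_layer(n_hidden, n_expr + n_style, rng)
        self.W2 = init_layer(n_code, n_hidden, rng)

    def __call__(self, expr: Sequence[float], style: Sequence[float]) -> Vector:
        # Expression and style enter as separate inputs, so either can be swapped independently.
        h = [gelu(v) for v in matvec(self.W1, list(expr) + list(style))]
        return matvec(self.W2, h)


def blend_styles(style_a: Sequence[float], style_b: Sequence[float], alpha: float) -> Vector:
    """Assumed blending rule: linear interpolation between two style codes."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(style_a, style_b)]


net = ModulationNet(n_expr=4, n_style=3, n_hidden=16, n_code=8)
expr = [0.0, 0.8, 0.1, 0.0]   # blendweights of one expression (hypothetical)
style_a = [1.0, 0.0, 0.0]     # style code of identity A (hypothetical)
style_b = [0.0, 1.0, 0.0]     # style code of identity B (hypothetical)

m_a = net(expr, style_a)                                # expression in A's own style
m_transfer = net(expr, style_b)                         # same expression, B's style
m_mix = net(expr, blend_styles(style_a, style_b, 0.5))  # halfway between the two styles
```

Keeping style as its own input, rather than baking it into per-identity expression controls, is what lets the same expression vector be replayed with another identity's style code.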

Paper Structure

This paper contains 32 sections, 20 equations, 21 figures, and 1 table.

Figures (21)

  • Figure 1: We illustrate our pipeline, consisting of the creation of a canonical space for training, our style- and expression-conditioned network for generating multi-identity physical constraints, and the contact-aware differentiable simulation.
  • Figure 2: Qualitative evaluation of our model components on two retargeting examples. The input is the unseen blendweight (BW) vector of a source actor (col 1), illustrated with simple blending on the target identity's blendshape model (col 2). Among the model variants, our proposed model (Model-CSWL) achieves the most natural results with fewer artifacts.
  • Figure 3: Our model can disentangle identity style from expression, as shown by a t-SNE plot of the actuation modulation codes $\mathbf{m}$. The model trained with Lipschitz regularization (right) achieves better disentanglement than the one trained without it (left).
  • Figure 4: We compare our model trained on a single identity (S.) and on multiple identities (M.) to the model of Yang et al. [yang2022implicit], using both SIREN and GeLU activations. Our multi-identity model with GeLU exhibits fewer artifacts.
  • Figure 5: Comparison between single-identity and multi-identity models on a retargeting example (left), and visualization of reconstruction errors (right). Our multi-identity model achieves lower errors and more natural shapes.
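Figure 3 credits Lipschitz regularization with better style/expression disentanglement, but the caption does not say which form is used or which network it is applied to. As one concrete, swapped-in example, the sketch below follows the Lipschitz MLP formulation of Liu et al. ("Learning Smooth Neural Functions via Lipschitz Regularization", SIGGRAPH 2022). In that formulation, each layer has a learnable scalar c, the layer's weight rows are rescaled so that every absolute row sum is at most softplus(c), and the regularization loss is the product of these per-layer bounds. The helper names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softplus(c: float) -> float:
    """Numerically stable softplus: log(1 + exp(c))."""
    return math.log1p(math.exp(-abs(c))) + max(c, 0.0)


def lipschitz_normalize(W: Matrix, c: float) -> Matrix:
    """Rescale rows of W so each absolute row sum is at most softplus(c).

    The largest absolute row sum is the matrix infinity norm, so afterwards
    ||W x||_inf <= softplus(c) * ||x||_inf for every input x.
    """
    bound = softplus(c)
    normalized = []
    for row in W:
        row_sum = sum(abs(w) for w in row)
        scale = min(1.0, bound / row_sum) if row_sum > 0.0 else 1.0
        normalized.append([w * scale for w in row])
    return normalized


def lipschitz_loss(cs: Sequence[float]) -> float:
    """Product of per-layer bounds softplus(c_i); adding it to the training loss favors smoother networks."""
    loss = 1.0
    for c in cs:
        loss *= softplus(c)
    return loss


W = [[3.0, -1.0], [0.2, 0.1]]
W_norm = lipschitz_normalize(W, c=0.0)  # softplus(0) = ln 2, about 0.693
# Row 0 (abs sum 4.0) is scaled down to abs sum ln 2; row 1 (abs sum 0.3) is left unchanged.
reg = lipschitz_loss([0.0, 0.0])         # (ln 2) ** 2
```

When the activations are 1-Lipschitz (for example ReLU or tanh), the product of the per-layer bounds also bounds the Lipschitz constant of the whole network. Penalizing it therefore limits how quickly the output can change as the conditioning code changes.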