Compressed Skinning for Facial Blendshapes

Ladislav Kavan, John Doublestein, Martin Prazak, Matthew Cioffi, Doug Roble

TL;DR

This work tackles real-time on-device facial animation with large numbers of blendshapes by converting delta-blendshape data into a compressed linear blend skinning (LBS) form. It introduces a first-order optimization approach that combines the Adam optimizer with projection steps to enforce non-negativity, partition of unity (weights summing to one), and sparsity in both the weight matrix and transformation components, producing a sparse skinning decomposition called compressed skinning. Key contributions include an explicit transformation from blendshape weights to LBS transforms, a sparsity-enhanced optimization framework, and a PyTorch implementation enabling flexible loss functions and constraints; an HD setting demonstrates substantially improved detail. Results show comparable or better fitting accuracy than Dem Bones while offering substantial memory savings (5–7×) and run-time speedups (2–3×) on mobile hardware, with additional gains in detail when using higher-norm formulations. The approach enables scalable, on-device facial animation for rigs with hundreds of blendshapes, albeit with longer pre-processing and a black-box rig evaluation assumption that may be refined in future work by integrating learned rigs or weighted vertex importance.

Abstract

We present a new method to bake classical facial animation blendshapes into a fast linear blend skinning representation. Previous work explored skinning decomposition methods that approximate general animated meshes using a dense set of bone transformations; these optimizers typically alternate between optimizing for the bone transformations and the skinning weights. We depart from this alternating scheme and propose a new approach based on proximal algorithms, which effectively means adding a projection step to the popular Adam optimizer. This approach is very flexible and allows us to quickly experiment with various additional constraints and/or loss functions. Specifically, we depart from the classical skinning paradigms and restrict the transformation coefficients to contain only about 10% non-zeros, while achieving accuracy and visual quality similar to the state of the art. The sparse storage enables our method to deliver significant savings in terms of both memory and run-time speed. We include a compact implementation of our new skinning decomposition method in PyTorch, which is easy to experiment with and modify to related problems.
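
The "Adam plus projection" idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' PyTorch implementation: the model is a plain sparse matrix factorization `T ≈ W @ N` standing in for the real skinning model, Adam is written out by hand, and the two projections are simple heuristics (clip, keep the largest entries, renormalize) rather than exact Euclidean projections. All function names and parameters (`project_weights`, `project_sparse`, `fit`, `max_influences`, `nnz`) are hypothetical.

```python
import numpy as np

def project_weights(W, max_influences):
    """Heuristic projection: rows are non-negative, have at most
    `max_influences` non-zeros, and sum to one."""
    W = np.maximum(W, 0.0)
    # Zero all but the `max_influences` largest entries in each row.
    drop = np.argsort(W, axis=1)[:, :-max_influences]
    np.put_along_axis(W, drop, 0.0, axis=1)
    row_sum = W.sum(axis=1, keepdims=True)
    # Rows that collapsed to zero put all weight on their first column.
    dead = row_sum[:, 0] == 0.0
    W[dead, 0] = 1.0
    row_sum[dead] = 1.0
    return W / row_sum

def project_sparse(N, nnz):
    """Keep the `nnz` largest-magnitude entries of N, zero the rest."""
    if nnz >= N.size:
        return N
    keep = np.argpartition(np.abs(N).ravel(), -nnz)[-nnz:]
    mask = np.zeros(N.size, dtype=bool)
    mask[keep] = True
    return np.where(mask.reshape(N.shape), N, 0.0)

def adam_update(p, g, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step; m and v are updated in place, new parameters returned."""
    m[:] = b1 * m + (1 - b1) * g
    v[:] = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + eps)

def fit(T, num_bones, max_influences, nnz, steps=300, seed=0):
    """Minimize 0.5 * ||W @ N - T||^2, updating W and N jointly
    (no alternation) and projecting after every Adam step."""
    rng = np.random.default_rng(seed)
    V, B = T.shape
    W = project_weights(rng.random((V, num_bones)), max_influences)
    N = project_sparse(0.1 * rng.standard_normal((num_bones, B)), nnz)
    mW, vW = np.zeros_like(W), np.zeros_like(W)
    mN, vN = np.zeros_like(N), np.zeros_like(N)
    for t in range(1, steps + 1):
        R = W @ N - T                      # residual
        gW, gN = R @ N.T, W.T @ R          # gradients of the loss
        W = project_weights(adam_update(W, gW, mW, vW, t), max_influences)
        N = project_sparse(adam_update(N, gN, mN, vN, t), nnz)
    return W, N

# Toy data: 50 "vertices", 20 "blendshape" columns, 10% non-zeros in N.
T = np.random.default_rng(1).standard_normal((50, 20))
W, N = fit(T, num_bones=10, max_influences=4, nnz=20)
```

Because the projection runs after every step, the constraints hold exactly at every iterate, which is what makes it easy to swap in other losses or constraint sets without redesigning the optimizer.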

Paper Structure (10 sections, 9 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: The skinning decomposition is pre-computed offline (left). On the end-user device, we first load the pre-computed $w_{i,j}$ and $\mathbf{N}_{k,j}$. Then, for each animation frame (runtime, right), we obtain $\mathbf{c}_k$ from the rig and compute $\mathbf{M}_j$. The skinning transformations $\mathbf{M}_j$ along with the rest-pose $\mathbf{v}_{0, i}$ and weights $w_{i,j}$ are passed to the linear blend skinning module running on the GPU.
  • Figure 2: Our method leads to results of acceptable visual quality on various rigs and facial expressions, with errors comparable to Dem Bones (red color corresponds to error of 5mm or more). However, our method enables more efficient run-time.
  • Figure 3: Histograms of the errors of our method (dark blue) and Dem Bones (light blue) in centimeters. Our method achieves lower errors despite sparse skinning transformations.
  • Figure 4: Decreasing the number of bones in Dem Bones from 40 to 20 increases the error significantly.
  • Figure 5: "Proteus HD" experiment: our method with 57400 transforms and $L^{12}$ norm captures finer detail than Dem Bones with the same number of transformations.
  • ...and 1 more figures
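
The runtime path described in the Figure 1 caption can be sketched in NumPy as follows. The caption does not spell out how $\mathbf{M}_j$ is computed from $\mathbf{c}_k$ and $\mathbf{N}_{k,j}$; the sketch assumes $\mathbf{M}_j = \mathbf{I} + \sum_k c_k \mathbf{N}_{k,j}$ with $3 \times 4$ matrices, which should be checked against the paper's equations. The function names, array layouts, and dense storage of `N` are illustrative choices only (the method's point is that `N` is stored sparsely, and skinning itself runs on the GPU).

```python
import numpy as np

def blend_transforms(c, N):
    """Assumed form: M_j = I + sum_k c_k N_{k,j}.
    c: (K,) blendshape weights from the rig.
    N: (K, J, 3, 4) pre-computed per-blendshape, per-bone matrices.
    Returns M: (J, 3, 4)."""
    M = np.tensordot(c, N, axes=1)   # sum over k -> (J, 3, 4)
    M[:, :, :3] += np.eye(3)         # add identity to the 3x3 linear part
    return M

def linear_blend_skinning(v0, W, M):
    """Standard LBS: v_i = sum_j w_ij * M_j @ [v0_i; 1].
    v0: (V, 3) rest pose, W: (V, J) skinning weights, M: (J, 3, 4)."""
    v0_h = np.concatenate([v0, np.ones((len(v0), 1))], axis=1)  # (V, 4)
    per_bone = np.einsum('jab,vb->vja', M, v0_h)                # (V, J, 3)
    return np.einsum('vj,vja->va', W, per_bone)                 # (V, 3)

# Toy example: 3 vertices, 2 bones, 2 blendshapes.
v0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
N = np.zeros((2, 2, 3, 4))
N[0, :, :, 3] = [1.0, 0.0, 0.0]   # blendshape 0 translates every bone along x

rest = linear_blend_skinning(v0, W, blend_transforms(np.zeros(2), N))
moved = linear_blend_skinning(v0, W, blend_transforms(np.array([0.5, 0.0]), N))
```

With all blendshape weights at zero every $\mathbf{M}_j$ is the identity, so the rest pose is reproduced; activating blendshape 0 at 0.5 shifts every vertex by 0.5 along x because the skinning weights in each row sum to one.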