RigAnyFace: Scaling Neural Facial Mesh Auto-Rigging with Unlabeled Data
Wenchao Ma, Dario Kneubuehler, Maurice Chu, Ian Sachs, Haomiao Jiang, Sharon Xiaolei Huang
TL;DR
RigAnyFace tackles auto-rigging of facial meshes with diverse topologies, including meshes with multiple disconnected components, by deforming a neutral mesh into FACS poses using a triangulation-agnostic DiffusionNet backbone conditioned on FACS parameters. A global encoder and a carefully designed 2D supervision pipeline, which leverages 2D appearance and motion signals from differentiable rendering and a MegActor-based animation model, enable scalable training on unlabeled data alongside a smaller set of artist-rigged ground truth. The method achieves state-of-the-art accuracy and generalization, handling in-the-wild heads and complex components such as eyeballs, teeth, and gums, while enabling downstream applications like user-controlled animation, video-to-mesh retargeting, and text-to-3D rigging. This work lowers the barrier to high-quality facial rigs and broadens expressive avatar creation, albeit with limitations on shell-like geometries and extreme discretization artifacts that warrant future study.
Abstract
In this paper, we present RigAnyFace (RAF), a scalable neural auto-rigging framework for facial meshes of diverse topologies, including those with multiple disconnected components. RAF deforms a static neutral facial mesh into industry-standard FACS poses to form an expressive blendshape rig. Deformations are predicted by a triangulation-agnostic surface learning network augmented with our tailored architecture design to condition on FACS parameters and efficiently process disconnected components. For training, we curated a dataset of facial meshes, with a subset meticulously rigged by professional artists to serve as accurate 3D ground truth for deformation supervision. Due to the high cost of manual rigging, this subset is limited in size, constraining the generalization ability of models trained exclusively on it. To address this, we design a 2D supervision strategy for unlabeled neutral meshes without rigs. This strategy increases data diversity and allows for scaled training, thereby enhancing the generalization ability of models trained on this augmented data. Extensive experiments demonstrate that RAF is able to rig meshes of diverse topologies on not only our artist-crafted assets but also in-the-wild samples, outperforming previous works in accuracy and generalizability. Moreover, our method advances beyond prior work by supporting multiple disconnected components, such as eyeballs, for more detailed expression animation. Project page: https://wenchao-m.github.io/RigAnyFace.github.io
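To make the notion of a blendshape rig concrete, the following is a minimal sketch of how a set of posed meshes is typically evaluated once it exists: each pose contributes its per-vertex offset from the neutral mesh, scaled by an activation weight. This is the standard linear (delta) blendshape formulation, not code from RAF, and it does not show how RAF predicts the poses. The function name `blend`, the toy vertex data, and the weight values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def blend(neutral: Sequence[Vertex],
          poses: Sequence[Sequence[Vertex]],
          weights: Sequence[float]) -> List[Vertex]:
    """Evaluate a linear blendshape rig: neutral + sum_k w_k * (pose_k - neutral)."""
    if len(poses) != len(weights):
        raise ValueError("need exactly one weight per pose")
    out = []
    for i, base in enumerate(neutral):
        vertex = list(base)
        for pose, w in zip(poses, weights):
            for axis in range(3):
                # Add this pose's offset from the neutral position, scaled by its weight.
                vertex[axis] += w * (pose[i][axis] - base[axis])
        out.append(tuple(vertex))
    return out


# Toy data (hypothetical): a 3-vertex "mesh" and two made-up poses.
NEUTRAL = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
POSE_A = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.0, 1.0, 0.0)]   # moves vertex 1 along +y
POSE_B = [(0.0, 0.0, 0.25), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # moves vertex 0 along +z

# Activate POSE_A at half strength and leave POSE_B off.
half_a = blend(NEUTRAL, [POSE_A, POSE_B], [0.5, 0.0])
```

Because every pose shares the neutral mesh's vertex count and ordering, the same evaluation applies per component when a head is made of several disconnected parts such as eyeballs or teeth. A production pipeline would normally store the vertices in an array library rather than Python lists; plain lists are used here only to keep the sketch dependency-free.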
