FlexAvatar: Learning Complete 3D Head Avatars with Partial Supervision
Tobias Kirschstein, Simon Giebenhain, Matthias Nießner
TL;DR
FlexAvatar tackles the problem of creating complete, animatable 3D head avatars from a single image by addressing the entanglement between driving signal and target viewpoint that arises in monocular training. It introduces learnable data source tokens, called bias sinks, that separate the influence of monocular and multi-view data, enabling unified training while still yielding complete 3D reconstructions at inference time. The architecture combines a transformer-based encoder $E$, a decoder $D$ that outputs articulated 3D Gaussians, and a StyleGAN-PixelShuffle upsampler; trained on diverse datasets, it produces a smooth latent avatar space that supports identity interpolation and fast fitting. Across 3D portrait animation and single-image, few-shot, and monocular avatar creation tasks, FlexAvatar shows strong generalization and rendering quality, adapting quickly and needing little input data to produce high-fidelity avatars.
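The summary does not spell out how the bias sinks are wired into the transformer. The following is a minimal sketch under stated assumptions, not the authors' implementation: one learnable token per data source is prepended to the token sequence before self-attention and dropped afterwards, so the chosen source shifts the output of every other token. The names `BiasSinkEncoder` and `self_attention`, the source labels `"monocular"` and `"multiview"`, and the identity query/key/value projections are all hypothetical simplifications.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    # Single-head scaled dot-product attention with identity Q/K/V projections
    # (a deliberate simplification; a real transformer learns these projections).
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


class BiasSinkEncoder:
    """Hypothetical toy: one extra token per data source (a "bias sink")."""

    def __init__(self, dim, sources, seed=0):
        rng = random.Random(seed)
        # In a trained model these vectors would be learned parameters;
        # here they are random stand-ins.
        self.sinks = {s: [rng.gauss(0.0, 1.0) for _ in range(dim)] for s in sources}

    def __call__(self, tokens, source):
        # Prepend the sink for this sample's data source, attend over the
        # whole sequence, then drop the sink so there is one output per input token.
        seq = [self.sinks[source]] + tokens
        return self_attention(seq)[1:]


rng = random.Random(1)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
encoder = BiasSinkEncoder(dim=4, sources=("monocular", "multiview"))
out_mono = encoder(tokens, "monocular")
out_multi = encoder(tokens, "multiview")
print(len(out_mono), out_mono != out_multi)
```

During training, each sample would use the sink of the dataset it came from, so source-specific biases can be absorbed by that token rather than by the shared weights. One plausible reading of the abstract is that the multi-view sink is then selected at inference to request complete 3D output; the exact mechanism in FlexAvatar may differ.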
Abstract
We introduce FlexAvatar, a method for creating high-quality and complete 3D head avatars from a single image. A core challenge lies in the limited availability of multi-view data and the tendency of monocular training to yield incomplete 3D head reconstructions. We identify the root cause of this issue as the entanglement between driving signal and target viewpoint when learning from monocular videos. To address this, we propose a transformer-based 3D portrait animation model with learnable data source tokens, so-called bias sinks, that enable unified training across monocular and multi-view datasets. At inference, this design leverages the strengths of both data sources: strong generalization from monocular data and full 3D completeness from multi-view supervision. Furthermore, our training procedure yields a smooth latent avatar space that facilitates identity interpolation and flexible fitting to an arbitrary number of input observations. In extensive evaluations on single-view, few-shot, and monocular avatar creation tasks, we verify the efficacy of FlexAvatar. While many existing methods struggle with view extrapolation, FlexAvatar generates complete 3D head avatars with realistic facial animations. Website: https://tobias-kirschstein.github.io/flexavatar/
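The abstract states that training yields a smooth latent avatar space that supports identity interpolation and fitting to an arbitrary number of observations. The sketch below illustrates both operations in a toy setting and is an assumption, not the paper's procedure: a hypothetical linear decoder `decode(W, z)` stands in for the real Gaussian decoder and renderer, `fit_latent` runs gradient descent on a latent code using the squared error averaged over however many observations are supplied, and `lerp` linearly interpolates two identity codes. All names, the loss, and the hyperparameters are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decode(W, z):
    # Hypothetical linear "decoder": maps a latent code z to an observation vector.
    return [dot(row, z) for row in W]


def fit_latent(W, observations, steps=500, lr=0.1):
    # Gradient descent on z for the squared error averaged over all observations;
    # the same loop works for one, a few, or many observations.
    z = [0.0] * len(W[0])
    n = len(observations)
    for _ in range(steps):
        pred = decode(W, z)
        grad = [0.0] * len(z)
        for y in observations:
            residual = [p - t for p, t in zip(pred, y)]
            for j in range(len(z)):
                grad[j] += 2.0 * sum(W[i][j] * residual[i] for i in range(len(W))) / n
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z


def lerp(z_a, z_b, t):
    # Linear interpolation between two identity codes (t=0 gives z_a, t=1 gives z_b).
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


W = [[1.0, 0.0], [0.0, 1.0]]
z_single = fit_latent(W, [[1.0, 2.0]])            # single observation
z_few = fit_latent(W, [[1.0, 2.0], [3.0, 4.0]])   # two observations
z_mid = lerp(z_single, z_few, 0.5)                # halfway between the two fits
```

With the identity decoder above, the fitted code converges to the mean of the observations, so adding observations changes only the average inside the loss, not the fitting loop. A real avatar fit would instead render the decoded 3D Gaussians and compare them against the input images, and interpolation is only meaningful if the latent space is smooth, as the abstract claims for FlexAvatar.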
