Snapmoji: Instant Generation of Animatable Dual-Stylized Avatars

Eric M. Chen, Di Liu, Sizhuo Ma, Michael Vasilkovsky, Bing Zhou, Qiang Gao, Wenzhou Wang, Jiahao Luo, Dimitris N. Metaxas, Vincent Sitzmann, Jian Wang

TL;DR

This work addresses the demand for expressive, animatable avatars by introducing Snapmoji, a two-stage system that first converts a selfie into a primary-styled 2D avatar via Gaussian Domain Adaptation and diffusion, then lifts this result into a 3D Gaussian avatar capable of dynamic animation. The core contribution is the Gaussian Domain Adaptation framework, which leverages 3D priors from Objaverse to produce high-fidelity primary-style avatars and preserve identity. A second contribution is a 3D animation pipeline that combines 3DMM and FACS blendshapes within a cross-attention generator to deliver real-time, expressive avatars on mobile devices, with a WebGL rendering backend. The system demonstrates superior 2D stylization quality, robust 3D geometry, and real-time performance (0.9 s per selfie, 30–40 FPS on mobile), offering a practical tool for instant, dual-stylized avatar creation in AR and social applications.

Abstract

The increasing popularity of personalized avatar systems, such as Snapchat Bitmojis and Apple Memojis, highlights the growing demand for digital self-representation. Despite their widespread use, existing avatar platforms face significant limitations, including restricted expressivity due to predefined assets, tedious customization processes, or inefficient rendering requirements. Addressing these shortcomings, we introduce Snapmoji, an avatar generation system that instantly creates animatable, dual-stylized avatars from a selfie. We propose Gaussian Domain Adaptation (GDA), which is pre-trained on large-scale Gaussian models using 3D data from sources such as Objaverse and fine-tuned with 2D style transfer tasks, endowing it with a rich 3D prior. This enables Snapmoji to transform a selfie into a primary stylized avatar, like the Bitmoji style, and apply a secondary style, such as Plastic Toy or Alien, all while preserving the user's identity and the primary style's integrity. Our system is capable of producing 3D Gaussian avatars that support dynamic animation, including accurate facial expression transfer. Designed for efficiency, Snapmoji achieves selfie-to-avatar conversion in just 0.9 seconds and supports real-time interactions on mobile devices at 30 to 40 frames per second. Extensive testing confirms that Snapmoji outperforms existing methods in versatility and speed, making it a convenient tool for automatic avatar creation in various styles.

Paper Structure

This paper contains 23 sections, 11 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: We introduce Snapmoji, a system that can instantly generate animatable dual-stylized avatars. Our dual stylization process reimagines avatars in various artistic styles, enabling users to visualize themselves in diverse scenarios and create personalized stories. Our approach also enables 3D stylized Gaussian avatar generation and expression animation. Snapmoji accomplishes the selfie-to-avatar conversion in just 0.9 seconds and offers real-time functionality for mobile applications. https://echen01.github.io/instamoji-supp/
  • Figure 2: The Snapmoji Inference Pipeline. The pipeline has two stages. First, the Gaussian Domain Adaptation network $\mathcal{E}_\text{GDA}$ converts a facial image into a primary-style avatar $I_\text{sty}$. This avatar undergoes further personalization using a text-guided diffusion process with $T$ steps for additional stylization. Second, expression codes extracted via a 3DMM and FACS are combined with identity features $f_\text{id}$ from a reference image $I_\text{ref}$ and positional maps $f_\text{pos}$ from a driving image $I_\text{drive}$. The unposed dual-stylized avatar $I_\text{unposed}$ is then processed by an asymmetric UNet $\mathcal{G}(\cdot)$, conditioned on the driving codes $f_\text{drive}$ through cross-attention, to generate animated, dual-stylized 3D avatars.
  • Figure 3: Gaussian Domain Adaptation. We show the outputs of the GDA network over several training epochs to visualize the domain shifts from natural images to cartoon avatars.
  • Figure 4: 2D Stylized Avatar Generation. This figure showcases the transformation of photos from eight individuals into the Bitmoji domain using various methods. GAN inversion produces overly generic avatars, struggling with unique features such as beards, glasses, and headwear. Diffusion-based models inaccurately add features, making them inconsistent for targeted styles. In contrast, our GDA method excels in creating high-quality avatars, effectively retaining the original identity features.
  • Figure 5: 2D Stylized Avatar to 3D Generation. We demonstrate the process of converting dual-stylized avatar images, derived from the single-stylized avatars in Fig. \ref{fig:gda_results}, into 3D avatars. PTI inversion with EG3D [Chan2022, Roich2021PivotalTF] struggles to accurately reproduce 3D geometry, while LGM [Tang2024LGMLM] produces artifacts in both geometry and texture. Despite being trained exclusively on the Bitmoji style, our method successfully generates high-quality 3D avatars in previously unseen styles.
  • ...and 9 more figures
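
The Figure 2 caption describes a UNet conditioned on driving codes $f_\text{drive}$ through cross-attention. As a rough illustration of that general mechanism only, the sketch below implements plain single-head cross-attention in NumPy, where image tokens act as queries and the driving codes supply keys and values. It is not the authors' implementation: the function names, token counts, channel sizes, and random weights are all hypothetical placeholders chosen for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(image_tokens, drive_tokens, w_q, w_k, w_v):
    """Single-head cross-attention.

    image_tokens: (N, c_img) features that produce the queries.
    drive_tokens: (M, c_drive) conditioning codes that produce keys and values.
    Returns the attended features (N, d) and the attention weights (N, M).
    """
    q = image_tokens @ w_q                      # (N, d)
    k = drive_tokens @ w_k                      # (M, d)
    v = drive_tokens @ w_v                      # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # scaled dot products, (N, M)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ v, weights


# Hypothetical sizes: 16 image tokens, 3 driving tokens (stand-ins for f_drive).
rng = np.random.default_rng(0)
n_img, n_drive, c_img, c_drive, d = 16, 3, 8, 6, 4
image_tokens = rng.normal(size=(n_img, c_img))
drive_tokens = rng.normal(size=(n_drive, c_drive))
w_q = rng.normal(size=(c_img, d))
w_k = rng.normal(size=(c_drive, d))
w_v = rng.normal(size=(c_drive, d))

out, weights = cross_attention(image_tokens, drive_tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each image token ends up as a weighted mix of the value vectors derived from the driving codes, which is how a conditioning signal of a different shape can steer spatial features inside a network.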