
My3DGen: A Scalable Personalized 3D Generative Model

Luchao Qi, Jiaye Wu, Annie N. Wang, Shengze Wang, Roni Sengupta

TL;DR

My3DGen tackles scalable personalization of 3D face generation by decoupling global and personalized features and using LoRA-based low-rank adapters on a frozen EG3D. It achieves a per-identity trainable budget of 240K parameters, a $127\times$ reduction from full fine-tuning, and can personalize with as few as 50 images to enable novel view synthesis, editing, and appearance synthesis while preserving identity. It demonstrates strong inversion, interpolation, and semantic editing capabilities, and even extends to cat faces, illustrating the approach's generality. This work offers a practical path to scalable, identity-preserving 3D avatars for broad user populations.

Abstract

In recent years, generative 3D face models (e.g., EG3D) have been developed to tackle the problem of synthesizing photo-realistic faces. However, these models are often unable to capture facial features unique to each individual, highlighting the importance of personalization. Some prior works have shown promise in personalizing generative face models, but these studies primarily focus on 2D settings. Also, these methods require both fine-tuning and storing a large number of parameters for each user, posing a hindrance to achieving scalable personalization. Another challenge of personalization is the limited number of training images available for each individual, which often leads to overfitting when using full fine-tuning methods. Our proposed approach, My3DGen, generates a personalized 3D prior of an individual using as few as 50 training images. My3DGen allows for novel view synthesis, semantic editing of a given face (e.g., adding a smile), and synthesizing novel appearances, all while preserving the original person's identity. We decouple the 3D facial features into global features and personalized features by freezing the pre-trained EG3D and training additional personalized weights through low-rank decomposition. As a result, My3DGen introduces only $\textbf{240K}$ personalized parameters per individual, leading to a $\textbf{127}\times$ reduction in trainable parameters compared to the $\textbf{30.6M}$ required for fine-tuning the entire parameter space. Despite this significant reduction in storage, our model preserves identity features without compromising the quality of downstream applications.
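
The low-rank decomposition described in the abstract can be illustrated with a minimal sketch of a single LoRA-style layer. This is not the authors' implementation: the class name `LoRALinear`, the 64-dimensional layer, the `alpha` scaling, and the initialization are assumptions chosen for illustration, and the source does not say which EG3D layers receive adapters. Only the 240K and 30.6M parameter counts come from the abstract.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Toy dense layer: frozen weight W plus a trainable low-rank update B @ A.

    Hypothetical illustration only; sizes and init are assumptions.
    """

    def __init__(self, d_in, d_out, rank=1, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen "pre-trained" weight (random here, stands in for a real layer).
        self.W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable low-rank factors. B starts at zero, the usual LoRA
        # convention, so the adapted layer initially equals the frozen one.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        rank, d_in, d_out = len(self.A), len(self.A[0]), len(self.B)
        return rank * d_in + d_out * rank

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=64, d_out=64, rank=1)
x = [1.0] * 64
y = layer.forward(x)

# Ratio of the two parameter counts quoted in the abstract.
reduction = 30.6e6 / 240e3
```

For this toy layer, rank 1 trains 128 values against 4,096 frozen ones. The ratio of the abstract's own figures, 30.6M to 240K, works out to 127.5, in line with the reported $127\times$ reduction.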

Paper Structure

This paper contains 19 sections, 4 equations, 12 figures, and 7 tables.

Figures (12)

  • Figure 1: Given 50 images of Michelle Obama, we personalize a pre-trained 3D generative prior and demonstrate the applications in various downstream tasks. Each downstream task presents the original input image of Michelle (top left), alongside the corresponding output generated using the pre-trained face prior (bottom left), compared to the output using our personalized face prior (right). Our personalized prior can faithfully retain the key facial characteristics of Michelle Obama, as opposed to the pre-trained prior.
  • Figure 2: Architecture of our personalization approach. We project an individual's images into StyleGAN2's $\mathcal{W}$ space through latent code optimization to obtain a set of latent anchors. We then tune the generator to reconstruct the individual's images. During tuning, the pre-trained generator weights are frozen and only the LoRA weights are personalized.
  • Figure 3: Qualitative evaluation for image inversion, i.e., generating 3D-aware view synthesis from a single input image. Ours is My3DGen with LoRA rank $r = 1$, where the number of trainable parameters is $0.2$M. Visual differences are highlighted with a red box; zoom in to view finer details.
  • Figure 4: Quantitative (top) and qualitative (bottom) evaluation for interpolation in latent space between two anchor images, highlighted in color. We measure identity preservation using $\text{ID}_{sim}$, for which illustrative visual results at different interpolation steps $\theta$ are also provided. We compare the results before and after personalization.
  • Figure 5: Quantitative analysis of the rank of LoRA for inversion and interpolation tasks. '-' indicates full fine-tuning. We report the average $\text{ID}_{sim}$ across latent paths from the interpolation task.
  • ...and 7 more figures
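
The interpolation experiment summarized in the Figure 4 caption can be sketched roughly as follows. Several things here are assumptions rather than details from the source: linear interpolation between anchors, $\text{ID}_{sim}$ computed as a cosine similarity between identity embeddings, and the helper names `lerp`, `toy_embed`, and `cosine_similarity`. In particular, `toy_embed` is a hypothetical stand-in for rendering an image from a latent code and passing it through a face-recognition network.

```python
import math


def lerp(w_a, w_b, theta):
    """Linearly interpolate between two latent codes for theta in [0, 1]."""
    return [(1.0 - theta) * a + theta * b for a, b in zip(w_a, w_b)]


def toy_embed(w):
    """Hypothetical placeholder for 'generate image, then embed identity'.

    A real pipeline would render the latent with the generator and run a
    face-recognition model; here the latent is returned unchanged.
    """
    return list(w)


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Two toy latent anchors (assumed values, not real W-space codes).
anchor_a = [1.0, 0.0, 0.0, 0.0]
anchor_b = [0.0, 1.0, 0.0, 0.0]

thetas = [0.0, 0.25, 0.5, 0.75, 1.0]
path = [lerp(anchor_a, anchor_b, t) for t in thetas]

# Similarity of each interpolated point to the identity at anchor_a.
reference = toy_embed(anchor_a)
sims = [cosine_similarity(toy_embed(w), reference) for w in path]
```

With these orthogonal toy anchors the similarity to the first anchor falls steadily along the path. In the paper's setting both anchors belong to the same person, so the quantity of interest is how well identity is retained at intermediate values of $\theta$ before and after personalization.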