MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning

Shengbo Gu, Yu-Kun Qiu, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng

TL;DR

MaintaAvatar tackles continual updating of neural-radiance-field avatars to support new appearances and poses without forgetting older ones. It introduces a Global-Local Joint Storage Module to separate global and local appearance variations and a Pose Distillation Module to preserve past pose information, all within a deformable NeRF framework guided by SMPL warping. A two-phase optimization with generative replay and selective freezing enables rapid fine-tuning from limited data while maintaining prior renderings, nearing joint-training performance and outperforming prior continual-learning baselines. The work enables practical, maintainable virtual avatars suitable for real-world applications with minimal data collection and fast adaptation.

Abstract

Generating virtual digital avatars is a crucial research topic in computer vision. Many existing works use Neural Radiance Fields (NeRF) for this task and have achieved impressive results. However, they assume that the training images of a person are available and fixed, whereas in real-world scenarios a subject's appearances and poses constantly change and accumulate. Updating the human avatar while retaining the ability to render the person's old appearances is therefore a practical challenge. One straightforward solution is to combine existing NeRF-based avatar models with continual learning methods. However, this approach has critical issues: learning new appearances and poses can cause the model to forget past information, which degrades the rendering quality of past appearances, most notably through color bleeding and incorrect human body poses. In this work, we propose a maintainable avatar (MaintaAvatar) based on neural radiance fields and continual learning, which resolves these issues with a Global-Local Joint Storage Module and a Pose Distillation Module. Overall, our model requires only limited data collection to be fine-tuned quickly while avoiding catastrophic forgetting, thus achieving a maintainable virtual avatar. Experimental results validate the effectiveness of MaintaAvatar.

Paper Structure

This paper contains 14 sections, 12 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: In reality, a person's pose and appearance constantly update. Our maintainable avatar is designed to continuously learn from sequential data, enabling it to render any previously encountered viewpoint, pose, or appearance.
  • Figure 2: $\textbf{MaintaAvatar Pipeline.}$ We propose a continual learning pipeline based primarily on replay. During the training of Task $T$, we copy and freeze the network $\Theta_{T-1}$ from the previous Task $T-1$. Given the camera parameters from Task $T-1$, the network $\Theta_{T-1}$ generates the corresponding patches and the residual human body pose for one randomly selected past appearance, which are used to supervise the training of Task $T$. Simultaneously, $\Theta_T$ is trained on images of the new appearance. In addition to image supervision, we incorporate a Pose Distillation Module to strengthen the memory of past pose information, thereby improving rendering quality. Ultimately, our model can continuously learn novel appearances without forgetting past ones.
  • Figure 3: $\textbf{Pipeline for MaintaAvatar Network}$ $\Theta$ $\textbf{Structure.}$ For any given human body pose, we use skeletal motion based on the SMPL model to transform the body from the observation space to the canonical space. Meanwhile, we employ a network $MLP_p$ to predict the residual $\Delta_\Omega(\mathbf{p})$ between the current pose parameters and the true pose parameters. Subsequently, our Global-Local Joint Storage Module generates a tri-plane-based local embedding and a global embedding for each appearance. These embeddings, along with the coordinate embedding, are fed into $MLP_o$ to predict color and opacity.
  • Figure 4: Visual comparison of free-viewpoint rendering between our method and $\mathrm{PersonNeRF}_{CL}$ on ZJU-MoCap [neuralbody] for past tasks. Our method demonstrates superior rendering quality, especially in terms of color and human pose.
  • Figure 5: The Global-Local Joint Storage Module can help the model learn new appearances more quickly.
  • ...and 3 more figures
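
The replay scheme described in the Figure 2 caption can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `ToyAvatar` is a hypothetical stand-in for the network $\Theta$, "rendering" is reduced to scaling a per-appearance vector, the pose-residual predictor is a single weight vector, and the three loss terms are combined with a hypothetical weight `lambda_pose`. The only parts taken from the caption are the overall structure: a frozen copy of the previous model produces patches and pose residuals for a randomly selected past appearance, and these supervise the new model alongside real images of the new appearance.

```python
import copy
import random


class ToyAvatar:
    """Hypothetical stand-in for the avatar network Theta (not the paper's model)."""

    def __init__(self, num_appearances, dim=4, seed=0):
        rng = random.Random(seed)
        # One vector per appearance stands in for its stored appearance embedding.
        self.appearance = [[rng.uniform(-1, 1) for _ in range(dim)]
                           for _ in range(num_appearances)]
        # Shared weights standing in for the pose-residual predictor.
        self.pose_w = [rng.uniform(-1, 1) for _ in range(dim)]

    def render_patch(self, camera, appearance_id):
        # Toy "rendering": scale the appearance vector by a camera scalar.
        return [camera * a for a in self.appearance[appearance_id]]

    def pose_residual(self, pose):
        # Toy residual: elementwise product of weights and pose parameters.
        return [w * p for w, p in zip(self.pose_w, pose)]


def mse(xs, ys):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def replay_losses(new_model, old_model, new_batch, past_cameras, past_poses,
                  num_past, rng, lambda_pose=1.0):
    """Return (total, (loss_new, loss_replay, loss_pose)) for one step of task T."""
    # 1) Real-image supervision on the new appearance.
    camera, target_patch, new_id = new_batch
    loss_new = mse(new_model.render_patch(camera, new_id), target_patch)

    # 2) Replay: the frozen old model renders one randomly chosen past appearance.
    past_id = rng.randrange(num_past)
    past_camera = rng.choice(past_cameras)
    pseudo_patch = old_model.render_patch(past_camera, past_id)
    loss_replay = mse(new_model.render_patch(past_camera, past_id), pseudo_patch)

    # 3) Pose distillation: match the old model's pose residual on a past pose.
    past_pose = rng.choice(past_poses)
    loss_pose = mse(new_model.pose_residual(past_pose),
                    old_model.pose_residual(past_pose))

    total = loss_new + loss_replay + lambda_pose * loss_pose
    return total, (loss_new, loss_replay, loss_pose)


# Usage: the old model knows two appearances; the new model adds a third slot.
old_model = ToyAvatar(num_appearances=2)      # frozen copy from task T-1
new_model = copy.deepcopy(old_model)          # model being trained on task T
new_model.appearance.append([0.0] * 4)        # untrained slot for the new appearance

rng = random.Random(1)
new_batch = (1.0, [0.5] * 4, 2)               # (camera, target patch, appearance id)
total, parts = replay_losses(new_model, old_model, new_batch,
                             past_cameras=[0.5, 1.0, 2.0],
                             past_poses=[[0.1, 0.2, 0.3, 0.4]],
                             num_past=2, rng=rng)
```

Immediately after copying, the replay and pose terms are zero because the new model still agrees with the frozen one on past data; they become positive only once training on the new appearance starts to drift the shared parameters, which is the forgetting that the replay and distillation terms are meant to penalize.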