AniArtAvatar: Animatable 3D Art Avatar from a Single Image

Shaoxu Li

TL;DR

A novel approach for generating animatable 3D-aware art avatars from a single image, with controllable facial expressions, head poses, and shoulder movements, using a view-conditioned 2D diffusion model to synthesize multi-view images from a single art portrait with a neutral expression.

Abstract

We present a novel approach for generating animatable 3D-aware art avatars from a single image, with controllable facial expressions, head poses, and shoulder movements. Unlike previous reenactment methods, our approach utilizes a view-conditioned 2D diffusion model to synthesize multi-view images from a single art portrait with a neutral expression. With the generated colors and normals, we synthesize a static avatar using an SDF-based neural surface. For avatar animation, we extract control points, transfer the motion with these points, and deform the implicit canonical space. First, we render the front image of the avatar, extract the 2D landmarks, and project them into 3D space using the trained SDF network. We then extract 3D driving landmarks using a 3DMM and transfer their motion to the avatar landmarks. To animate the avatar pose, we manually set the body height and bound the head and torso of the avatar with two cages. The head and torso can then be animated by transforming the two cages. Our approach is a one-shot pipeline that can be applied to various styles. Experiments demonstrate that our method can generate high-quality 3D art avatars with the desired control over different motions.
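The step that projects 2D landmarks into 3D with the SDF network can be illustrated with sphere tracing, a standard way to intersect a camera ray with the zero level set of a signed distance function. The sketch below is only an illustration under stated assumptions: the abstract does not say which ray-surface intersection scheme is used, `sphere_sdf` is an analytic stand-in for the trained SDF network, and the names `lift_landmark`, `max_steps`, `eps`, and `t_max` are hypothetical.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Analytic stand-in for the trained SDF network (assumption):
    signed distance from point p to a sphere."""
    return math.dist(p, center) - radius

def lift_landmark(origin, direction, sdf, max_steps=128, eps=1e-6, t_max=100.0):
    """Sphere-trace one camera ray until it reaches the surface sdf(p) = 0.

    origin, direction: 3-tuples describing the ray through a 2D landmark.
    Returns the 3D hit point, or None if the ray misses the surface.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    d = tuple(c / norm for c in direction)  # unit ray direction
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * c for o, c in zip(origin, d))
        dist = sdf(p)
        if dist < eps:      # close enough to the surface: accept the hit
            return p
        t += dist           # safe step: no surface is closer than dist
        if t > t_max:       # ray left the scene without hitting anything
            return None
    return None

# A ray aimed at the sphere hits its front face; an offset ray misses.
hit = lift_landmark((0.0, 0.0, 3.0), (0.0, 0.0, -1.0), sphere_sdf)
miss = lift_landmark((0.0, 2.0, 3.0), (0.0, 0.0, -1.0), sphere_sdf)
```

In a real pipeline the ray origin and direction would come from the camera used to render the front image and the pixel coordinates of each detected landmark; those details are not specified here.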

Paper Structure

This paper contains 29 sections, 6 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: In this work, we present AniArtAvatar, a novel one-shot 3D avatar generation and animation pipeline. The avatar can be animated with controllable 3D camera viewpoints, facial expressions, head poses, and shoulder movements. The method applies to various art styles.
  • Figure 2: Method overview. (a) Static Avatar Reconstruction: Given an art portrait image, we use a view-conditioned 2D diffusion model, Wonder3D (Long et al., 2023), to generate multi-view images and normals. The input to Wonder3D is a single image and pre-set camera poses. We optimize a static avatar using an SDF-based neural surface with the generated images and normals. (b) Avatar Animation: We render the avatar into a front image and extract 2D landmarks. With camera rays cast through the 2D landmarks, we calculate the corresponding 3D landmarks $\{L^{s,neut}\}$ using the SDF network. We transfer $\{L^{s,neut}\}$ to deformed landmarks $\{L^{s,exp}\}$ with the driving landmarks $\{L^{d,neut}\}$ and $\{L^{d,exp}\}$. $\{L^{d,exp}\}$ can be set manually or extracted from a 3DMM human face model. For head and shoulder movements, we extract a head cage $C_{head}$ and a torso cage $C_{torso}$ using the 3D landmarks $\{L^{s,neut}\}$, the 3D canonical mesh $M_c$, and a manually set torso height $y_{torso}$. To animate the art avatar, we deform the canonical space of the static avatar with Delaunay triangulation-based deformation, taking $\{L^{s,neut}\}$, $\{L^{s,exp}\}$, $C_{head}$, $C_{torso}$, and the pose code $z_{pose}$ as input.
  • Figure 3: Results with controllable camera viewpoint, facial expression and head pose.
  • Figure 4: Qualitative comparisons with state-of-the-art methods on cross-domain reenactment. Competing methods are fine-tuned with the cartoon dataset.
  • Figure 5: Ablation study on additional landmarks.
  • ...and 3 more figures
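
The landmark transfer in Figure 2(b), which maps $\{L^{s,neut}\}$ to $\{L^{s,exp}\}$ using $\{L^{d,neut}\}$ and $\{L^{d,exp}\}$, can be illustrated with a simple displacement transfer. The caption does not give the transfer rule, so the sketch below assumes one: each driving landmark's offset from its neutral position is rescaled by the ratio of the two faces' bounding-box diagonals and added to the matching avatar landmark. The function names and the choice of scale factor are hypothetical, not taken from the paper.

```python
def bbox_diagonal(points):
    """Length of the axis-aligned bounding-box diagonal of a 3D point set."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    return sum((hi - lo) ** 2 for lo, hi in zip(mins, maxs)) ** 0.5

def transfer_motion(src_neut, drv_neut, drv_exp):
    """Assumed rule: L_s_exp = L_s_neut + scale * (L_d_exp - L_d_neut).

    All three inputs are equally long lists of 3-tuples with matching
    landmark order. The scale compensates for face-size differences.
    """
    scale = bbox_diagonal(src_neut) / bbox_diagonal(drv_neut)
    return [
        tuple(s + scale * (e - n) for s, n, e in zip(sp, dn, de))
        for sp, dn, de in zip(src_neut, drv_neut, drv_exp)
    ]

# Toy example: the avatar face is twice as large as the driving face,
# and the third driving landmark moves up by 0.5.
drv_neut = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
drv_exp = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
src_neut = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
src_exp = transfer_motion(src_neut, drv_neut, drv_exp)
```

In this toy case the third avatar landmark moves up by about 1.0, twice the driving offset, while the unmoved landmarks stay in place. The resulting pairs of neutral and deformed landmarks are the kind of control points the caption describes as input to the canonical-space deformation.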