
FaceShot: Bring Any Character into Life

Junyao Gao, Yanan Sun, Fei Shen, Xin Jiang, Zhening Xing, Kai Chen, Cairong Zhao

TL;DR

FaceShot tackles the challenge of animating any character, including non-human characters, from any driving video without training or fine-tuning. It introduces three components (appearance-guided landmark matching, coordinate-based landmark retargeting, and a landmark-driven animation model) and demonstrates that the semantics of diffusion features enable precise landmark estimation across domains. The framework is designed as a plugin compatible with any landmark-driven animation model and is validated on CharacBench, where it outperforms state-of-the-art methods in identity preservation, visual quality, and motion fidelity. The work enables open-domain portrait animation with practical implications for entertainment and education, while acknowledging ethical considerations and providing a public release plan.
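The TL;DR credits diffusion-feature semantics with enabling landmark estimation across domains. The general idea behind this (semantic correspondence by feature matching) can be illustrated with a minimal sketch: landmarks on a source image are transferred to a reference character by cosine-similarity nearest-neighbour search over per-pixel feature maps. This assumes the feature maps have already been extracted (for example, from a diffusion U-Net) and resized to a common resolution; the function name `match_landmarks` is hypothetical, and the appearance guidance that FaceShot adds on top is not modelled here.

```python
import numpy as np


def match_landmarks(src_feat, ref_feat, src_points):
    """Hypothetical landmark transfer by nearest-neighbour feature matching.

    src_feat, ref_feat: per-pixel feature maps of shape (C, H, W), assumed to be
        precomputed (e.g. diffusion features) at the same resolution.
    src_points: integer array (N, 2) of (row, col) landmarks on the source image.
    Returns an (N, 2) integer array of matched (row, col) positions on the reference.
    """
    c, h, w = ref_feat.shape
    # Reference features as (H*W, C), each pixel vector L2-normalised.
    ref = ref_feat.reshape(c, -1).T
    ref = ref / (np.linalg.norm(ref, axis=1, keepdims=True) + 1e-8)
    # Source feature vectors sampled at the landmark locations, shape (N, C).
    src = src_feat[:, src_points[:, 0], src_points[:, 1]].T
    src = src / (np.linalg.norm(src, axis=1, keepdims=True) + 1e-8)
    # Cosine similarity of every landmark against every reference pixel: (N, H*W).
    sim = src @ ref.T
    best = sim.argmax(axis=1)
    # Convert flat pixel indices back to (row, col) coordinates.
    return np.stack(np.unravel_index(best, (h, w)), axis=1)


if __name__ == "__main__":
    # Toy check: the reference map is the source map shifted by (2, 3) pixels,
    # so each matched landmark should be shifted by the same amount.
    rng = np.random.default_rng(0)
    src_feat = rng.standard_normal((16, 10, 10))
    ref_feat = np.roll(src_feat, shift=(2, 3), axis=(1, 2))
    print(match_landmarks(src_feat, ref_feat, np.array([[1, 1], [4, 5]])))
```

Plain argmax matching like this is brittle when the source and reference styles differ strongly, which is the gap that appearance guidance is meant to close.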

Abstract

In this paper, we present FaceShot, a novel training-free portrait animation framework designed to bring any character to life from any driving video without fine-tuning or retraining. We achieve this by producing precise and robust reposed landmark sequences with an appearance-guided landmark matching module and a coordinate-based landmark retargeting module. Together, these components harness the robust semantic correspondences of latent diffusion models to produce facial motion sequences across a wide range of character types. The landmark sequences are then fed into a pre-trained landmark-driven animation model to generate the animated video. With this strong generalization capability, FaceShot significantly extends the applicability of portrait animation by removing its dependence on landmark detectors designed for realistic portraits, so that any stylized character can be animated from any driving video. In addition, FaceShot is compatible with any landmark-driven animation model and significantly improves its overall performance. Extensive experiments on our newly constructed character benchmark, CharacBench, confirm that FaceShot consistently surpasses state-of-the-art (SOTA) approaches across all character domains. More results are available at our project website https://faceshot2024.github.io/faceshot/.
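The abstract describes a retargeting step that turns landmarks tracked in a driving video into a landmark sequence for the reference character. The exact rule used by FaceShot's coordinate-based module is not given in this summary, so the following is only a minimal sketch under a simple assumption: the displacement of each driving landmark relative to the first (neutral) frame is rescaled by the ratio of face extents and added to the reference landmarks. The function name `retarget_landmarks` and the per-axis scaling are hypothetical.

```python
import numpy as np


def retarget_landmarks(ref_lmk, drv_seq):
    """Hypothetical relative-motion retargeting (not FaceShot's exact rule).

    ref_lmk: (N, 2) landmarks of the reference character.
    drv_seq: (T, N, 2) landmarks tracked in the driving video; frame 0 is
        assumed to be the neutral pose.
    Returns a (T, N, 2) landmark sequence for the reference character.
    """
    drv0 = drv_seq[0]
    # Per-axis scale: reference face extent divided by driving face extent.
    scale = np.ptp(ref_lmk, axis=0) / (np.ptp(drv0, axis=0) + 1e-8)
    # Displacement of each driving landmark relative to the neutral frame.
    motion = drv_seq - drv0[None]
    # Apply the rescaled displacements to the reference landmarks.
    return ref_lmk[None] + motion * scale
```

Working with displacements rather than absolute driving coordinates keeps the reference character's own facial layout in frame 0, so the output sequence starts from the character's pose instead of the driver's.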

Paper Structure

This paper contains 13 sections, 9 equations, 18 figures, 7 tables.

Figures (18)

  • Figure 1: Visualization results of our FaceShot. Given any character and any driven video, FaceShot effectively captures subtle facial expressions and generates stable animations for each character. Especially for non-human characters, such as emojis and toys, FaceShot demonstrates remarkable animation capabilities.
  • Figure 2: Visual results generated by current portrait animation methods and by our FaceShot. Previous methods visibly retain the target human's appearance. In contrast, FaceShot's result both preserves the dog's facial features and captures the target human's expression.
  • Figure 3: The FaceShot framework first generates precise facial landmarks for the reference character with appearance guidance. Next, a coordinate-based landmark retargeting module generates the landmark sequence based on the driving video. Finally, this sequence is fed into an animation model to animate the character.
  • Figure 4: Visualizations of point matching with (w/, highlighted in a red box) or without (w/o) appearance guidance using an anime diffusion model.
  • Figure 5: Illustration of our appearance gallery. For each reference image, we output the closest domains to reduce the appearance discrepancy.
  • ...and 13 more figures
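Figure 5 states that the appearance gallery outputs the closest domains for each reference image, but not how closeness is measured. A minimal sketch, assuming each gallery domain is represented by a precomputed embedding vector (for example, from an image encoder) and that closeness means cosine similarity, is a top-k retrieval; the function `closest_domains` and the domain names in the usage note are hypothetical.

```python
import numpy as np


def closest_domains(ref_emb, gallery_embs, domain_names, k=2):
    """Hypothetical top-k domain retrieval by cosine similarity.

    ref_emb: (D,) embedding of the reference image (assumed precomputed).
    gallery_embs: (M, D) one embedding per gallery domain.
    domain_names: list of M domain labels.
    Returns the k domain labels most similar to the reference, best first.
    """
    ref = ref_emb / np.linalg.norm(ref_emb)
    gal = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    sims = gal @ ref  # cosine similarity of each domain to the reference
    order = np.argsort(-sims)[:k]  # indices of the k highest similarities
    return [domain_names[i] for i in order]
```

For example, with gallery domains labelled `"anime"`, `"toy"` and `"emoji"`, a reference embedding nearest the `"anime"` vector would return `"anime"` first; the retrieved domains could then select domain-specific appearance guidance, though how FaceShot consumes them is not described here.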