Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation

Yue Ma; Zexuan Yan; Hongyu Liu; Hongfa Wang; Heng Pan; Yingqing He; Junkun Yuan; Ailing Zeng; Chengfei Cai; Heung-Yeung Shum; Zhifeng Li; Wei Liu; Linfeng Zhang; Qifeng Chen

Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation

Yue Ma, Zexuan Yan, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Zhifeng Li, Wei Liu, Linfeng Zhang, Qifeng Chen

TL;DR

Follow-Your-Emoji-Faster introduces a diffusion-based framework for expressive freestyle portrait animation driven by expression-aware landmarks. It combines a facial fine-grained loss, a progressive long-term generation strategy, and a Taylor-interpolated caching method to deliver high-quality, stable animations with substantial speedups. The approach generalizes across diverse portrait styles and driving motions, and is validated on the EmojiBench++ benchmark with strong qualitative and quantitative results. A new dataset and benchmark are released to support future research in this domain.

Abstract

We present Follow-Your-Emoji-Faster, an efficient diffusion-based framework for freestyle portrait animation driven by facial landmarks. The main challenges in this task are preserving the identity of the reference portrait, accurately transferring target expressions, and maintaining long-term temporal consistency while ensuring generation efficiency. To address identity preservation and accurate expression retargeting, we enhance Stable Diffusion with two key components: a expression-aware landmarks as explicit motion signals, which improve motion alignment, support exaggerated expressions, and reduce identity leakage; and a fine-grained facial loss that leverages both expression and facial masks to better capture subtle expressions and faithfully preserve the reference appearance. With these components, our model supports controllable and expressive animation across diverse portrait types, including real faces, cartoons, sculptures, and animals. However, diffusion-based frameworks typically struggle to efficiently generate long-term stable animation results, which remains a core challenge in this task. To address this, we propose a progressive generation strategy for stable long-term animation, and introduce a Taylor-interpolated cache, achieving a 2.6X lossless acceleration. These two strategies ensure that our method produces high-quality results efficiently, making it user-friendly and accessible. Finally, we introduce EmojiBench++, a more comprehensive benchmark comprising diverse portraits, driving videos, and landmark sequences. Extensive evaluations on EmojiBench++ demonstrate that Follow-Your-Emoji-Faster achieves superior performance in both animation quality and controllability. The code, training dataset and benchmark will be found in https://follow-your-emoji.github.io/.

Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation

TL;DR

Abstract

Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (16)