MultiAnimate: Pose-Guided Image Animation Made Extensible

Yingcheng Hu; Haowen Gong; Chuanguang Yang; Zhulin An; Yongjun Xu; Songhua Liu

TL;DR

This paper proposes an extensible multi-character image animation framework built upon modern Diffusion Transformers (DiTs) for video generation. It achieves state-of-the-art performance on this task, surpassing existing diffusion-based baselines.

Abstract

Pose-guided human image animation aims to synthesize realistic videos of a reference character driven by a sequence of poses. While diffusion-based methods have achieved remarkable success, most existing approaches are limited to single-character animation. We observe that naively extending these methods to multi-character scenarios often leads to identity confusion and implausible occlusions between characters. To address these challenges, we propose an extensible multi-character image animation framework built upon modern Diffusion Transformers (DiTs) for video generation. At its core, our framework introduces two novel components, the Identifier Assigner and the Identifier Adapter, which collaboratively capture per-person positional cues and inter-person spatial relationships. This mask-driven scheme, along with a scalable training strategy, not only enhances flexibility but also enables generalization to scenarios with more characters than those seen during training. Remarkably, trained on only a two-character dataset, our model generalizes to multi-character animation while maintaining compatibility with single-character cases. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in multi-character image animation, surpassing existing diffusion-based baselines.

Paper Structure (14 sections, 10 figures, 2 tables)

This paper contains 14 sections, 10 figures, 2 tables.

Introduction
Related Works
Methods
MultiAnimate
A Multi-Character Motion Ambiguity
Proposed Pipeline
Training Strategy
Symmetry Issue and Generalization Ability
Proposed Strategy
Experiments
Comparison with Concurrent Work
Various Animation Scenarios
Ablation Studies
Conclusion

Figures (10)

Figure 1: Multi-character pose-guided image animation generated by our framework. Our method performs multi-character image animation with consistent identity and appearance for each character. Notably, our framework, trained only on two-character data, is capable of producing identity-consistent three-person videos and can, in principle, be extended to scenarios with even more participants (e.g., seven characters).
Figure 2: Dilemmas of current methods in multi-character image animation.
Figure 3: In multi-character image animation, identical pose sequences can lead to multiple plausible motion trajectories.
Figure 4: Overview of our framework. Our pipeline contains two main streams: the reference stream, which encodes the reference image and its pose to capture appearance information, and the motion stream, which encodes multi-character pose sequences and tracking masks to model motion and spatial conditions. The two streams are fused through element-wise addition of latent tokens. The Identifier Assigner unifies per-person tracking masks into a structured label representation, preserving spatial relationships and interactions among multiple characters. This representation is converted to the feature space of the DiT backbone by the Identifier Adapter.
Figure 5: Our framework performs well at early training stages, but inconsistencies emerge when the person-assigned labels at inference differ from those seen during training.
...and 5 more figures
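
The Figure 4 caption describes two operations that a small example can make concrete: merging per-person tracking masks into a single label representation, and fusing the reference and motion streams by element-wise addition of latent tokens. The sketch below is illustrative only and is not the authors' implementation. The function names `assign_identifiers` and `fuse_streams`, the integer label-map format, and the rule that a later mask overwrites an earlier one where they overlap are all assumptions made for this example. The learned Identifier Adapter, which maps the label representation into the DiT feature space, is not modeled here.

```python
# Illustrative sketch only. All names and the label-map format are
# assumptions for this example, not the paper's implementation.


def assign_identifiers(masks):
    """Merge per-person binary masks (all H x W) into one integer label map.

    0 marks background and k marks person k (1-based, in list order).
    Where masks overlap, the later mask wins; this tie-break is an
    assumption of the sketch, not something stated in the source.
    """
    height, width = len(masks[0]), len(masks[0][0])
    label_map = [[0] * width for _ in range(height)]
    for person_id, mask in enumerate(masks, start=1):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    label_map[y][x] = person_id
    return label_map


def fuse_streams(reference_tokens, motion_tokens):
    """Fuse two equally shaped token sequences by element-wise addition."""
    return [
        [r + m for r, m in zip(ref_token, motion_token)]
        for ref_token, motion_token in zip(reference_tokens, motion_tokens)
    ]


# Two people on a 2 x 3 grid; their masks overlap at pixel (0, 1).
person_a = [[1, 1, 0], [0, 0, 0]]
person_b = [[0, 1, 1], [0, 0, 1]]
label_map = assign_identifiers([person_a, person_b])

# Two tokens with two channels each, one sequence per stream.
fused = fuse_streams([[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [1.0, 1.0]])
```

Because the label map is built by iterating over however many masks are supplied, the same routine accepts one, two, or more characters without modification. This mirrors, in a very simplified way, why a mask-driven representation is not tied to a fixed number of characters.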
