MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow

Zhe Li; Yisheng He; Lei Zhong; Weichao Shen; Qi Zuo; Lingteng Qiu; Zilong Dong; Laurence Tianruo Yang; Weihao Yuan

TL;DR

MulSMo addresses stylized motion generation by introducing a bidirectional control flow between the style and content networks, and by supporting multimodal style signals through contrastive learning. It runs diffusion-based generation in the latent space of a temporal motion VAE (MaTLD) designed to better preserve temporal dynamics. The approach outperforms prior stylized motion methods across multiple datasets and accepts style input as motion, text, or images. This makes it a flexible framework for multimodal, content-aware motion stylization, with potential applications in animation, AR/VR, and robotics.
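To make the difference between one-way and bidirectional control concrete, here is a minimal sketch. It assumes, purely for illustration, that each branch emits a feature vector per layer and that the branches exchange information through simple gated additions. The names `bidirectional_block`, `c2s_gate`, and `s2c_gate` are hypothetical and do not describe the paper's actual architecture.

```python
from typing import List, Tuple

Vector = List[float]


def add_scaled(x: Vector, y: Vector, alpha: float) -> Vector:
    """Return x + alpha * y element-wise."""
    return [a + alpha * b for a, b in zip(x, y)]


def unidirectional_block(content: Vector, style: Vector,
                         s2c_gate: float) -> Tuple[Vector, Vector]:
    """Hypothetical one-way control: style is injected into content, never updated."""
    return add_scaled(content, style, s2c_gate), style


def bidirectional_block(content: Vector, style: Vector,
                        c2s_gate: float, s2c_gate: float) -> Tuple[Vector, Vector]:
    """Hypothetical two-way control for one layer.

    1. content -> style: the style features are first adjusted toward the
       content features, so the style signal adapts to what is generated.
    2. style -> content: the adapted style features are then injected into
       the content features, as in the one-way scheme.
    """
    style = add_scaled(style, content, c2s_gate)    # adjust style toward content
    content = add_scaled(content, style, s2c_gate)  # inject the adapted style
    return content, style


def run_layers(num_layers: int, content: Vector, style: Vector,
               c2s_gate: float, s2c_gate: float) -> Tuple[Vector, Vector]:
    """Apply the hypothetical bidirectional block repeatedly, as in a layer stack."""
    for _ in range(num_layers):
        content, style = bidirectional_block(content, style, c2s_gate, s2c_gate)
    return content, style
```

Setting `c2s_gate = 0` recovers the one-way scheme, which isolates the extra content-to-style path as the only change. In a real network the scalar gates would presumably be learned projections rather than constants.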

Abstract

Generating motion sequences that conform to a target style while adhering to given content prompts requires accommodating both content and style. In existing methods, information usually flows only from style to content, which can cause conflicts between style and content and harm their integration. In contrast, in this work we build a bidirectional control flow between style and content that also adjusts the style toward the content, which alleviates style-content collisions and better preserves the style dynamics in the integration. Moreover, we extend stylized motion generation from a single modality, i.e., the style motion, to multiple modalities including text and images through contrastive learning, enabling flexible style control over motion generation. Extensive experiments demonstrate that our method significantly outperforms previous methods across different datasets while also supporting control by multimodal signals. The code of our method will be made publicly available.
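The abstract says that text and image style signals are tied to motion styles through contrastive learning, but it does not give the loss. A common choice for this kind of cross-modal alignment is a symmetric InfoNCE loss (as popularized by CLIP), sketched below under that assumption. The encoders are omitted; `motion_emb` and `other_emb` stand in for batches of paired style embeddings (row i of each describes the same style), and the function names and default temperature are illustrative.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(motion_emb: List[Vector], other_emb: List[Vector],
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE between paired motion-style and text/image-style embeddings."""
    m = [normalize(v) for v in motion_emb]
    o = [normalize(v) for v in other_emb]
    # Similarity matrix: entry (i, j) compares motion style i with text/image style j.
    logits = [[sum(a * b for a, b in zip(mi, oj)) / temperature for oj in o]
              for mi in m]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the motion->other and other->motion directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))
```

Minimizing this loss pulls each motion-style embedding toward its paired text or image embedding and pushes it away from the other pairs in the batch. Once the spaces are aligned, a text or image embedding could plausibly be used in place of a motion-style embedding at inference time.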
