InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint

Zhenzhi Wang; Jingbo Wang; Yixuan Li; Dahua Lin; Bo Dai

TL;DR

This work introduces InterControl, a controllable motion generation method that encourages synthesized motions to maintain desired distances between joint pairs, and demonstrates that these joint-pair distances for human interactions can be generated by an off-the-shelf Large Language Model.

Abstract

Text-conditioned motion synthesis has made remarkable progress with the emergence of diffusion models. However, most of these motion diffusion models are designed for a single character and overlook multi-human interactions. We address this problem by synthesizing human motion with interactions for a group of characters of any size in a zero-shot manner. The key idea is to formulate human interactions as pairs of human joints that are either in contact or separated by a desired distance. In contrast to existing methods, which require training motion generation models on multi-human motion datasets with a fixed number of characters, our approach can model interactions among an arbitrary number of individuals and is therefore not limited by the training data. We introduce a novel controllable motion generation method, InterControl, which encourages the synthesized motions to maintain the desired distance between joint pairs. It consists of a motion controller and an inverse kinematics guidance module that realistically and accurately align the joints of synthesized characters to the desired locations. Furthermore, we demonstrate that the distances between joint pairs for human interactions can be generated using an off-the-shelf Large Language Model (LLM). Experimental results highlight the capability of our framework to generate interactions with multiple human characters and its potential to work with off-the-shelf physics-based character simulators. Code is available at https://github.com/zhenzhiwang/intercontrol
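The joint-pair formulation described in the abstract can be illustrated with a small sketch. The code below is not taken from the InterControl repository; the function name `joint_pair_loss`, the tuple layout of `pairs`, the `"contact"`/`"separate"` mode strings, and the 22-joint skeleton in the usage lines are all assumptions made for illustration. It only shows how "in contact or separated by a desired distance" could be scored as a penalty on global joint positions of two characters.

```python
import numpy as np

def joint_pair_loss(motion_a, motion_b, pairs):
    """Hypothetical penalty for joint-pair distance constraints.

    motion_a, motion_b: arrays of shape (frames, joints, 3) holding
        global joint positions of two characters (assumed layout).
    pairs: list of (joint_a, joint_b, target, mode) tuples, where mode is
        "contact"  -> penalize distances larger than target, or
        "separate" -> penalize distances smaller than target.
    """
    total = 0.0
    for joint_a, joint_b, target, mode in pairs:
        # Per-frame Euclidean distance between the two selected joints.
        dist = np.linalg.norm(motion_a[:, joint_a] - motion_b[:, joint_b], axis=-1)
        if mode == "contact":
            violation = np.maximum(dist - target, 0.0)
        elif mode == "separate":
            violation = np.maximum(target - dist, 0.0)
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Average the violation over frames and accumulate across pairs.
        total += float(violation.mean())
    return total

# Usage: two static characters whose joints are 1 unit apart along x.
a = np.zeros((4, 22, 3))   # 22 joints is an assumed skeleton size
b = np.zeros((4, 22, 3))
b[..., 0] = 1.0
print(joint_pair_loss(a, b, [(0, 0, 0.5, "contact")]))   # joints too far apart
print(joint_pair_loss(a, b, [(0, 0, 2.0, "separate")]))  # joints too close
```

Because the penalty is defined per pair of characters, a group of any size can be handled by summing it over every interacting pair, which is consistent with the abstract's claim that the formulation does not fix the number of characters.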

Paper Structure (26 sections, 2 equations, 6 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Human Motion Generation
Human-related Interaction Generation
Controllable Diffusion Models
InterControl
Formulation of Interaction Generation
Human Motion Diffusion Model (MDM)
Motion ControlNet for MDM
Inverse Kinematics (IK) Guidance
Interaction Generation
Experiments
Single-Person Controllable Motion Generation
Zero-Shot Multi-Person Interaction Generation
Ablation Studies
...and 11 more sections

Figures (6)

Figure 1: InterControl generates interactions among a group of people given joint-to-joint contact or separation pairs as spatial conditions, although it is trained only on single-person data. The generated interactions are realistic and resemble real interactions in internet images of (a) daily life and (b) fighting. (c) The generated group motions (red dots) can serve as reference motions for physics animation.
Figure 2: Overview. Our model precisely controls human joints in global space via the Motion ControlNet and the IK guidance module. By using an LLM to convert interaction descriptions into joint contact pairs, it generates multi-person interactions with a single-person motion generation model in a zero-shot manner.
Figure 3: Comparison with PriorMDM [shafir2023human] in a user study of zero-shot human interaction generation.
Figure 4: Qualitative results of zero-shot human interaction generation.
Figure 5: Architecture of Motion ControlNet.
...and 1 more figures
