Table of Contents
Fetching ...

InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint

Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, Bo Dai

TL;DR

This work introduces a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs, and demonstrates that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.

Abstract

Text-conditioned motion synthesis has made remarkable progress with the emergence of diffusion models. However, the majority of these motion diffusion models are primarily designed for a single character and overlook multi-human interactions. In our approach, we strive to explore this problem by synthesizing human motion with interactions for a group of characters of any size in a zero-shot manner. The key aspect of our approach is the adaptation of human-wise interactions as pairs of human joints that can be either in contact or separated by a desired distance. In contrast to existing methods that necessitate training motion generation models on multi-human motion datasets with a fixed number of characters, our approach inherently possesses the flexibility to model human interactions involving an arbitrary number of individuals, thereby transcending the limitations imposed by the training data. We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs. It consists of a motion controller and an inverse kinematics guidance module that realistically and accurately aligns the joints of synthesized characters to the desired location. Furthermore, we demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model (LLM). Experimental results highlight the capability of our framework to generate interactions with multiple human characters and its potential to work with off-the-shelf physics-based character simulators. Code is available at https://github.com/zhenzhiwang/intercontrol

InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint

TL;DR

This work introduces a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs, and demonstrates that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model.

Abstract

Text-conditioned motion synthesis has made remarkable progress with the emergence of diffusion models. However, the majority of these motion diffusion models are primarily designed for a single character and overlook multi-human interactions. In our approach, we strive to explore this problem by synthesizing human motion with interactions for a group of characters of any size in a zero-shot manner. The key aspect of our approach is the adaptation of human-wise interactions as pairs of human joints that can be either in contact or separated by a desired distance. In contrast to existing methods that necessitate training motion generation models on multi-human motion datasets with a fixed number of characters, our approach inherently possesses the flexibility to model human interactions involving an arbitrary number of individuals, thereby transcending the limitations imposed by the training data. We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions maintaining the desired distance between joint pairs. It consists of a motion controller and an inverse kinematics guidance module that realistically and accurately aligns the joints of synthesized characters to the desired location. Furthermore, we demonstrate that the distance between joint pairs for human-wise interactions can be generated using an off-the-shelf Large Language Model (LLM). Experimental results highlight the capability of our framework to generate interactions with multiple human characters and its potential to work with off-the-shelf physics-based character simulators. Code is available at https://github.com/zhenzhiwang/intercontrol
Paper Structure (26 sections, 2 equations, 6 figures, 9 tables, 1 algorithm)

This paper contains 26 sections, 2 equations, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: InterControl is able to generate interactions of a group of people given joint-joint contact or separation pairs as spatial condition, and it is only trained on single-person data. Our generated interactions are realistic and similar to real interactions in internet images in (a) daily life and (b) fighting. (c) shows our generated group motions (red dots) could serve as reference motions for physics animation.
  • Figure 2: Overview. Our model could precisely control human joints in the global space via the Motion ControlNet and IK guidance module. By leveraging LLM to adapt interaction descriptions to joint contact pairs, it could generate multi-person interactions via a single-person motion generation model in a zero-shot manner.
  • Figure 3: Comparison with PriorMDM shafir2023human in user-study of zero-shot human interaction generation.
  • Figure 4: Qualitative results of zero-shot human interaction generation.
  • Figure 5: Architecture of Motion ControlNet.
  • ...and 1 more figures