
A Survey on Human Interaction Motion Generation

Kewei Sui, Anindita Ghosh, Inwoo Hwang, Bing Zhou, Jian Wang, Chuan Guo

TL;DR

This survey addresses the problem of generating realistic human interaction motions across four interaction settings: human-human, human-object, human-scene, and human-mix. It covers foundational concepts, conditioning modalities, and a spectrum of generation methods ranging from motion graphs and regression to diffusion models, transformers, physics-based reinforcement learning (RL), and LLM-assisted planning. It catalogs datasets and evaluation metrics, emphasizing fidelity, naturalness, diversity, and condition coherence, and discusses data, physics, representation, and controllability as core challenges. The work underscores the importance of physics-informed, multi-modal, and context-aware modeling for practical applications in robotics, VR, and animation, and outlines four promising directions for future research.

Abstract

Humans inhabit a world defined by interactions -- with other humans, objects, and environments. These interactive movements not only convey our relationships with our surroundings but also demonstrate how we perceive and communicate with the real world. Therefore, replicating these interaction behaviors in digital systems has emerged as an important topic for applications in robotics, virtual reality, and animation. While recent advances in deep generative models and new datasets have accelerated progress in this field, significant challenges remain in modeling the intricate human dynamics and their interactions with entities in the external world. In this survey, we present, for the first time, a comprehensive overview of the literature on human interaction motion generation. We begin by establishing foundational concepts essential for understanding the research background. We then systematically review existing solutions and datasets across three primary interaction tasks -- human-human, human-object, and human-scene interactions -- followed by evaluation metrics. Finally, we discuss open research directions and future opportunities.

Paper Structure

This paper contains 74 sections, 11 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Statistics on the number of works and datasets on human interaction motion generation over the past two decades, categorized into four interaction scenarios.
  • Figure 2: Illustration of three major human interaction motion generation tasks: (a) Human-human Interaction; (b) Human-object Interaction; and (c) Human-scene Interaction. Figures are adapted from xu2023interx, xu2023interdiff, and TRUMANS.
  • Figure 3: Illustration of three major challenges in human-human interaction generation: (a) Semantic consistency; (b) Global interpersonal coordination; and (c) Fine-grained local interaction. All figures are adapted from xu2023interx.
  • Figure 4: Examples of distance-aware interaction mechanisms. (Left) ReMoS’s (ghosh2024remos) Distance-Aware Reaction Loss applies an exponentially decaying function to prioritize reactor joints closer to the actor. (Right) InterGen (InterGen) introduces a Joint Distance Loss, which activates only when the horizontal distance between two individuals falls within a specified range, represented by the cylindrical region in the figure. Figures are adapted from ghosh2024remos and InterGen.
  • Figure 5: InterDiff (xu2023interdiff) applies coordinate transformations to represent object states relative to contact points, resulting in simpler motion patterns (right column) compared to using absolute positions (middle column). Figures are adapted from xu2023interdiff.
  • ...and 2 more figures
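The distance-aware mechanisms described for Figure 4 and the contact-relative representation described for Figure 5 can be illustrated with a small sketch. This is a minimal, hypothetical Python example, not the ReMoS, InterGen, or InterDiff implementations. Everything beyond the captions is an assumption made for illustration: all function names, the decay scale `sigma`, the gating `radius`, the use of the root joint (index 0) for gating, a y-up coordinate convention, measuring each reactor joint's distance to the nearest actor joint, and a translation-only contact frame.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z) joint position; y is ASSUMED to be "up"


def dist(a: Vec3, b: Vec3) -> float:
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_aware_weights(reactor: Sequence[Vec3], actor: Sequence[Vec3],
                           sigma: float = 0.5) -> List[float]:
    """Hypothetical: weight each reactor joint by exp(-d / sigma), where d is its
    distance to the nearest actor joint, so joints near the actor get weights near 1."""
    return [math.exp(-min(dist(r, a) for a in actor) / sigma) for r in reactor]


def weighted_reaction_loss(pred: Sequence[Vec3], gt: Sequence[Vec3],
                           actor: Sequence[Vec3], sigma: float = 0.5) -> float:
    """Squared joint-position error, weighted by how close each ground-truth reactor
    joint is to the actor, normalized by the total weight."""
    w = distance_aware_weights(gt, actor, sigma)
    err = [sum((p - g) ** 2 for p, g in zip(pj, gj)) for pj, gj in zip(pred, gt)]
    return sum(wi * ei for wi, ei in zip(w, err)) / sum(w)


def horizontal_dist(a: Vec3, b: Vec3) -> float:
    """Distance in the ground (x-z) plane, ignoring the vertical y axis."""
    return math.hypot(a[0] - b[0], a[2] - b[2])


def gated_joint_distance_loss(pred_a: Sequence[Vec3], pred_b: Sequence[Vec3],
                              gt_a: Sequence[Vec3], gt_b: Sequence[Vec3],
                              radius: float = 1.0) -> float:
    """Hypothetical: mean L1 gap between predicted and ground-truth inter-person joint
    distances, active only when the ground-truth roots (index 0) lie within `radius`
    horizontally, i.e. inside a vertical cylinder around one person."""
    if horizontal_dist(gt_a[0], gt_b[0]) > radius:
        return 0.0  # people too far apart: the loss is switched off
    gaps = [abs(dist(pa, pb) - dist(ga, gb))
            for pa, ga in zip(pred_a, gt_a)
            for pb, gb in zip(pred_b, gt_b)]
    return sum(gaps) / len(gaps)


def to_contact_frame(obj_pos: Vec3, contact_pos: Vec3) -> Vec3:
    """Express an object position relative to a contact point (translation only;
    a full coordinate transformation would also account for rotation)."""
    return (obj_pos[0] - contact_pos[0],
            obj_pos[1] - contact_pos[1],
            obj_pos[2] - contact_pos[2])


if __name__ == "__main__":
    actor = [(0.0, 1.0, 0.0)]
    reactor = [(0.1, 1.0, 0.0), (2.0, 1.0, 0.0)]
    print(distance_aware_weights(reactor, actor))  # the nearer joint gets a much larger weight
```

In this sketch, normalizing by the total weight keeps the reaction loss on a comparable scale regardless of how many joints are near the actor, while the hard gate mirrors the caption's description of a loss that activates only within a distance range. Subtracting the contact point makes the object's coordinates stay small and steady while it is held, which is the intuition behind the simpler motion patterns shown in Figure 5.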