3D Human Interaction Generation: A Survey

Siyuan Fan, Wenke Huang, Xiantao Cai, Bo Du

TL;DR

3D Human Interaction Generation: A Survey organizes the rapidly evolving literature into three interaction paradigms: human-scene interaction (HSI), human-object interaction (HOI), and human-human interaction (HHI). It grounds the discussion in foundational technologies (3D representations, motion capture, generative models) and provides a taxonomy of methods, datasets, and evaluation metrics, highlighting diffusion-based and text/scene-conditioned approaches as current frontrunners. The survey identifies critical challenges, including data scarcity, physical plausibility, and controllability, and outlines future directions such as large-scale real-world data collection, multi-person and non-rigid object interactions, and language-guided generation. Overall, the work serves as a central reference for researchers and practitioners aiming to advance realistic, context-aware 3D interaction synthesis across diverse application domains.

Abstract

3D human interaction generation has emerged as a key research area, focusing on producing dynamic and contextually relevant interactions between humans and various interactive entities. Rapid advances in 3D model representation methods, motion capture technologies, and generative models have laid a solid foundation for the growing interest in this domain. Existing research in this field can be broadly categorized into three areas: human-scene interaction, human-object interaction, and human-human interaction. Despite this progress, challenges remain: generated human motion must be natural, and the interaction between humans and interactive entities must be accurate. In this survey, we present a comprehensive literature review of human interaction generation, which, to the best of our knowledge, is the first of its kind. We begin by introducing the foundational technologies, including model representations, motion capture methods, and generative models. Subsequently, we introduce the approaches proposed for the three sub-tasks, along with their corresponding datasets and evaluation metrics. Finally, we discuss potential future research directions in this area and conclude the survey. Through this survey, we aim to offer a comprehensive overview of the current advancements in the field, highlight key challenges, and inspire future research.

Paper Structure

This paper contains 27 sections, 5 equations, 6 figures.

Figures (6)

  • Figure 1: This survey begins by defining the relevant techniques for human interaction generation. The core of the survey is organized into three main categories: Human-Scene Interaction (HSI), Human-Object Interaction (HOI), and Human-Human Interaction (HHI). Additionally, this survey summarizes the available datasets for each task and the evaluation metrics.
  • Figure 2: An overview of typical human interaction generation approaches. Example figures from huang2023diffusion, li2023object, xu2024inter.
  • Figure 3: Recent advances in human interaction generation methods with different interactive entities.
  • Figure 4: Representations of interactive entities: (a) mesh xu2021d3d, (b) point cloud wang2022humanise, (c) voxel savva2016pigraphs, (d) keypoints wan2022learn, (e) computer-aided design xu2021d3d, (f) RGB images cao2020long, (g) RGB-D images cao2020long, and (h) signed distance function li2023object.
  • Figure 5: Motion capture techniques. (a) Example of an IMU-based mocap system, with inertial sensors attached to the entities' surfaces zhao2024m. (b) Example of a markerless mocap system zhang2024hoi, which uses cameras and algorithms to track human motion. (c) Example of a marker-based mocap system xu2024inter, where reflective markers are attached to the subject. (d) Example of a hybrid mocap system jiang2022chairs, combining optical and inertial setups.
  • ...and 1 more figure