
Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes

Jianqi Chen, Panwen Hu, Xiaojun Chang, Zhenwei Shi, Michael Kampffmeyer, Xiaodan Liang

TL;DR

Sitcom-Crafter addresses the lack of a unified system for diverse human motion generation in 3D scenes. It integrates locomotion, human-scene interaction, and human-human interaction under the guidance of long plots. It introduces a self-supervised, scene-aware human-human interaction module that injects synthetic scene information through implicit SDF conditioning. It also unifies motion representation through marker points aided by a body regressor, all within a plot-driven, eight-module pipeline. Experiments on open 3D scenes and established human-human interaction (HHI) datasets show better physical-constraint metrics and motion realism than strong baselines. This work could streamline creative workflows in animation and game design by enabling cohesive, plot-driven, multi-type motion generation in complex environments.
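The minimal sketch below illustrates synthetic SDF conditioning. It follows the three steps in Figure 3's caption: take a walkable region, scatter random objects around it, and label 3D points as occupied or free. Every concrete choice is an assumption and not the authors' implementation:

- the walkable region is a list of ground-plane (x, y) points, such as a root trajectory;
- objects are floor-standing, axis-aligned boxes;
- the "binary SDF" value is 1 inside an object and 0 outside;
- `Box`, `dist_to_walkable`, `synth_binary_sdf`, and all parameter names and defaults are hypothetical.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical sketch of synthetic binary SDF conditions; not the authors' code.


@dataclass
class Box:
    """Floor-standing, axis-aligned stand-in for a simulated object."""
    cx: float
    cy: float
    half: float    # half the side length of the square footprint
    height: float  # the box occupies 0 <= z <= height

    def contains(self, x: float, y: float, z: float) -> bool:
        return (abs(x - self.cx) <= self.half and abs(y - self.cy) <= self.half
                and 0.0 <= z <= self.height)


def dist_to_walkable(x, y, walkable):
    """Planar distance from (x, y) to the nearest walkable point."""
    return min(math.hypot(x - wx, y - wy) for wx, wy in walkable)


def synth_binary_sdf(walkable, num_boxes=8, size=1.0, clearance=0.5,
                     extent=3.0, grid_res=10, max_height=1.5, seed=0):
    """Return (boxes, points); each point is (x, y, z, occupied) with occupied in {0, 1}."""
    rng = random.Random(seed)
    half = size / 2
    # If a box center is farther than its half-diagonal plus `clearance` from every
    # walkable point, no part of its footprint comes within `clearance` of the path.
    min_center_dist = half * math.sqrt(2) + clearance
    boxes = []
    for _ in range(1000):  # bounded rejection sampling of object placements
        if len(boxes) == num_boxes:
            break
        cx, cy = rng.uniform(-extent, extent), rng.uniform(-extent, extent)
        if dist_to_walkable(cx, cy, walkable) > min_center_dist:
            boxes.append(Box(cx, cy, half, rng.uniform(0.5, max_height)))
    # Label a regular grid of 3D points: 1 = inside some object, 0 = free space.
    xs = [-extent + 2 * extent * i / (grid_res - 1) for i in range(grid_res)]
    zs = [max_height * k / (grid_res - 1) for k in range(grid_res)]
    points = [(x, y, z, int(any(b.contains(x, y, z) for b in boxes)))
              for x in xs for y in xs for z in zs]
    return boxes, points


if __name__ == "__main__":
    walkable = [(i / 10, 0.0) for i in range(-10, 11)]  # a straight 2 m path
    boxes, points = synth_binary_sdf(walkable)
    print(len(boxes), "objects,", sum(p[3] for p in points), "occupied points")
```

In this sketch, a model conditioned on `points` never sees real furniture. It sees only plausible obstacles that leave the characters' path clear, which mirrors the idea of adding scene awareness without collecting new scene data.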

Abstract

Recent advancements in human motion synthesis have focused on specific types of motions, such as human-scene interaction, locomotion, or human-human interaction; however, there is a lack of a unified system capable of generating a diverse combination of motion types. In response, we introduce Sitcom-Crafter, a comprehensive and extendable system for human motion generation in 3D space, which can be guided by extensive plot contexts to enhance workflow efficiency for anime and game designers. The system comprises eight modules: three are dedicated to motion generation, while the remaining five are augmentation modules that ensure consistent fusion of motion sequences and system functionality. Central to the generation modules is our novel 3D scene-aware human-human interaction module, which addresses collision issues by synthesizing implicit 3D Signed Distance Function (SDF) points around motion spaces, thereby minimizing human-scene collisions without additional data collection costs. Complementing this, our locomotion and human-scene interaction modules leverage existing methods to enrich the system's motion generation capabilities. The augmentation modules comprise plot comprehension for command generation, motion synchronization for seamless integration of different motion types, hand pose retrieval to enhance motion realism, motion collision revision to prevent collisions between characters, and 3D retargeting to ensure visual fidelity. Experimental evaluations validate the system's ability to generate high-quality, diverse, and physically realistic motions, underscoring its potential for advancing creative workflows. Project page: https://windvchen.github.io/Sitcom-Crafter.
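The module layout can be pictured as a dispatch-then-post-process pipeline: one plot-comprehension step, three generation modules, then a chain of augmentation modules. The sketch below is hypothetical throughout:

- the `kind | actors | text` command format is a placeholder;
- the dictionary keys and module names are illustrative;
- the stub stages only log their names.

Running synchronization strictly after generation is also a simplification. In the real system, synchronization links the outputs of different generation modules and may be interleaved with them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical orchestration sketch; module behavior is stubbed out.


@dataclass
class Command:
    kind: str            # key of the generation module that should handle it
    actors: List[str]
    text: str


@dataclass
class Clip:
    command: Command
    log: List[str] = field(default_factory=list)  # stages applied so far


def comprehend_plot(plot: str) -> List[Command]:
    """Stand-in for plot comprehension: parses 'kind | A,B | text' lines."""
    commands = []
    for line in plot.strip().splitlines():
        kind, actors, text = (part.strip() for part in line.split("|"))
        commands.append(Command(kind, [a.strip() for a in actors.split(",")], text))
    return commands


def make_generator(name: str) -> Callable[[Command], Clip]:
    """Placeholder generation module that only records that it ran."""
    return lambda cmd: Clip(cmd, [f"{name}: {cmd.text}"])


# Three generation modules, keyed by command kind (hypothetical keys).
GENERATORS: Dict[str, Callable[[Command], Clip]] = {
    "locomotion": make_generator("locomotion"),
    "scene": make_generator("human-scene interaction"),
    "hhi": make_generator("human-human interaction"),
}

# Post-generation augmentation stages, in the order the abstract lists them.
POST_STAGES = ["motion synchronization", "hand pose retrieval",
               "collision revision", "3D retargeting"]


def run_pipeline(plot: str) -> List[Clip]:
    # Plot comprehension distributes each command to its generation module.
    clips = [GENERATORS[cmd.kind](cmd) for cmd in comprehend_plot(plot)]
    for stage in POST_STAGES:
        for clip in clips:
            clip.log.append(stage)  # a real stage would modify the motion here
    return clips


if __name__ == "__main__":
    plot = """
    locomotion | A | A walks to the sofa
    scene | A | A sits on the sofa
    hhi | A,B | B hands A a cup
    """
    for clip in run_pipeline(plot):
        print(clip.command.actors, clip.log)
```

Keying generators by command kind is what makes such a system extendable. A new motion type would need only a new entry in `GENERATORS` and a new command kind from plot comprehension.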

Paper Structure

This paper contains 27 sections, 6 equations, 26 figures, 8 tables.

Figures (26)

  • Figure 1: Sitcom-Crafter supports various types of human motion generation within a 3D scene: human locomotion, human-scene interaction, and human-human interaction, represented by differently colored tori in the figure. User-provided plots guide the generation.
  • Figure 2: The workflow of Sitcom-Crafter. The system consists of eight modules: three for motion generation and five for function enhancement. Arrows between modules indicate the workflow direction. The system supports generation guided by 3D scene structure and long plot context. The plot comprehension module interprets the guiding context as recognizable commands and distributes them to the generation modules. The three generation modules synthesize different motion types: human-scene interaction, human locomotion, and human-human interaction. The motion synchronization module ensures motion consistency across the generation modules. The hand pose retrieval module augments the motion results with hand motion. The collision revision module corrects frames in which characters collide with each other. Finally, the motion retargeting module converts the plain parametric model into detailed 3D digital human assets.
  • Figure 3: Pipeline for constructing synthetic SDF conditions. The pipeline extracts the walkable region from the data, simulates random objects around this region, and distributes binary SDF points in 3D space. This process approximates a concrete scene, as depicted on the rightmost side.
  • Figure 4: Illustration of different canonicalization strategies. In this example, character A is initially canonicalized to the global coordinate origin, while character B is positioned relative to character A.
  • Figure 5: Visual comparisons of human-human interaction generation modules. Human-human interaction prompts are shown at the top, with some keywords highlighted in green. Because the generation modules are stochastic, the resulting motions may appear in different positions. Camera positions are adjusted for the best view. In each row, screenshots are ordered from left to right by capture time, from earlier to later frames.
  • ...and 21 more figures
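The A-centric canonicalization in Figure 4 can be read as a rigid transform in three steps:

1. Translate so that A's root sits at the origin.
2. Rotate so that A faces a fixed axis.
3. Apply the same transform to B, so that B keeps its pose relative to A.

A minimal sketch follows. It assumes a z-up world, yaw measured counter-clockwise from +x, and trajectories given as lists of (x, y, z) root positions. The name `canonicalize_pair` and its arguments are hypothetical, and the paper's actual conventions may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def canonicalize_pair(a_root: Point, a_yaw: float,
                      a_traj: List[Point], b_traj: List[Point]):
    """Move A's reference root to the origin facing +x; express B in A's frame.

    Hypothetical convention: z is up, yaw is counter-clockwise from +x.
    """
    c, s = math.cos(-a_yaw), math.sin(-a_yaw)

    def to_a_frame(p: Point) -> Point:
        dx, dy = p[0] - a_root[0], p[1] - a_root[1]
        # Rotate by -yaw about the vertical axis; height is left unchanged.
        return (c * dx - s * dy, s * dx + c * dy, p[2])

    return [to_a_frame(p) for p in a_traj], [to_a_frame(p) for p in b_traj]


if __name__ == "__main__":
    a_root = (2.0, 1.0, 0.9)  # A stands at (2, 1) facing +y (yaw = pi/2)
    a_c, b_c = canonicalize_pair(a_root, math.pi / 2, [a_root], [(2.0, 3.0, 0.9)])
    print(a_c[0], b_c[0])     # B, 2 m in front of A, ends up near (2, 0, 0.9)
```

Because the transform is rigid, distances between A and B are preserved. An interaction model trained on such data would therefore see the same relative configuration wherever the pair stands in the scene.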