Physics-based Scene Layout Generation from Human Motion

Jianan Li, Tao Huang, Qingxu Zhu, Tien-Tsin Wong

TL;DR

INFERACT is a physics-based framework for automatic scene layout generation conditioned on a given human motion. It jointly optimizes a motion imitation controller and a scene layout generator to maximize a motion-tracking objective under physical constraints. Physical plausibility is enforced through a contact constraint, object placement is guided by pseudo contact labels and pose priors, and both components are trained with PPO in a physics simulator. Evaluated on motions from SAMP, PROX, and a vaulting dataset, the method produces more physically plausible and more diverse scene configurations than the kinematics-based baselines SUMMON and MIME, and it generalizes to outdoor motions. The approach automates realistic human-scene interaction and could benefit 3D animation pipelines. It is currently limited to rigid-body interactions; extending it to broader interaction categories is left for future work.
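Both the controller and the layout generator are trained with PPO. For reference, the following is a minimal NumPy sketch of the generic PPO clipped surrogate loss. The function name and the `clip_eps=0.2` default are our own choices, and how INFERACT computes advantages or balances the two policies' losses is not described in this summary, so the inputs are placeholders.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective (a quantity to minimize).

    logp_new, logp_old: log-probabilities of the sampled actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for the same samples.
    """
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))  # pi_new / pi_old
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Take the pessimistic (element-wise minimum) bound, then average the batch.
    return -np.mean(np.minimum(unclipped, clipped))
```

With identical old and new log-probabilities the ratio is 1 and the loss reduces to the negative mean advantage. Clipping only caps the gain from pushing the ratio further once an action's probability has already moved by more than `clip_eps`.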

Abstract

Creating scenes for captured motions that achieve realistic human-scene interaction is crucial for 3D animation in movies or video games. As character motion is often captured in a blue-screened studio without real furniture or objects in place, there may be a discrepancy between the planned motion and the captured one. This gives rise to the need for automatic scene layout generation to relieve the burdens of selecting and positioning furniture and objects. Previous approaches cannot avoid artifacts like penetration and floating due to the lack of physical constraints. Furthermore, some heavily rely on specific data to learn the contact affordances, restricting the generalization ability to different motions. In this work, we present a physics-based approach that simultaneously optimizes a scene layout generator and simulates a moving human in a physics simulator. To attain plausible and realistic interaction motions, our method explicitly introduces physical constraints. To automatically recover and generate the scene layout, we minimize the motion tracking errors to identify the objects that can afford interaction. We use reinforcement learning to perform a dual-optimization of both the character motion imitation controller and the scene layout generator. To facilitate the optimization, we reshape the tracking rewards and devise pose prior guidance obtained from our estimated pseudo-contact labels. We evaluate our method using motions from SAMP and PROX, and demonstrate physically plausible scene layout reconstruction compared with the previous kinematics-based method.
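The abstract states that the tracking rewards are reshaped but does not give their form. The sketch below assumes the exponential-of-error tracking reward commonly used in physics-based motion imitation (DeepMimic-style) as a stand-in. The choice of joint positions and velocities, the weights `w_pos`/`w_vel`, and the scales `k_pos`/`k_vel` are all hypothetical.

```python
import numpy as np

def tracking_reward(sim_pos, ref_pos, sim_vel, ref_vel,
                    w_pos=0.6, w_vel=0.4, k_pos=10.0, k_vel=0.1):
    """Hypothetical exponential tracking reward in (0, 1].

    sim_pos, ref_pos: (J, 3) simulated / reference joint positions.
    sim_vel, ref_vel: (J, 3) simulated / reference joint velocities.
    """
    # Mean over joints of the squared Euclidean error per joint.
    pos_err = np.mean(np.sum((np.asarray(sim_pos) - np.asarray(ref_pos)) ** 2, axis=-1))
    vel_err = np.mean(np.sum((np.asarray(sim_vel) - np.asarray(ref_vel)) ** 2, axis=-1))
    # Exponentials keep the reward bounded and dense; weights sum to 1.
    return w_pos * np.exp(-k_pos * pos_err) + w_vel * np.exp(-k_vel * vel_err)
```

The link to layout generation is that a layout lacking a supporting object (for example, no chair under a sitting motion) makes the simulated character deviate from the reference. This raises the tracking error and lowers the reward, so maximizing the reward favors objects that afford the captured interaction.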

Paper Structure

This paper contains 29 sections, 11 equations, 9 figures, and 1 table.

Figures (9)

  • Figure 1: Given a human motion sequence and an interacting object set, our framework performs joint optimization for the two composed parts: a motion imitation controller that controls the simulated character and a scene layout generator that configures the objects in the simulation environment.
  • Figure 2: The motion imitation controller is trained to track the reference motion within a physics simulator. The frame value network, guided by the motion imitation controller, predicts the frame value for a given query pose and contact condition. Pseudo contact labels are assigned to frames whose estimated frame value under the contact condition exceeds that under the non-contact condition. Pose priors and the number of contacting objects are then obtained by clustering the contact human poses. The scene layout generator policy takes the motion index and the number of objects as inputs and, guided by the pose priors, predicts a mixed distribution over selected object indices and their corresponding placements.
  • Figure 3: Illustrations of our generated scenes for three different motions, with the object placements obtained by SUMMON [ye2022scene] for comparison. Our method generates more reasonable and plausible scenes than SUMMON.
  • Figure 4: Qualitative comparison with MIME [yi2023mime]. Only the contacted objects in each scene are shown. Our method produces more physically plausible object placements, whereas MIME sometimes fails to generate scenes with satisfactory human-scene interaction, even after scene refinement.
  • Figure 5: Visualizations of diverse object selections generated by our method for a sitting motion. The selection probability of each chosen object is indicated above the figures. This example demonstrates the capability of INFERACT to generate diverse results for chair choices (sub-figure 1 to 3) and its ability to screen out inappropriate objects like tables (rightmost).
  • ...and 4 more figures
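The Figure 2 caption describes two preprocessing steps: labeling contact frames by comparing frame values, and clustering the contact poses into pose priors whose count gives the number of contacted objects. The sketch below illustrates both under stated assumptions. The frame values are taken as given arrays. The "pose" is reduced to a 2D root position. A greedy distance-threshold clustering with a hypothetical `radius` stands in for whatever clustering the paper actually uses. The function names are also hypothetical.

```python
import numpy as np

def pseudo_contact_labels(v_contact, v_noncontact):
    """Label frame t as contact when its estimated value under the contact
    condition exceeds the estimate under the non-contact condition."""
    return np.asarray(v_contact) > np.asarray(v_noncontact)

def cluster_contact_poses(root_xy, labels, radius=0.5):
    """Greedy distance-threshold clustering of contact-frame root positions.

    Returns the cluster centroids (a simplified stand-in for pose priors);
    the number of centroids serves as the estimated number of objects.
    """
    centroids, members = [], []
    for p in np.asarray(root_xy, dtype=float)[labels]:
        for i, c in enumerate(centroids):
            if np.linalg.norm(p - c) < radius:
                members[i].append(p)
                centroids[i] = np.mean(members[i], axis=0)  # update running centroid
                break
        else:  # no existing cluster is close enough: start a new one
            centroids.append(p.copy())
            members.append([p])
    return centroids
```

For example, a motion that sits twice at two places about 3 m apart yields two contact clusters and therefore two objects to place:

```python
labels = pseudo_contact_labels([0.2, 0.9, 0.8, 0.1, 0.7, 0.9],
                               [0.5, 0.3, 0.4, 0.6, 0.2, 0.1])
root_xy = [[0, 0], [0.0, 0.1], [0.1, 0.0], [1.5, 0], [3.0, 0.0], [3.1, 0.0]]
priors = cluster_contact_poses(root_xy, labels)
print(len(priors))
```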
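The layout generator in Figure 2 predicts "a mixed distribution" of object indices and placements. One plausible reading, assumed here and not confirmed by the source, is a categorical distribution over candidate objects combined with a diagonal Gaussian over each object's placement (x, y, yaw). Under that reading, pose-prior guidance could amount to centering the Gaussian mean on a prior. The sketch samples such a layout and returns the joint log-probability that a PPO update would need. The parameterization and the name `sample_layout` are hypothetical.

```python
import numpy as np

def sample_layout(obj_logits, place_mean, place_log_std, rng):
    """Sample one object index and one placement per object slot.

    obj_logits: (K, M) logits over M candidate objects for each of K slots.
    place_mean, place_log_std: (K, 3) Gaussian parameters over (x, y, yaw).
    Returns (indices, placements, joint log-probability per slot).
    """
    # Numerically stable softmax over candidate objects.
    logits = obj_logits - obj_logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    idx = np.array([rng.choice(probs.shape[1], p=p) for p in probs])

    # Reparameterized sample from the diagonal Gaussian.
    std = np.exp(place_log_std)
    place = place_mean + std * rng.standard_normal(place_mean.shape)

    # log N(x; mu, sigma) = -0.5 * (z^2 + 2 log sigma + log 2*pi), summed over dims.
    z = (place - place_mean) / std
    logp_gauss = -0.5 * (z ** 2 + 2.0 * place_log_std + np.log(2.0 * np.pi)).sum(axis=1)
    logp_cat = np.log(probs[np.arange(len(idx)), idx])
    return idx, place, logp_cat + logp_gauss
```

The per-slot `probs` are analogous to the selection probabilities shown above each chair in Figure 5: a sitting motion can spread probability over several chairs while assigning little to unsuitable objects such as tables.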