PACE: Data-Driven Virtual Agent Interaction in Dense and Cluttered Environments

James Mullen, Dinesh Manocha

TL;DR

PACE addresses the challenge of placing motion-captured virtual agents into dense indoor scenes by combining a frame-weighted, interaction-aware preparation with a dual-objective optimization that jointly refines agent placement and per-frame motion. It leverages POSA-derived interaction cues and a frame-weighting scheme to focus optimization on the most semantically meaningful frames, and optimizes with $E_p$ (affordance+penetration) and $E_{alt}$ (pose+motion continuity) losses to maintain realism. Empirically, PACE outperforms baselines in non-collision and contact metrics and yields higher perceptual realism in user studies, with successful real-world AR integration via Microsoft HoloLens and a new large-scale dataset. The approach enables practical, physically plausible, and natural-looking agent interactions in cluttered environments, with potential impact on VR/AR, robotics, and immersive simulations.
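The frame-weighting idea can be illustrated with a minimal sketch. It assumes each frame already has a non-negative interaction score (for example, some summary of POSA-style contact predictions) and a scalar loss; the function names, the normalization, and the uniform fallback are illustrative assumptions, not the paper's actual formulation.

```python
from typing import List, Sequence


def frame_weights(interaction_scores: Sequence[float]) -> List[float]:
    """Turn non-negative per-frame interaction scores into weights summing to 1.

    Hypothetical helper: the scoring and normalization are assumptions.
    """
    if not interaction_scores:
        raise ValueError("need at least one frame")
    if any(s < 0 for s in interaction_scores):
        raise ValueError("scores must be non-negative")
    total = sum(interaction_scores)
    if total == 0:
        # No frame shows any interaction: fall back to uniform weights.
        return [1.0 / len(interaction_scores)] * len(interaction_scores)
    return [s / total for s in interaction_scores]


def weighted_objective(per_frame_losses: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted sum of per-frame losses, so high-weight frames dominate."""
    if len(per_frame_losses) != len(weights):
        raise ValueError("one weight per frame is required")
    return sum(w * l for w, l in zip(weights, per_frame_losses))


if __name__ == "__main__":
    weights = frame_weights([0.0, 1.0, 3.0])  # third frame interacts the most
    print(weights)                            # [0.0, 0.25, 0.75]
    print(weighted_objective([10.0, 2.0, 4.0], weights))  # 3.5
```

In this toy run, the first frame has a large loss but no interaction score, so it does not influence the objective at all; the optimizer's attention goes to the frames where the agent actually touches the scene.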

Abstract

We present PACE, a novel method for modifying motion-captured virtual agents to interact with and move throughout dense, cluttered 3D scenes. Our approach changes a given motion sequence of a virtual agent as needed to adjust to the obstacles and objects in the environment. We first take the individual frames of the motion sequence most important for modeling interactions with the scene and pair them with the relevant scene geometry, obstacles, and semantics such that interactions in the agent's motion match the affordances of the scene (e.g., standing on a floor or sitting in a chair). We then optimize the motion of the human by directly altering the high-DOF pose at each frame in the motion to better account for the unique geometric constraints of the scene. Our formulation uses novel loss functions that maintain a realistic flow and natural-looking motion. We compare our method with prior motion-generating techniques and highlight the benefits of our method with a perceptual study and physical plausibility metrics. Human raters preferred our method over the prior approaches. Specifically, they preferred our method 57.1% of the time versus the state-of-the-art method using existing motions, and 81.0% of the time versus a state-of-the-art motion synthesis method. Additionally, our method scores significantly higher on established physical plausibility and interaction metrics. Specifically, we outperform competing methods by over 1.2% in terms of the non-collision metric and by over 18% in terms of the contact metric. We have integrated our interactive system with Microsoft HoloLens and demonstrate its benefits in real-world indoor scenes. Our project website is available at https://gamma.umd.edu/pace/.
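To make the two physical plausibility metrics concrete, here is a minimal sketch under one common convention from prior human-scene interaction work: each body vertex has a signed distance to the scene (negative inside geometry), non-collision is the fraction of vertices that are not inside geometry, and contact is 1 if at least one vertex touches or penetrates the scene. The exact thresholds used in the paper's evaluation are not stated here, so the `tol` parameter and the handling of zero distance are assumptions.

```python
from typing import Sequence


def non_collision_score(sdf_values: Sequence[float]) -> float:
    """Fraction of body vertices with a non-negative signed distance.

    Assumed convention: negative values mean the vertex is inside geometry.
    """
    if not sdf_values:
        raise ValueError("need at least one vertex")
    outside = sum(1 for d in sdf_values if d >= 0.0)
    return outside / len(sdf_values)


def contact_score(sdf_values: Sequence[float], tol: float = 0.0) -> float:
    """1.0 if any vertex touches or penetrates the scene (distance <= tol)."""
    if not sdf_values:
        raise ValueError("need at least one vertex")
    return 1.0 if any(d <= tol for d in sdf_values) else 0.0


if __name__ == "__main__":
    seated = [0.10, 0.02, 0.0, -0.01]   # one vertex slightly inside a chair
    floating = [0.30, 0.20]             # body hovering above the floor
    print(non_collision_score(seated), contact_score(seated))      # 0.75 1.0
    print(non_collision_score(floating), contact_score(floating))  # 1.0 0.0
```

The two numbers pull in opposite directions: a floating body trivially avoids collisions but scores zero on contact, which is why both are reported together.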

Paper Structure (15 sections, 7 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: An overview of PACE. Using existing methods, we first estimate human-scene interactions for a given scene and use those interactions to determine the frame weighting in the virtual agent motion. Through our novel contributions, we then utilize the scene geometry to optimize the motion of the virtual agent such that it interacts with the environment, matching both the interaction in the motion and the geometry of the scene based on our two interaction metrics: non-collision and contact.
  • Figure 2: An overview of our novel optimization process. We iteratively modify the motion of the virtual agent while trying to find the optimal location for it in the scene. This results in a motion that better fits the scene it is being placed in, for more natural-looking and physically plausible virtual agent motion. In this example, note the green circle highlighting changes to the arm geometry so it does not collide with the table in the final placement. In the initial motion, the arm penetrated the table when the person sat down.
  • Figure 3: Comparisons on placing the same virtual agent into the same scene across PACE, POSA-T, and PAAK. Note that two angles of each placement are provided. For the first placement, PACE is the only one that maximizes the interaction with the environment by tailoring the agent's motion to maneuver it around the chair to the right of the table, placing the hand on it as a guide as a real person might. For POSA-T, the placement not only puts the agent perpendicular to the chair, but then has it wander off the scene where its interactions make less sense. PAAK is in a middle state where it maintains contact with the environment but penetrates the chair and table and floats above the ground surface. For placement 2, PACE shows the most probable interaction given the virtual agent motion provided. It interacts with the chair and the bed, bracing its movement to the chair with the hand. It does, however, result in an awkward yet still valid seating position. For POSA-T, the second seating position is completely ignored, which makes sense due to its lack of frame weighting. PAAK finds a valid placement for the virtual agent but the hand placements are awkward and do not contact the scene in the way a real human might.
  • Figure 4: Comparisons on placing the same virtual agent into the same scene across PACE and two ablation studies. Specifically, we remove the $\mathcal{L}_{pose}$ and $\mathcal{L}_{mot}$ terms and evaluate the results. Note that two view angles of each interaction are provided. For the first interaction, notice how PACE places the agent such that it has direct, non-penetrative contact with the table and the ground that matches the motion of the agent. Conversely, removing $\mathcal{L}_{pose}$ resulted in the agent seeming to float out of the scene and contort in awkward ways, i.e., a non-plausible interaction. Removing $\mathcal{L}_{mot}$ resulted in no visible placement, as each individual mesh was separated and thrown far from the scene. For placement 2, PACE successfully navigates the virtual agent around the chair, and even results in contact between the hand and the chair for support. In contrast, the virtual agent's motion without $\mathcal{L}_{pose}$ resulted in some awkwardly leaned poses and a lack of contact for some of the meshes. Moreover, the motion sequence without $\mathcal{L}_{mot}$ became separated into pieces, not resembling a valid motion at all. Overall, $\mathcal{L}_{pose}$ and $\mathcal{L}_{mot}$ improve the plausibility of the interactions.
  • Figure 5: PACE evaluated in a real-world scene using HoloLens. Notice how the virtual agent smoothly navigates around the obstacles in the real-world environment.
  • ...and 1 more figures
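The ablation behavior described in the Figure 4 caption can be illustrated with a small sketch of what a pose term and a motion-continuity term might penalize. The definitions below are assumptions made for illustration (mean squared deviation from the captured pose, and mean squared change in frame-to-frame displacement); they are not the paper's $\mathcal{L}_{pose}$ and $\mathcal{L}_{mot}$, whose exact forms are not given in this section. Each frame is a flat list of pose parameters.

```python
from typing import List

Motion = List[List[float]]  # one list of pose parameters per frame


def pose_loss(original: Motion, edited: Motion) -> float:
    """Mean squared deviation of edited parameters from the captured ones."""
    total, count = 0.0, 0
    for f0, f1 in zip(original, edited):
        for a, b in zip(f0, f1):
            total += (b - a) ** 2
            count += 1
    return total / count


def motion_loss(original: Motion, edited: Motion) -> float:
    """Mean squared change in frame-to-frame displacement.

    Zero when the edited sequence moves between frames exactly as the
    original does, even if the whole sequence has been shifted.
    """
    if len(original) < 2:
        raise ValueError("need at least two frames")
    total, count = 0.0, 0
    for t in range(1, len(original)):
        for k in range(len(original[t])):
            d0 = original[t][k] - original[t - 1][k]
            d1 = edited[t][k] - edited[t - 1][k]
            total += (d1 - d0) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    original = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    shifted = [[5.0, 0.0], [6.0, 0.0], [7.0, 0.0]]     # moved as a whole
    scattered = [[0.0, 0.0], [10.0, 0.0], [2.0, 0.0]]  # one frame thrown away
    print(pose_loss(original, shifted), motion_loss(original, shifted))      # 12.5 0.0
    print(pose_loss(original, scattered), motion_loss(original, scattered))  # 13.5 40.5
```

Under these assumed definitions, the pose term alone barely distinguishes a coherent shift from a scattered sequence (12.5 versus 13.5), while the motion term separates them sharply (0.0 versus 40.5). That is consistent with the caption's observation that dropping the motion term lets individual meshes separate.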