PACE: Data-Driven Virtual Agent Interaction in Dense and Cluttered Environments
James Mullen, Dinesh Manocha
TL;DR
PACE addresses the challenge of placing motion-captured virtual agents into dense indoor scenes by combining a frame-weighted, interaction-aware preparation step with a dual-objective optimization that jointly refines agent placement and per-frame motion. It uses POSA-derived interaction cues and a frame-weighting scheme to focus optimization on the most semantically meaningful frames, and it optimizes two losses to maintain realism: $E_p$ (affordance and penetration) and $E_{alt}$ (pose and motion continuity). Empirically, PACE outperforms baselines on non-collision and contact metrics and is rated as more perceptually realistic in user studies. The system has been integrated with Microsoft HoloLens for real-world AR use, and the work introduces a new large-scale dataset. The approach enables practical, physically plausible, and natural-looking agent interactions in cluttered environments, with potential impact on VR/AR, robotics, and immersive simulations.
Abstract
We present PACE, a novel method for modifying motion-captured virtual agents to interact with and move throughout dense, cluttered 3D scenes. Our approach changes a given motion sequence of a virtual agent as needed to adjust to the obstacles and objects in the environment. We first take the individual frames of the motion sequence most important for modeling interactions with the scene and pair them with the relevant scene geometry, obstacles, and semantics such that interactions in the agent's motion match the affordances of the scene (e.g., standing on a floor or sitting in a chair). We then optimize the motion of the human by directly altering the high-DOF pose at each frame in the motion to better account for the unique geometric constraints of the scene. Our formulation uses novel loss functions that maintain a realistic flow and natural-looking motion. We compare our method with prior motion-generating techniques and highlight the benefits of our method with a perceptual study and physical plausibility metrics. Human raters preferred our method over the prior approaches. Specifically, they preferred our method 57.1% of the time versus the state-of-the-art method using existing motions, and 81.0% of the time versus a state-of-the-art motion synthesis method. Additionally, our method scores significantly higher on established physical plausibility and interaction metrics. Specifically, we outperform competing methods by over 1.2% in terms of the non-collision metric and by over 18% in terms of the contact metric. We have integrated our interactive system with Microsoft HoloLens and demonstrate its benefits in real-world indoor scenes. Our project website is available at https://gamma.umd.edu/pace/.
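The abstract reports non-collision and contact metrics without defining them. In prior human-scene interaction work these are commonly computed from the signed distance between each body vertex and the scene surface (positive outside the geometry, negative inside). The following is a minimal sketch under that assumption, not the authors' evaluation code; the function names, the tolerance parameter, and the sample distances are all hypothetical.

```python
def non_collision_score(sdf_values):
    """Fraction of body vertices that do not penetrate the scene (SDF >= 0).

    Assumed definition: a vertex counts as collision-free when its signed
    distance to the scene surface is non-negative.
    """
    values = list(sdf_values)
    return sum(1 for d in values if d >= 0.0) / len(values)


def contact_score(sdf_values, tol=0.0):
    """1.0 if at least one vertex touches or penetrates the scene, else 0.0.

    Assumed definition: contact exists when some vertex has a signed
    distance at or below `tol` (a hypothetical tolerance in meters).
    """
    return 1.0 if any(d <= tol for d in sdf_values) else 0.0


# Hypothetical signed distances (meters) for three frames of four vertices each.
sdf_per_frame = [
    [0.30, 0.12, 0.00, 0.05],   # one vertex exactly on the surface
    [0.30, 0.12, -0.02, 0.05],  # one vertex penetrating the scene
    [0.30, 0.12, 0.04, 0.05],   # body floating, no contact
]

# Average the per-frame scores over the sequence.
non_collision = sum(non_collision_score(f) for f in sdf_per_frame) / len(sdf_per_frame)
contact = sum(contact_score(f) for f in sdf_per_frame) / len(sdf_per_frame)

print(f"non-collision: {non_collision:.3f}, contact: {contact:.3f}")
```

The two scores pull in opposite directions: a body floating well above the floor gets a perfect non-collision score but no contact, which is why the two are reported together.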
