Generating Continual Human Motion in Diverse 3D Scenes
Aymen Mir, Xavier Puig, Angjoo Kanazawa, Gerard Pons-Moll
TL;DR
This work tackles continual human motion synthesis in diverse 3D scenes by decoupling scene reasoning from motion generation. It introduces action keypoints and a goal-centric canonical coordinate frame, enabling long-range motion using scene-agnostic mocap data. Two transformers, WalkNet and TransNet, generate walking and in-betweening transitions, trained entirely on mocap data and conditioned through anchor poses derived from keypoints. The approach generalizes across multiple real-world scene datasets and outperforms baselines in realism and scene constraint satisfaction, offering a scalable pathway for animator-guided motion in arbitrary environments.
Abstract
We introduce a method to synthesize animator-guided human motion across 3D scenes. Given a sparse set of joint locations (3 or 4, such as the locations of a person's hand and two feet) and a seed motion sequence in a 3D scene, our method generates a plausible motion sequence that starts from the seed motion and satisfies the constraints imposed by the provided keypoints. We decompose the continual motion synthesis problem into walking along paths and transitioning in and out of the actions specified by the keypoints, which enables the generation of long motion sequences that satisfy scene constraints without explicitly incorporating scene information. Our method is trained using only scene-agnostic mocap data. As a result, our approach is deployable across 3D scenes with various geometries. To achieve plausible continual motion synthesis without drift, our key contribution is to generate motion in a goal-centric canonical coordinate frame in which the next immediate target is situated at the origin. Our model can generate long sequences of diverse actions such as grabbing, sitting, and leaning, chained together in arbitrary order, and we demonstrate it on scenes of varying geometry: HPS, Replica, Matterport, ScanNet, and scenes represented using NeRFs. Several experiments demonstrate that our method outperforms existing methods that navigate paths in 3D scenes. For more results, we urge the reader to watch our supplementary video, available at: https://www.youtube.com/watch?v=0wZgsdyCT4A&t=1s
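The goal-centric canonical frame mentioned in the abstract can be illustrated with a small coordinate change. The sketch below is not the authors' implementation: the function names `to_goal_frame` and `from_goal_frame`, the z-up convention, and the optional `goal_yaw` rotation are all assumptions made for illustration. The abstract only states that the next immediate target sits at the origin of the frame.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def to_goal_frame(joints: List[Vec3], goal: Vec3, goal_yaw: float = 0.0) -> List[Vec3]:
    """Express world-space joints in a frame whose origin is the goal.

    Assumptions (not from the paper): z is the up axis, and the frame
    may additionally be rotated by goal_yaw radians about z.
    """
    c, s = math.cos(goal_yaw), math.sin(goal_yaw)
    out = []
    for x, y, z in joints:
        # Translate so the goal becomes the origin.
        dx, dy, dz = x - goal[0], y - goal[1], z - goal[2]
        # Rotate by -goal_yaw about the vertical axis.
        out.append((c * dx + s * dy, -s * dx + c * dy, dz))
    return out


def from_goal_frame(joints: List[Vec3], goal: Vec3, goal_yaw: float = 0.0) -> List[Vec3]:
    """Inverse of to_goal_frame: map goal-frame joints back to world space."""
    c, s = math.cos(goal_yaw), math.sin(goal_yaw)
    out = []
    for x, y, z in joints:
        # Rotate by +goal_yaw, then translate back to the goal position.
        out.append((c * x - s * y + goal[0], s * x + c * y + goal[1], z + goal[2]))
    return out


# Hypothetical example: two joints, with the next target at (2, 1, 0).
goal = (2.0, 1.0, 0.0)
world_joints = [(2.0, 1.0, 0.0), (3.0, 1.0, 0.9)]
canonical = to_goal_frame(world_joints, goal, goal_yaw=math.pi / 2)
restored = from_goal_frame(canonical, goal, goal_yaw=math.pi / 2)
```

Under these assumptions, every segment of a long sequence is expressed relative to its own target and mapped back to world coordinates afterwards, so a generator would always see coordinates measured from the current goal rather than absolute positions that grow with the length of the sequence.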
