CHOICE: Coordinated Human-Object Interaction in Cluttered Environments for Pick-and-Place Actions
Jintao Lu, He Zhang, Yuting Ye, Takaaki Shiratori, Sebastian Starke, Taku Komura
TL;DR
This work tackles the problem of synthesizing realistic, long-horizon human-object interactions in cluttered environments by introducing CHOICE, a hierarchical system that combines a neural implicit trajectory planner, a bimanual task scheduler, and a DeepPhase-based controller. The trajectory planner uses a three-field implicit representation $\big(D_t, D_o, D_{toa}\big)$ with a time-of-arrival field learned from motion capture and an auto-decoder to generalize to unseen scenes, producing collision-free wrist trajectories. The DeepPhase controller employs a linear dynamical formulation in the phase latent space and a Kalman filter to robustly track goal-phase states, enabling smooth full-body coordination across hands and the hip. A dedicated navigation and scheduling module choreographs 2D path planning, motion matching, and bimanual task sequencing, while a large MoCap-based CHOICE dataset supports training. Empirical results show stronger motion realism, higher safety distances, and about a 96% success rate on unseen layouts, demonstrating substantial generalization to novel cluttered scenes and complex containers. The framework advances realistic, full-body interaction synthesis with meaningful implications for animation and robotics in real-world environments.
Abstract
Animating human-scene interactions such as pick-and-place tasks in cluttered, complex layouts is challenging: objects vary widely in geometry and articulation, and scenes contain diverse obstacles. The main difficulty lies in the sparsity of motion data relative to the wide variation of objects and environments, together with the poor availability of transition motions between different tasks, both of which make generalization to arbitrary conditions harder. To cope with this issue, we develop a system that treats interaction synthesis as a hierarchical goal-driven task. First, we develop a bimanual scheduler that plans a set of keyframes for simultaneously controlling the two hands, so that the pick-and-place task is achieved efficiently from an abstract goal signal such as the target object selected by the user. Next, we develop a neural implicit planner that generates guidance hand trajectories under diverse object shapes and types and obstacle layouts. Finally, we propose a linear dynamic model for our DeepPhase controller that incorporates a Kalman filter to enable smooth transitions in the frequency domain, resulting in more realistic and effective multi-objective control of the character. Our system can produce a wide range of natural pick-and-place movements with respect to the geometry of objects, the articulation of containers and the layout of the objects in the scene.
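To make the idea of a linear dynamic model combined with a Kalman filter concrete, the following is a minimal sketch of a textbook Kalman filter tracking a noisy goal signal for a single phase channel. It is not the authors' DeepPhase controller: the 2D rotating phase vector, the function names (`rotation_dynamics`, `kalman_step`, `track_phase`), the frequency, the frame rate, and the noise levels are all assumptions chosen for illustration.

```python
import numpy as np

def rotation_dynamics(freq_hz, dt):
    """Assumed linear dynamics: a 2D phase vector rotating at a fixed frequency."""
    theta = 2.0 * np.pi * freq_hz * dt
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of a standard Kalman filter."""
    # Predict the next state and covariance with the linear model.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Correct the prediction with the observed goal-phase state z.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

def track_phase(goals, freq_hz=1.0, dt=1.0 / 30.0, q=1e-4, r=1e-2):
    """Filter a sequence of noisy 2D goal-phase vectors (hypothetical setup)."""
    A = rotation_dynamics(freq_hz, dt)
    H = np.eye(2)            # the phase vector is observed directly
    Q = q * np.eye(2)        # process noise (assumed)
    R = r * np.eye(2)        # observation noise (assumed)
    x, P = goals[0].copy(), np.eye(2)
    out = [x]
    for z in goals[1:]:
        x, P = kalman_step(x, P, z, A, H, Q, R)
        out.append(x)
    return np.array(out)
```

Calling `track_phase` on an array of shape `(T, 2)` returns a smoothed array of the same shape. Under these assumptions the filter blends the model's prediction with each noisy goal, which is the general mechanism by which a Kalman filter can produce smoother transitions than following the goal signal directly.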
