Hierarchical visuomotor control of humanoids
Josh Merel, Arun Ahuja, Vu Pham, Saran Tunyasuvunakool, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Greg Wayne
TL;DR
This work addresses integrated visuomotor control of high-DoF humanoids with a hierarchical architecture in which a memory- and vision-based high-level controller coordinates multiple low-level motor controllers pretrained from motion-capture data. It systematically compares steerable, switching, and cold-switching control paradigms for the low-level skills and finds that discrete selection among control fragments performs best on vision-based locomotion and memory tasks. The agent solves the Go-to-target, Walls, Gaps, Forage, and Heterogeneous Forage tasks in MuJoCo, with memory-dependent behavior emerging in the memory task, and analyses indicate that the high-level policy relies on perceptual cues such as ball borders and walls. The approach scales motor skill reuse without heavy hand-engineering, a step toward flexible, vision-guided, memory-augmented humanoids. Open challenges include automating transitions and reducing movement artifacts.
Abstract
We aim to build complex humanoid agents that integrate perception, motor control, and memory. In this work, we partly factor this problem into low-level motor control from proprioception and high-level coordination of the low-level skills informed by vision. We develop an architecture capable of surprisingly flexible, task-directed motor control of a relatively high-DoF humanoid body by combining pre-training of low-level motor controllers with a high-level, task-focused controller that switches among low-level sub-policies. The resulting system is able to control a physically simulated humanoid body to solve tasks that require coupling visual perception from an unstabilized egocentric RGB camera during locomotion in the environment. A supplementary video is available at https://youtu.be/7GISvfbykLE.
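
The factoring described above, in which low-level skills act on proprioception and a vision-informed high-level controller switches among them, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `make_low_level_policy`, `HighLevelController`, and `run_hierarchy`, the brightness-based selection rule, and the fixed switching interval are hypothetical stand-ins, not the networks or training procedure used in this work.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only.
Proprio = Sequence[float]
Image = Sequence[float]
Action = List[float]
LowLevelPolicy = Callable[[Proprio], Action]


def make_low_level_policy(gain: float) -> LowLevelPolicy:
    """Stand-in for a pretrained motor controller: proprioception in, action out."""
    def policy(proprio: Proprio) -> Action:
        return [gain * x for x in proprio]
    return policy


class HighLevelController:
    """Stand-in for the vision-based controller: returns the index of a skill."""

    def __init__(self, num_skills: int) -> None:
        self.num_skills = num_skills

    def select(self, image: Image) -> int:
        # Placeholder rule (assumption): bucket mean pixel intensity in [0, 1].
        # A trained policy network would take this role.
        mean = sum(image) / len(image)
        return min(int(mean * self.num_skills), self.num_skills - 1)


def run_hierarchy(
    images: Sequence[Image],
    proprios: Sequence[Proprio],
    skills: Sequence[LowLevelPolicy],
    controller: HighLevelController,
    switch_every: int = 2,
) -> Tuple[List[int], List[Action]]:
    """Re-select a skill every `switch_every` steps; run the active skill each step."""
    chosen: List[int] = []
    actions: List[Action] = []
    skill_idx = 0
    for t, (image, proprio) in enumerate(zip(images, proprios)):
        if t % switch_every == 0:
            # Only the high-level controller sees the image.
            skill_idx = controller.select(image)
        chosen.append(skill_idx)
        # The low-level skill sees proprioception only.
        actions.append(skills[skill_idx](proprio))
    return chosen, actions
```

The point of the sketch is the separation of inputs: the selected skill never receives the image, and the high-level controller emits only a discrete choice rather than joint-level actions.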
