Online Multi-Contact Receding Horizon Planning via Value Function Approximation
Jiayi Wang, Sanghyun Kim, Teguh Santoso Lembono, Wenqian Du, Jaehyun Shim, Saeid Samadi, Ke Wang, Vladimir Ivan, Sylvain Calinon, Sethu Vijayakumar, Steve Tonneau
TL;DR
This work tackles online multi-contact locomotion planning by reducing the cost of approximating the value function in receding horizon planning. It introduces two complementary approaches: (i) Receding Horizon Planning with Multiple Levels of Model Fidelity, which computes the prediction horizon with a convex relaxed model to accelerate planning, and (ii) Locally-Guided Receding Horizon Planning (LG-RHP), which learns an oracle to predict local objectives and uses them to guide a short-horizon planner. In simulation on moderate and large slopes, LG-RHP achieves the best computational efficiency (95%-98.6% of cycles converge online), which enables online planning on the real Talos humanoid in environments that change on-the-fly. The multi-fidelity approach also offers online capability, with some convergence risk on highly dynamic terrains. An incremental training scheme for the oracle further improves robustness by adding corrective data from failure cases. Overall, the paper demonstrates practical online multi-contact RHP for humanoids, makes the trade-off between model fidelity and computation explicit, and shows LG-RHP's potential for real-time adaptation on real robots.
Abstract
Planning multi-contact motions in a receding horizon fashion requires a value function to guide the planning with respect to the future, e.g., building momentum to traverse large obstacles. Traditionally, the value function is approximated by computing trajectories in a prediction horizon (never executed) that foresees the future beyond the execution horizon. However, given the non-convex dynamics of multi-contact motions, this approach is computationally expensive. To enable online Receding Horizon Planning (RHP) of multi-contact motions, we find efficient approximations of the value function. Specifically, we propose a trajectory-based and a learning-based approach. In the former, namely RHP with Multiple Levels of Model Fidelity, we approximate the value function by computing the prediction horizon with a convex relaxed model. In the latter, namely Locally-Guided RHP, we learn an oracle to predict local objectives for locomotion tasks, and we use these local objectives to construct local value functions for guiding a short-horizon RHP. We evaluate both approaches in simulation by planning centroidal trajectories of a humanoid robot walking on moderate slopes, and on large slopes where the robot cannot maintain static balance. Our results show that Locally-Guided RHP achieves the best computational efficiency (95%-98.6% of cycles converge online). This computational advantage enables us to demonstrate online receding horizon planning of our real-world humanoid robot Talos walking in dynamic environments that change on-the-fly.
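The locally-guided idea can be illustrated with a toy sketch. It is not the paper's implementation: it assumes a 1D point mass instead of centroidal dynamics, a hand-written `oracle` standing in for the learned one, a quadratic local value function, and a grid search over constant accelerations instead of trajectory optimization. All names and constants are hypothetical.

```python
import numpy as np

DT = 0.1                              # integration step [s] (assumed)
STEPS = 5                             # steps per execution horizon (assumed)
ACCELS = np.linspace(-2.0, 2.0, 41)   # candidate constant accelerations


def rollout(state, accel):
    """Integrate a 1D point mass over one execution horizon (explicit Euler)."""
    pos, vel = state
    for _ in range(STEPS):
        pos += vel * DT
        vel += accel * DT
    return np.array([pos, vel])


def oracle(state, goal):
    """Hypothetical stand-in for the learned oracle: returns a local target
    state (position, velocity) for the end of the execution horizon."""
    pos = state[0]
    step = np.clip(goal - pos, -0.3, 0.3)
    return np.array([pos + step, step / (STEPS * DT)])


def local_value(end_state, target):
    """Quadratic local value function built from the oracle's local objective."""
    return float(np.sum((end_state - target) ** 2))


def plan_cycle(state, goal):
    """Short-horizon plan: only the execution horizon is optimized, and the
    local value function replaces the prediction horizon."""
    target = oracle(state, goal)
    costs = [local_value(rollout(state, a), target) + 0.01 * a**2 for a in ACCELS]
    return ACCELS[int(np.argmin(costs))]


def run(goal=1.0, cycles=40):
    """Receding horizon loop: plan, execute the execution horizon, repeat."""
    state = np.array([0.0, 0.0])
    for _ in range(cycles):
        state = rollout(state, plan_cycle(state, goal))
    return state


final_state = run()
print("final position and velocity:", final_state)
```

Each cycle optimizes only the part of the motion that will be executed, so the per-cycle cost does not grow with how far ahead the planner needs to look. The quality of the result then depends on how well the oracle's local objectives encode the future.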
