Online Multi-Contact Receding Horizon Planning via Value Function Approximation

Jiayi Wang, Sanghyun Kim, Teguh Santoso Lembono, Wenqian Du, Jaehyun Shim, Saeid Samadi, Ke Wang, Vladimir Ivan, Sylvain Calinon, Sethu Vijayakumar, Steve Tonneau

TL;DR

This work tackles online multi-contact locomotion planning by reducing the cost of evaluating the value function in receding horizon planning. It introduces two complementary approaches: (i) Receding Horizon Planning with Multiple Levels of Model Fidelity, which computes the prediction horizon with a convex relaxed model to accelerate planning, and (ii) Locally-Guided Receding Horizon Planning (LG-RHP), which learns an oracle to predict local objectives and guides short-horizon planning toward those targets. In simulation on moderate and large slopes, LG-RHP achieves the best computation efficiency (95% to 98.6% of cycles converge online), which enables online planning on the real Talos humanoid in environments that change on-the-fly; multi-fidelity offers online capability with some convergence risk on highly dynamic terrains. The incremental training scheme for the oracle further improves robustness by adding corrective data from failure cases. Overall, the paper demonstrates practical online multi-contact RHP for humanoids, with a clear trade-off between model fidelity and computation.

Abstract

Planning multi-contact motions in a receding horizon fashion requires a value function to guide the planning with respect to the future, e.g., building momentum to traverse large obstacles. Traditionally, the value function is approximated by computing trajectories in a prediction horizon (never executed) that foresees the future beyond the execution horizon. However, given the non-convex dynamics of multi-contact motions, this approach is computationally expensive. To enable online Receding Horizon Planning (RHP) of multi-contact motions, we find efficient approximations of the value function. Specifically, we propose a trajectory-based and a learning-based approach. In the former, namely RHP with Multiple Levels of Model Fidelity, we approximate the value function by computing the prediction horizon with a convex relaxed model. In the latter, namely Locally-Guided RHP, we learn an oracle to predict local objectives for locomotion tasks, and we use these local objectives to construct local value functions for guiding a short-horizon RHP. We evaluate both approaches in simulation by planning centroidal trajectories of a humanoid robot walking on moderate slopes, and on large slopes where the robot cannot maintain static balance. Our results show that locally-guided RHP achieves the best computation efficiency (95%-98.6% cycles converge online). This computation advantage enables us to demonstrate online receding horizon planning of our real-world humanoid robot Talos walking in dynamic environments that change on-the-fly.
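
To make the role of the value function concrete, here is a minimal toy sketch in Python. It is not the paper's centroidal trajectory optimization: the 1D terrain, its costs, and the names `TERRAIN`, `ACTIONS`, `cost_to_go`, `plan_one_step`, and `run` are all hypothetical. It contrasts a one-step planner that ignores the future with the same planner guided by a terminal value function, which is computed exactly here by dynamic programming.

```python
# Toy illustration only: a 1D "terrain" with a hypothetical cost per foothold cell.
TERRAIN = [0, 1, 2, 9, 9, 0]
ACTIONS = (1, 2)  # advance one or two cells per step


def cost_to_go(terrain):
    """Exact value function (cost-to-go to the last cell) via backward DP."""
    n = len(terrain)
    value = [float("inf")] * n
    value[-1] = 0.0  # already at the goal
    for i in range(n - 2, -1, -1):
        value[i] = min(terrain[i + a] + value[i + a] for a in ACTIONS if i + a < n)
    return value


def plan_one_step(cell, terrain, terminal_value):
    """Short-horizon plan: immediate landing cost plus terminal value."""
    feasible = [a for a in ACTIONS if cell + a < len(terrain)]
    return min(feasible, key=lambda a: terrain[cell + a] + terminal_value(cell + a))


def run(terrain, terminal_value):
    """Receding-horizon loop: plan, execute the first action, replan."""
    cell, total, path = 0, 0.0, [0]
    while cell < len(terrain) - 1:
        cell += plan_one_step(cell, terrain, terminal_value)
        total += terrain[cell]
        path.append(cell)
    return path, total


value = cost_to_go(TERRAIN)
guided_path, guided_cost = run(TERRAIN, lambda c: value[c])  # guided by the value function
myopic_path, myopic_cost = run(TERRAIN, lambda c: 0.0)       # no look-ahead at all
print(guided_path, guided_cost)  # [0, 1, 3, 5] 10.0
print(myopic_cost)               # 12.0
```

On this terrain the myopic planner pays 12 while the guided planner pays 10, which is the optimum. In the toy, the exact value function is cheap to compute; for non-convex multi-contact dynamics it is not, which is why the two approaches in the abstract approximate it instead.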

Paper Structure (27 sections, 13 figures, 6 tables)

Figures (13)

  • Figure 1: Snapshots of our real-world experiments on Talos. Video is available at https://youtu.be/STBYJl7jvsg.
  • Figure 2: In Receding Horizon Planning (RHP), the planning horizon often consists of two parts: 1) execution horizon which plans the motion for immediate execution, and 2) prediction horizon (not executed) that looks into the future. The prediction horizon serves as an approximation of the value function, which guides the execution horizon by telling whether the decisions made in the execution horizon can facilitate the completion of the task or not.
  • Figure 3: a) Infinite-horizon problem, which models the value function with a prediction horizon of infinite length; b) Traditional RHP, which approximates the value function with a finite-length prediction horizon (from time $T$ to $T_p$). Traditional RHP struggles to compute online because the prediction horizon uses an accurate dynamics model (usually non-convex); c) Multi-fidelity RHP, which improves computation efficiency by relaxing the model accuracy in the prediction horizon; d) Locally-Guided RHP, which shortens the planning horizon by approximating the value function with a learned model.
  • Figure 4: Complexity comparison between traditional RHP and our multi-fidelity RHP. Orange denotes higher computation complexity and green denotes lower computation complexity. Our multi-fidelity formulation reduces complexity by introducing convex relaxations in the prediction horizon.
  • Figure 5: Schematics of the models used in the Prediction Horizon (PH): a) linear dynamics (Candidate 1); b) convex relaxation of angular momentum rate dynamics (dashed arrow) with rectangular contacts (Candidate 2); c) convex relaxation of angular momentum rate dynamics (dashed arrow) with point contacts (Candidate 3).
  • ...and 8 more figures
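
The four panels of Figure 3 can be summarized in a single notation. The following is a sketch under assumed symbols, not the paper's own formulation: $x_t$ and $u_t$ are state and control, $\ell$ is a running cost, $\mathcal{F}$ is the accurate (usually non-convex) dynamics constraint set, $\tilde{\mathcal{F}}$ is a convex relaxation of it, and $\hat{V}$ is a local value function built from a local objective $g$ predicted by the learned oracle. Only $T$ and $T_p$ come from the caption, and the discrete-time sums are a simplification.

```latex
\begin{align*}
% Receding horizon problem: execution horizon up to T, plus a value function V
&\min_{x,u}\ \sum_{t=0}^{T-1} \ell(x_t,u_t) + V(x_T)
  \quad \text{s.t. } (x_t,u_t,x_{t+1}) \in \mathcal{F} \\
% (a) Infinite-horizon problem: V is the exact cost-to-go
\text{(a)}\quad & V(x_T) = \min_{x,u} \sum_{t=T}^{\infty} \ell(x_t,u_t)
  \quad \text{s.t. } (x_t,u_t,x_{t+1}) \in \mathcal{F} \\
% (b) Traditional RHP: finite prediction horizon, accurate model
\text{(b)}\quad & V(x_T) \approx \min_{x,u} \sum_{t=T}^{T_p-1} \ell(x_t,u_t)
  \quad \text{s.t. } (x_t,u_t,x_{t+1}) \in \mathcal{F} \\
% (c) Multi-fidelity RHP: same horizon, convex relaxed model
\text{(c)}\quad & V(x_T) \approx \min_{x,u} \sum_{t=T}^{T_p-1} \ell(x_t,u_t)
  \quad \text{s.t. } (x_t,u_t,x_{t+1}) \in \tilde{\mathcal{F}} \supseteq \mathcal{F} \\
% (d) Locally-Guided RHP: no prediction horizon, learned local value function
\text{(d)}\quad & V(x_T) \approx \hat{V}(x_T;\, g)
\end{align*}
```

Read this way, (b) and (c) differ only in the constraint set used between $T$ and $T_p$, while (d) removes the prediction horizon altogether and keeps only the short execution horizon.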