Non-Gaited Legged Locomotion with Monte-Carlo Tree Search and Supervised Learning

Ilyass Taouil, Lorenzo Amatucci, Majid Khadiv, Angela Dai, Victor Barasuol, Giulio Turrisi, Claudio Semini

TL;DR

This work tackles the combinatorial problem of gait planning for legged robots by casting gait sequence and timing optimization as a Markov Decision Process and solving it with Monte-Carlo Tree Search (MCTS). To achieve real-time performance, it combines MCTS with offline learning in a Value Function Network (VF) to bootstrap cost estimates, and it bootstraps online MPC rollouts through a hybrid update rule to maintain robustness. The authors perform an extensive parameter study and introduce a dataset-driven speed-up that enables real-time planning on hardware, validated on a 22 kg quadruped with varied terrains and perturbations; hardware experiments demonstrate disturbance rejection superior to fixed periodic gaits. The result is a practical framework that enables non-gaited legged locomotion at real-time rates, bridging sampling-based planning, supervised learning, and model-predictive control for robust locomotion in complex environments.

Abstract

Legged robots are able to navigate complex terrains by continuously interacting with the environment through careful selection of contact sequences and timings. However, the combinatorial nature of contact planning hinders the applicability of such optimization problems on hardware. In this work, we present a novel approach that optimizes gait sequences and their respective timings for legged robots in the context of optimization-based controllers through the use of sampling-based methods and supervised learning techniques. We propose to bootstrap the search by learning an optimal value function in order to speed up the gait planning procedure, making it applicable in real time. To validate our proposed method, we showcase its performance both in simulation and on hardware using a 22 kg electric quadruped robot. The method is assessed on different terrains, under external perturbations, and in comparison to a standard control approach where the gait sequence is fixed a priori.

Paper Structure (20 sections, 6 equations, 10 figures)

Figures (10)

  • Figure 1: Control block diagram of the proposed approach based on the previous work amatucci_mcts. The green block is executed at $12.5$ Hz, the blue block at $250$ Hz, and the orange block at $1$ kHz.
  • Figure 2: MCTS search process amatucci_mcts augmented with learned value function evaluation. Selection: starting from the root node, a tree traversal is executed to find the node with the lowest cost. Expansion: the selected node's children that respect the MCTS constraints are added to the search tree. Simulation: the expanded nodes are assigned a prediction cost by solving several optimization problems and/or evaluating a learned value function network. Backpropagation: the assigned prediction costs are backpropagated recursively to update the node's ancestors' costs.
  • Figure 3: Comparison of the mean MPC tracking cost for different tree discretizations and increasing number of simulations while disturbing the system along different swing phases with a force of $150$ N for $100$ ms.
  • Figure 4: Comparison of the mean MCTS computation time for different tree discretizations and increasing number of simulations while disturbing the system along different swing phases with a force of $150$ N for $100$ ms.
  • Figure 5: Comparison of the mean MPC tracking cost for a tree discretization $dt$ of $0.08$ s, $120$ MPC rollouts, and different tree horizons while disturbing the system along different swing phases with a force of $150$ N for $100$ ms.
  • ...and 5 more figures
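
The four steps named in the Figure 2 caption (selection, expansion, simulation, backpropagation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Node`, `toy_cost`, `plan`, the reference pattern `TROT`, and the rule of keeping at least two feet in stance are all assumptions made for illustration. In particular, `toy_cost` is a stand-in for the MPC rollouts and the learned value function network, and selection is purely greedy on cost with no exploration term.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Contact = Tuple[int, ...]  # one flag per leg: 1 = stance, 0 = swing

# Hypothetical constraint: at least two feet stay on the ground.
ACTIONS: List[Contact] = [
    c for c in itertools.product((0, 1), repeat=4) if sum(c) >= 2
]

# Hypothetical reference pattern that the toy cost rewards (alternating diagonals).
TROT: List[Contact] = [(1, 0, 0, 1), (0, 1, 1, 0)]
FULL_STANCE: Contact = (1, 1, 1, 1)


@dataclass
class Node:
    contact: Optional[Contact]                 # contact state chosen at this step
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    cost: float = float("inf")                 # lowest predicted cost below this node
    depth: int = 0


def sequence(node: Node) -> List[Contact]:
    """Contact sequence from the root down to `node`."""
    seq: List[Contact] = []
    while node.parent is not None:
        seq.append(node.contact)
        node = node.parent
    return seq[::-1]


def toy_cost(seq: List[Contact], horizon: int) -> float:
    """Stand-in for an MPC rollout or value network (assumption).

    Pads the partial sequence with full stance up to the horizon and counts
    how many leg flags differ from the reference pattern.
    """
    padded = list(seq) + [FULL_STANCE] * (horizon - len(seq))
    return float(sum(
        sum(a != b for a, b in zip(c, TROT[k % 2]))
        for k, c in enumerate(padded)
    ))


def select(root: Node) -> Node:
    """Selection: follow the lowest-cost child until a leaf is reached."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: c.cost)
    return node


def expand(node: Node) -> None:
    """Expansion: add every child that respects the contact constraint."""
    node.children = [Node(a, node, depth=node.depth + 1) for a in ACTIONS]


def backpropagate(node: Node) -> None:
    """Backpropagation: each ancestor takes the lowest cost among its children."""
    while node is not None:
        if node.children:
            node.cost = min(c.cost for c in node.children)
        node = node.parent


def plan(horizon: int, iterations: int) -> Tuple[List[Contact], float]:
    root = Node(contact=None)
    for _ in range(iterations):
        leaf = select(root)
        if leaf.depth == horizon:      # best leaf already spans the horizon
            break
        expand(leaf)
        for child in leaf.children:    # Simulation: assign a predicted cost
            child.cost = toy_cost(sequence(child), horizon)
        backpropagate(leaf)
    return sequence(select(root)), root.cost


if __name__ == "__main__":
    best_sequence, best_cost = plan(horizon=4, iterations=10)
    print(best_sequence, best_cost)
```

With the toy cost above, the search recovers the alternating reference pattern over the four-step horizon. In the approach described in the caption, the simulation step would instead solve optimization problems and/or query the learned value function, which is where most of the computation time is spent.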