Non-Gaited Legged Locomotion with Monte-Carlo Tree Search and Supervised Learning
Ilyass Taouil, Lorenzo Amatucci, Majid Khadiv, Angela Dai, Victor Barasuol, Giulio Turrisi, Claudio Semini
TL;DR
This work tackles the combinatorial problem of gait planning for legged robots by casting gait sequence and timing optimization as a Markov Decision Process and solving it with Monte-Carlo Tree Search (MCTS). To achieve real-time performance, it pairs MCTS with a Value Function (VF) network trained offline to bootstrap cost estimates, and it uses a hybrid update rule with online MPC rollouts to maintain robustness. The authors perform an extensive parameter study and introduce a dataset-driven speed-up that enables real-time planning on hardware. The approach is validated on a 22 kg quadruped across varied terrains and under perturbations, and the hardware experiments demonstrate better disturbance rejection than fixed periodic gaits. The result is a practical framework for non-gaited legged locomotion at real-time rates that bridges sampling-based planning, supervised learning, and model-predictive control for robust locomotion in complex environments.
Abstract
Legged robots are able to navigate complex terrains by continuously interacting with the environment through careful selection of contact sequences and timings. However, the combinatorial nature of contact planning hinders the applicability of such optimization problems on hardware. In this work, we present a novel approach that optimizes gait sequences and their respective timings for legged robots in the context of optimization-based controllers through the use of sampling-based methods and supervised learning techniques. We propose to bootstrap the search by learning an optimal value function in order to speed up the gait planning procedure, making it applicable in real time. To validate our proposed method, we showcase its performance both in simulation and on hardware using a 22 kg electric quadruped robot. The method is assessed on different terrains, under external perturbations, and in comparison to a standard control approach where the gait sequence is fixed a priori.
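To make the idea of a sampling-based search bootstrapped by a value function concrete, the following is a minimal sketch of MCTS over contact patterns. It is not the authors' implementation. The stage cost is a toy stand-in for an MPC cost, `value_estimate` is a hand-written placeholder for a trained network, contact timings are omitted, and the hybrid update rule is not reproduced. All names and constants (`ACTIONS`, `HORIZON`, `stage_cost`, `value_estimate`, `mcts`) are hypothetical.

```python
import itertools
import math

LEGS = 4
HORIZON = 4  # number of contact phases to plan (assumed)
# Candidate contact patterns, True = stance. Assumption: at least two feet down.
ACTIONS = [p for p in itertools.product((False, True), repeat=LEGS) if sum(p) >= 2]


def stage_cost(prev, cur):
    """Toy stand-in for an MPC cost: prefer two swing legs, avoid repeated swing."""
    swing_cost = abs(2 - (LEGS - sum(cur)))
    repeat_cost = sum((not a) and (not b) for a, b in zip(prev, cur))
    return swing_cost + 0.5 * repeat_cost


def path_cost(seq, start):
    """Accumulated stage cost of a contact sequence starting from `start`."""
    cost, prev = 0.0, start
    for cur in seq:
        cost += stage_cost(prev, cur)
        prev = cur
    return cost


def value_estimate(seq):
    """Placeholder for a learned value function: guessed cost-to-go."""
    return 0.25 * (HORIZON - len(seq))


class Node:
    def __init__(self, seq, parent=None):
        self.seq = seq        # contact patterns chosen so far
        self.parent = parent
        self.children = {}    # action -> Node
        self.visits = 0
        self.total = 0.0      # sum of backed-up returns (negative cost)


def select_child(node, c=1.0):
    """UCB1 selection on the mean return (higher is better)."""
    def ucb(child):
        explore = math.sqrt(math.log(node.visits) / child.visits)
        return child.total / child.visits + c * explore
    return max(node.children.values(), key=ucb)


def mcts(start, iterations=500):
    root = Node(())
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and non-terminal.
        while len(node.seq) < HORIZON and len(node.children) == len(ACTIONS):
            node = select_child(node)
        # Expansion: add the next untried action in a fixed order.
        if len(node.seq) < HORIZON:
            action = ACTIONS[len(node.children)]
            node.children[action] = Node(node.seq + (action,), node)
            node = node.children[action]
        # Evaluation: the value estimate replaces a random rollout.
        ret = -(path_cost(node.seq, start) + value_estimate(node.seq))
        # Backpropagation up to the root.
        while node is not None:
            node.visits += 1
            node.total += ret
            node = node.parent
    # Extract the plan by following the most-visited children.
    plan, node = [], root
    while node.children:
        node = max(node.children.values(), key=lambda n: n.visits)
        plan.append(node.seq[-1])
    return plan


if __name__ == "__main__":
    for pattern in mcts(start=(True, True, True, True)):
        print(pattern)
```

Replacing the rollout with a value estimate is what keeps each iteration cheap in this sketch: one evaluation per expanded node instead of a full simulation to the end of the horizon. The returned plan may be shorter than `HORIZON` if the iteration budget is too small for the tree to reach full depth.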
