Learning feasible transitions for efficient contact planning

Rikhat Akizhanov; Victor Dhédin; Majid Khadiv; Ivan Laptev

TL;DR

This work tackles dynamic contact planning for quadrupedal locomotion in extremely constrained stepping-stone environments by combining Monte Carlo Tree Search (MCTS) with learned components. A dynamic feasibility classifier $c$ and a target adjustment network $t$ are trained offline on data generated with nonlinear model predictive control (NMPC). The classifier prunes infeasible transitions and the adjustment network compensates for low-level controller inaccuracies, which together speed up the online search. Results in simulation on a Go2 quadruped show that dynamic pruning and target adjustment substantially reduce NMPC calls and search time while increasing success rates, and that the planner outperforms a state-of-the-art RL approach on the smallest stones. The approach shows that integrating learning-based pruning and control compensation into a model-based planning loop yields robust, efficient navigation in sparse terrains, with potential for real-robot deployment.

Abstract

In this paper, we propose an efficient contact planner for quadrupedal robots to navigate in extremely constrained environments such as stepping stones. The main difficulty in this setting stems from the mixed nature of the problem, namely discrete search over the steppable patches and continuous trajectory optimization. To speed up the discrete search, we study the properties of the transitions from one contact mode to another. In particular, we propose to learn a dynamic feasibility classifier and a target adjustment network. The former predicts if a contact transition between two contact modes is dynamically feasible. The latter is trained to compensate for misalignment in reaching a desired set of contact locations, due to imperfections of the low-level control. We integrate these learned networks in a Monte Carlo Tree Search (MCTS) contact planner. Our simulation results demonstrate that training these networks with offline data significantly speeds up the online search process and improves its accuracy.
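As a rough illustration of the pruning idea described above, the sketch below filters candidate contact transitions with a feasibility score before any expensive check is run. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `prune_transitions`, `toy_score`, the tuple-of-stone-indices contact encoding, and the 0.5 threshold are all hypothetical, and `toy_score` is a hand-written stand-in for the learned classifier.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: one stepping-stone index per foot.
Contact = Tuple[int, ...]


def prune_transitions(
    current: Contact,
    candidates: Sequence[Contact],
    feasibility_score: Callable[[Contact, Contact], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Contact]:
    """Keep only candidate contact modes scored as likely feasible."""
    return [c for c in candidates if feasibility_score(current, c) >= threshold]


def toy_score(current: Contact, nxt: Contact) -> float:
    """Stand-in for a learned classifier: reject large index jumps."""
    jump = max(abs(a - b) for a, b in zip(current, nxt))
    return 1.0 if jump <= 2 else 0.0


current = (0, 1, 2, 3)
candidates = [(1, 2, 3, 4), (5, 6, 7, 8), (2, 3, 2, 3)]
kept = prune_transitions(current, candidates, toy_score)
print(kept)
```

In a tree search, only the transitions in `kept` would be expanded further, so the costly feasibility check in simulation is spent on fewer candidates.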

Paper Structure (17 sections, 4 equations, 3 figures, 2 tables, 2 algorithms)

Introduction
Problem formulation
Notations
Contact planning with MCTS
Method
Dataset collection
Feasibility classifier and state predictor
Compensating for the low-level controller inaccuracies
Heuristics and reward
Results
Training the classifier and state predictor networks
Experimental Setup
MCTS Contact planner with target adjustment.
MCTS Contact Planner with Dynamic Pruning.
Comparison with state-of-the-art RL approach
...and 2 more sections

Figures (3)

Figure 1: Framework overview with the block diagram. In the MCTS (Section "Contact planning with MCTS"), a state predictor and a feasibility classifier are jointly called to prune paths that are unlikely to be dynamically feasible (Section "Feasibility classifier and state predictor"). While running the NMPC in simulation to check the dynamic feasibility of a contact plan, the target contact locations given to the NMPC are adjusted to compensate for the low-level controller inaccuracies (Section "Compensating for the low-level controller inaccuracies").
Figure 2: Illustration of current, target, and achieved contacts for a jump with a quadruped robot. The NMPC does not reach the target positions exactly. The residual is defined as the difference between the target and achieved contacts.
Figure 3: Performance of the search for two different gaits. adjust: experiments with target adjustment. kin: experiments with kinematic pruning only. dyn: experiments with kinematic and dynamic pruning. Note that we use the baseline heuristic for these experiments ($\alpha=0$, $\beta=0$). Results of zhang2023learning are added for comparison in the trotting gait.
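
The residual described in the caption of Figure 2 and the target adjustment it motivates can be sketched as follows. This is a minimal sketch under assumptions that the captions do not confirm: the sign convention (residual = target minus achieved), the helper names `residual` and `adjust_targets`, and the example coordinates are all hypothetical. Here the measured residual is reused directly, whereas the described method predicts it with a trained network.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # hypothetical (x, y, z) foot position in metres


def residual(target: List[Point], achieved: List[Point]) -> List[Point]:
    """Per-foot residual, assumed here to be target minus achieved."""
    return [
        tuple(t - a for t, a in zip(tp, ap))
        for tp, ap in zip(target, achieved)
    ]


def adjust_targets(target: List[Point], predicted: List[Point]) -> List[Point]:
    """Shift each commanded target by the predicted residual."""
    return [
        tuple(t + r for t, r in zip(tp, rp))
        for tp, rp in zip(target, predicted)
    ]


# Made-up numbers: the controller undershoots in x on the first foot
# and drifts in y on the second.
target = [(0.5, 0.25, 0.0), (0.5, -0.25, 0.0)]
achieved = [(0.25, 0.25, 0.0), (0.5, -0.5, 0.0)]

res = residual(target, achieved)
adjusted = adjust_targets(target, res)
print(res)
print(adjusted)
```

If the tracking error were independent of the commanded location, commanding `adjusted` instead of `target` would bring the achieved contacts back onto the original targets. That independence is an idealization of this sketch.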
