FootstepNet: an Efficient Actor-Critic Method for Fast On-line Bipedal Footstep Planning and Forecasting

Clément Gaspard, Grégoire Passault, Mélodie Daniel, Olivier Ly

TL;DR

FootstepNet tackles fast on-line bipedal footstep planning by formulating it as a continuous-action actor-critic DRL problem, enabling a lightweight planner (actor) and a fast forecaster (critic). The method eliminates the need for discrete footstep sets and achieves on-board inference times in the tens of microseconds, validated through simulation and a real RoboCup deployment on a kid-size humanoid. A separate forecasting component estimates the number of steps to reach local targets, supporting rapid upstream decisions. The combination yields efficient local navigation with obstacle avoidance and demonstrates practical impact in competitive robotics, offering a scalable building block for integrated locomotion control.

Abstract

Designing a humanoid locomotion controller is challenging and is classically split into sub-problems. Footstep planning is one of them: it defines the sequence of footsteps. Even in simple environments, finding a minimal sequence, or even a feasible one, is a complex optimization problem. In the literature, this problem is usually addressed by search-based algorithms (e.g. variants of A*). However, such approaches are either computationally expensive or rely on hand-crafted tuning of several parameters. In this work, we first propose an efficient footstep planning method to navigate in local environments with obstacles, based on state-of-the-art Deep Reinforcement Learning (DRL) techniques, with very low computational requirements for on-line inference. Our approach is heuristic-free and relies on a continuous set of actions to generate feasible footsteps. In contrast, other methods require selecting a relevant discrete set of actions. Second, we propose a forecasting method that quickly estimates the number of footsteps required to reach different candidate local targets. This approach relies on computations inherent to the actor-critic DRL architecture. We demonstrate the validity of our approach with simulation results and with a deployment on a kid-size humanoid robot during the RoboCup 2023 competition.
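
The forecasting use case described in the abstract (estimate the footstep count for several candidate local targets, then keep the cheapest one) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `pick_target` and `placeholder_forecast` are hypothetical names, and `placeholder_forecast` is a hand-written stand-in for the trained critic, with an assumed step length of 0.08 m and an assumed turn of 0.3 rad per step.

```python
import math
from typing import Callable, Sequence, Tuple

# (x, y, theta): target pose relative to the robot, in metres and radians.
Pose = Tuple[float, float, float]


def pick_target(
    candidates: Sequence[Pose],
    forecast_steps: Callable[[Pose], float],
) -> Tuple[int, float]:
    """Return (index, estimate) of the candidate with the fewest forecast steps."""
    if not candidates:
        raise ValueError("at least one candidate target is required")
    estimates = [forecast_steps(c) for c in candidates]
    best = min(range(len(candidates)), key=estimates.__getitem__)
    return best, estimates[best]


def placeholder_forecast(target: Pose) -> float:
    """Hypothetical stand-in for the critic; NOT the paper's forecaster.

    Assumes 0.08 m of travel and 0.3 rad of turning per step.
    """
    x, y, theta = target
    return math.hypot(x, y) / 0.08 + abs(theta) / 0.3


if __name__ == "__main__":
    candidates = [(1.0, 0.0, 0.0), (0.4, 0.0, 0.0), (0.8, 0.2, 0.5)]
    index, steps = pick_target(candidates, placeholder_forecast)
    print(f"chosen candidate {index}, about {steps:.1f} steps")
```

In a real deployment the callable passed as `forecast_steps` would wrap a single forward pass of the trained critic, which is what makes comparing many candidates cheap.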

Paper Structure (16 sections, 5 equations, 9 figures, 2 tables)

This paper contains 16 sections, 5 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: An example of FootstepNet use -- Step 1: A bipedal robot must score a goal while minimizing its number of steps. To do this, we arbitrarily choose $n_{alt}$ placement possibilities (here $n_{alt}=3$) which all allow scoring. Step 2: Forecasting allows choosing, from the $n_{alt}$ possibilities, the one that requires the fewest steps. Step 3: The planner computes all the steps needed to reach the position chosen by the forecast. Step 4: The step sequence is executed on the real robot.
  • Figure 2: Locomotion tasks seen as a hierarchy of problems with different horizons. Autonomous decision computes a path to navigate globally and an intermediate target to reach. Footstep planning computes a sequence of footsteps that ensures the avoidance of local obstacles. The Walk Pattern Generator (WPG) then computes a Center of Mass (CoM) trajectory and uses a whole-body controller to follow it.
  • Figure 3: Example of footsteps generated by FootstepNet planning for the three possible goals of Fig. 1 -- The target positions are close to each other; however, the generated footsteps to reach them use different complex maneuvers.
  • Figure 4: Parametrization of a footstep displacement $(\Delta x, \Delta y, \Delta \theta)$. The displacement is a pose expressed in the frame of the support foot, with an implicit offset of $f_{dist}$ in the $y$ direction.
  • Figure 5: Overview of the proposed method -- First, offline training is carried out, during which the agent learns the policy by interacting with the RL environment. During online inference, the trained networks are then used to estimate the number of steps (critic) and to determine the sequence of steps to be performed (actor).
  • ...and 4 more figures
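
The footstep parametrization in the Figure 4 caption can be made concrete with a short sketch that converts a displacement $(\Delta x, \Delta y, \Delta \theta)$, expressed in the support-foot frame, into a world-frame pose for the next foot. The function name `next_foot_pose`, the axis convention ($x$ forward, $y$ to the left), the sign of the $f_{dist}$ offset for each foot, and the value 0.15 m used in the example are all assumptions made for illustration; the paper's own conventions may differ.

```python
import math
from typing import Tuple

# (x, y, theta) in metres and radians.
Pose = Tuple[float, float, float]


def next_foot_pose(
    support: Pose,
    support_is_left: bool,
    delta: Tuple[float, float, float],
    f_dist: float,
) -> Pose:
    """Compose a support-foot pose with a footstep displacement.

    Assumed convention: x points forward and y points left in the
    support-foot frame, so the neutral position of the other foot is
    f_dist to the right of a left support foot and f_dist to the left
    of a right support foot.
    """
    sx, sy, s_theta = support
    dx, dy, d_theta = delta

    # Implicit lateral offset between the feet, plus the commanded dy.
    offset = -f_dist if support_is_left else f_dist
    local_x, local_y = dx, offset + dy

    # Rotate the local displacement into the world frame and translate.
    c, s = math.cos(s_theta), math.sin(s_theta)
    world_x = sx + c * local_x - s * local_y
    world_y = sy + s * local_x + c * local_y
    return (world_x, world_y, s_theta + d_theta)


if __name__ == "__main__":
    # Right foot is the support foot at the origin; step 10 cm forward.
    print(next_foot_pose((0.0, 0.0, 0.0), False, (0.1, 0.0, 0.0), 0.15))
```

Chaining this function while alternating `support_is_left` turns a sequence of displacements into the world-frame footstep sequence that a planner of this kind would output.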