NaviGait: Navigating Dynamically Feasible Gait Libraries using Deep Reinforcement Learning

Neil C. Janwani, Varun Madabushi, Maegan Tucker

TL;DR

NaviGait targets robust, natural-looking dynamic locomotion by coupling a library of offline-optimized gaits with a residual reinforcement learning controller. Transitions between gaits use smooth Bézier interpolation, while the policy outputs joint residuals $\Delta q$ and velocity residuals $\Delta v$ that yield the final targets $\hat{q}^d$ and $\hat{v}^d$, so planning supplies the reference motion and learning supplies the correction. This architecture simplifies reward design, accelerates training, and preserves the look and stability of the reference motions. Hardware experiments on the BRUCE humanoid demonstrate robust disturbance rejection and faster convergence, while library switching enables stylistic gait customization.
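The residual structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the velocity residual $\Delta v$ is added to the commanded velocity before the gait library is queried, and that the joint residual $\Delta q$ is added to the resulting reference joint positions. The names `compose_targets` and `toy_reference` are hypothetical, and `toy_reference` is a stand-in for a real gait-library lookup.

```python
import numpy as np

def toy_reference(v_hat, tau):
    """Hypothetical stand-in for a gait-library lookup.

    Returns reference joint positions for two joints at phase tau in [0, 1].
    """
    return np.array([v_hat[0] * tau, -v_hat[0] * tau])

def compose_targets(v_cmd, delta_v, reference_fn, delta_q, tau):
    """Combine a reference motion with policy residuals.

    Assumption: v_hat = v_cmd + delta_v selects the reference,
    and q_hat = q_ref + delta_q is the final joint target.
    """
    v_hat = np.asarray(v_cmd, dtype=float) + np.asarray(delta_v, dtype=float)
    q_ref = np.asarray(reference_fn(v_hat, tau), dtype=float)
    q_hat = q_ref + np.asarray(delta_q, dtype=float)
    return q_hat, v_hat

# Example call with made-up numbers.
q_hat, v_hat = compose_targets(
    v_cmd=[0.2], delta_v=[0.1],
    reference_fn=toy_reference,
    delta_q=[0.01, -0.01], tau=0.5,
)
```

With zero residuals the targets reduce to the unmodified reference, which is one way to read the claim that the learned policy only corrects the planned motion rather than replacing it.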

Abstract

Reinforcement learning (RL) has emerged as a powerful method to learn robust control policies for bipedal locomotion. Yet, it can be difficult to tune desired robot behaviors due to unintuitive and complex reward design. In comparison, offline trajectory optimization methods, like Hybrid Zero Dynamics, offer more tuneable, interpretable, and mathematically grounded motion plans for high-dimensional legged systems. However, these methods often remain brittle to real-world disturbances like external perturbations. In this work, we present NaviGait, a hierarchical framework that combines the structure of trajectory optimization with the adaptability of RL for robust and intuitive locomotion control. NaviGait leverages a library of offline-optimized gaits and smoothly interpolates between them to produce continuous reference motions in response to high-level commands. The policy provides both joint-level and velocity command residual corrections to modulate and stabilize the reference trajectories in the gait library. One notable advantage of NaviGait is that it dramatically simplifies reward design by encoding rich motion priors from trajectory optimization, reducing the need for finely tuned shaping terms and enabling more stable and interpretable learning. Our experimental results demonstrate that NaviGait enables faster training compared to conventional and imitation-based RL, and produces motions that remain closest to the original reference. Overall, by decoupling high-level motion generation from low-level correction, NaviGait offers a more scalable and generalizable approach for achieving dynamic and robust locomotion.
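To make the idea of interpolating within a gait library concrete, the sketch below evaluates a Bézier reference at a phase $\tau \in [0,1]$ and blends the control points of the two library gaits nearest to a commanded speed. The linear blending rule, the library layout, and the function names `bezier` and `interpolate_gait` are assumptions chosen for illustration; the paper's own interpolation procedure may differ.

```python
import numpy as np
from math import comb

def bezier(control_points, tau):
    """Evaluate a Bezier curve at phase tau in [0, 1].

    control_points has shape (n + 1, n_joints) for a degree-n curve.
    """
    P = np.asarray(control_points, dtype=float)
    n = P.shape[0] - 1
    k = np.arange(n + 1)
    binom = np.array([comb(n, i) for i in k], dtype=float)
    # Bernstein basis: C(n, k) * tau^k * (1 - tau)^(n - k)
    basis = binom * tau**k * (1.0 - tau) ** (n - k)
    return basis @ P

def interpolate_gait(library_speeds, library_coeffs, v_cmd):
    """Blend control points of the two gaits bracketing v_cmd.

    library_speeds: sorted 1-D array of speeds, one per stored gait.
    library_coeffs: array of shape (n_gaits, n + 1, n_joints).
    Assumption: linear blending between neighbouring gaits.
    """
    speeds = np.asarray(library_speeds, dtype=float)
    coeffs = np.asarray(library_coeffs, dtype=float)
    v = float(np.clip(v_cmd, speeds[0], speeds[-1]))
    hi = int(np.clip(np.searchsorted(speeds, v), 1, len(speeds) - 1))
    lo = hi - 1
    w = (v - speeds[lo]) / (speeds[hi] - speeds[lo])
    return (1.0 - w) * coeffs[lo] + w * coeffs[hi]

# Toy library: two quadratic gaits for a single joint.
speeds = [0.0, 0.4]
coeffs = [[[0.0], [0.0], [0.0]],
          [[0.0], [1.0], [0.0]]]
blended = interpolate_gait(speeds, coeffs, v_cmd=0.2)
q_ref = bezier(blended, tau=0.5)
```

Because a Bézier curve is linear in its control points, blending control points is equivalent to blending the curves themselves, so the reference varies continuously with the commanded speed.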

Paper Structure

This paper contains 19 sections, 5 equations, 9 figures.

Figures (9)

  • Figure 1: NaviGait simplifies reward design for bipedal locomotion by seamlessly integrating offline-generated gait libraries with reinforcement learning. At every inference step, NaviGait (1) chooses the desired reference trajectory, (2) smoothly and continuously transitions between the current motion and the updated velocity reference motion, and (3) provides joint-level corrections for stabilization. We demonstrate this approach for velocity tracking and disturbance rejection tasks both in simulation and on hardware.
  • Figure 2: Illustration of the BRUCE kinematic model, its simulation, and the robot hardware.
  • Figure 3: The diagram illustrates the procedure for smoothly interpolating between different gait references. Importantly, the implementation allows for continuous reference tracking by recursively repeating the illustrated process for some new phasing parameterization $\hat{\tau} \in [0,1]$.
  • Figure 4: Architecture of the NaviGait framework.
  • Figure 5: All three policies are effective at tracking a time-varying oval velocity command. Note that lateral velocity tracking ($v_y$) is expected to have large deviations since the BRUCE robot has underactuated dynamics in the frontal plane. Also of interest is the observation that NaviGait and Imitation RL exhibit less overall drift than Canonical RL.
  • ...and 4 more figures