State Estimation Transformers for Agile Legged Locomotion

Chen Yu, Yichu Yang, Tianlin Liu, Yangwei You, Mingliang Zhou, Diyun Xiang

TL;DR

Results show that SET can outperform other methods in estimation accuracy and transferability in simulation, as well as in the real-world success rates of jumping and of triggering a recovery controller, suggesting the advantage of such a Transformer-based explicit state estimator in highly dynamic locomotion tasks.

Abstract

We propose a state estimation method that can accurately predict the robot's privileged states to push the limits of quadruped robots in executing advanced skills such as jumping in the wild. In particular, we present the State Estimation Transformers (SET), an architecture that casts the state estimation problem as conditional sequence modeling. SET outputs robot states that are hard to obtain directly in the real world, such as the body height and velocities, by leveraging a causally masked Transformer. By conditioning an autoregressive model on the robot's past states, our SET model can predict these privileged observations accurately even in highly dynamic locomotion. We evaluate our method on three tasks (running jumping, running backflipping, and running sideslipping) on a low-cost quadruped robot, Cyberdog2. Results show that SET can outperform other methods in estimation accuracy and transferability in simulation, as well as in the real-world success rates of jumping and of triggering a recovery controller, suggesting the advantage of such a Transformer-based explicit state estimator in highly dynamic locomotion tasks.
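As a rough illustration of what "causally masked" conditioning on past states means, the sketch below runs one single-head causal self-attention layer over a window of past observations and maps every step to estimates of body height and velocity ($h$, $v_x$, $v_y$, $v_z$). It is a minimal NumPy sketch with random weights, not the SET model: the window length, feature sizes, the parameter names in `params`, and the use of per-step non-privileged observations as input tokens are all assumptions, and the GPT-based model in the paper stacks several decoder blocks and is trained rather than randomly initialized.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention with a causal mask.

    x: (T, d) sequence of embedded past robot states, oldest first.
    Position t may only attend to positions <= t.
    """
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)
    # Entries above the diagonal correspond to future steps and are masked out.
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable row-wise softmax (the diagonal is never masked).
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights


def estimate_privileged(states, params):
    """Hypothetical estimator: window of observations -> (T, 4) [h, v_x, v_y, v_z]."""
    T = states.shape[0]
    # Linear embedding plus a learned-style position encoding.
    x = states @ params["w_embed"] + params["pos"][:T]
    attn, _ = causal_self_attention(x, params["w_q"], params["w_k"], params["w_v"])
    x = x + attn  # residual connection
    return x @ params["w_out"]  # one estimate per time step


# Usage with hypothetical sizes and random (untrained) weights.
rng = np.random.default_rng(0)
obs_dim, d_model, window = 30, 32, 16
params = {
    "w_embed": rng.normal(0.0, 0.1, (obs_dim, d_model)),
    "pos": rng.normal(0.0, 0.1, (window, d_model)),
    "w_q": rng.normal(0.0, 0.1, (d_model, d_model)),
    "w_k": rng.normal(0.0, 0.1, (d_model, d_model)),
    "w_v": rng.normal(0.0, 0.1, (d_model, d_model)),
    "w_out": rng.normal(0.0, 0.1, (d_model, 4)),
}
states = rng.normal(size=(window, obs_dim))
est = estimate_privileged(states, params)
print(est.shape)  # (16, 4)
print(est[-1])    # estimate for the most recent step
```

The mask guarantees that the estimate at step t depends only on observations up to step t, which is what allows such an estimator to run online on the robot.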

Paper Structure

This paper contains 18 sections, 13 equations, 4 figures, 4 tables, and 1 algorithm.

Figures (4)

  • Figure 1: State Estimation Transformer architecture in our training pipeline for jumping policies. First, we train a Jump Policy using the task rewards $r_{\text{task}}$ from the Environments and the style reward $r_{\text{style}}$ computed by the Adversarial Motion Priors; then we train a State Estimation Transformer that leverages a GPT model (green blocks represent embedding and position encoding; yellow blocks represent decoders) to predict the privileged observations. To further exploit an accurate explicit state estimator, the predicted privileged observations are used as one of the conditions for triggering a built-in Reset Policy.
  • Figure 2: Training curves of different algorithms on three jumping gaits. We compare training with ground-truth full observations (Ground-true), with only non-privileged observations (w/o Estimator), with 20 stacked historical non-privileged observations (Implicit), and with our SET and MLP estimators. Results show that, for highly dynamic locomotion, privileged observations such as robot velocity and height significantly increase the efficiency and robustness of policy training, and that our deployable method can reproduce the results obtained with ground-truth privileged observations.
  • Figure 3: Prediction error of estimators trained on different data and applied to different tasks. Each training-dataset/application-task pair corresponds to four adjacent cells in the heatmaps, representing the errors in estimating $h$, $v_x$, $v_y$, and $v_z$. We show the results of SET and MLP to compare their generalization ability; SET outperforms MLP in 2/3 of the cases (32/48, highlighted in red). These results suggest that a trained explicit estimator can be applied to another task, and demonstrate the superiority of SET over MLP in explicit state estimation.
  • Figure 4: Recovering from a failed jump. The robot height predicted by the estimator is used as one of the conditions for triggering the reset policy.
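Figures 1 and 4 state that the estimated height is one of the conditions for triggering the reset policy. A minimal sketch of such a condition might look as follows, assuming a fixed height threshold and a short persistence window to ride out single-step estimation noise. `ResetTrigger`, `min_height`, `patience`, and `other_condition` are hypothetical names, the numbers are placeholders rather than values from the paper, and the paper's other trigger conditions are not specified here.

```python
from collections import deque


class ResetTrigger:
    """Hypothetical reset trigger driven by the estimated body height.

    Fires when the estimated height stays below `min_height` for `patience`
    consecutive control steps AND `other_condition` holds. The threshold,
    the window, and the extra condition are illustrative assumptions.
    """

    def __init__(self, min_height=0.15, patience=5):
        self.min_height = min_height
        self.patience = patience
        # Rolling record of whether each recent estimate was "too low".
        self.recent = deque(maxlen=patience)

    def update(self, est_height, other_condition=True):
        self.recent.append(est_height < self.min_height)
        low_long_enough = len(self.recent) == self.patience and all(self.recent)
        return low_long_enough and other_condition


# Usage: feed one height estimate per control step.
trigger = ResetTrigger(min_height=0.15, patience=3)
heights = [0.30, 0.12, 0.10, 0.20, 0.11, 0.09, 0.08]
fired = [trigger.update(h) for h in heights]
print(fired)  # [False, False, False, False, False, False, True]
```

Requiring several consecutive low readings trades a few control steps of latency for robustness to a single spurious estimate; with `patience=1` the trigger reduces to a plain threshold check.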