HACL: History-Aware Curriculum Learning for Fast Locomotion
Prakhar Mishra, Amir Hossain Raj, Xuesu Xiao, Dinesh Manocha
TL;DR
The paper tackles fast, stable legged locomotion under uncertain, non-Markovian dynamics by introducing History-Aware Curriculum Learning (HACL). HACL uses a recurrent model to capture reward history through the hidden state $h_{t-1}$ and predicts per-bin rewards $\hat{\mu}(b)$ for a discretized command space with BinID $x_t$, updating bin weights via $w_t(b)$ and guiding curriculum selection with a history-aware utility $u(b)$. A two-level setup decouples the meta-scheduler from the low-level PPO controller, training a predictor with loss $L(\psi) = \frac{1}{T} \sum_{t=1}^T (r_t - \hat{\mu}(b_t))^2$, which accelerates convergence to high-speed, stable gait patterns. The approach yields peak simulation speeds of $6.7$ m/s at a command of $7$ m/s and real-world Go1 speeds of $4.1 \pm 0.2$ m/s on diverse terrains, illustrating strong sim-to-real transfer and generalization across morphologies (Mini Cheetah, Go1, Go2). Overall, HACL improves agility, stability, and energy efficiency while reducing dependence on manual curriculum tuning, offering a practically impactful route to robust high-speed locomotion in unstructured environments.
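To make the predictor loss $L(\psi)$ above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `predictor_loss` and the data layout (one list of per-bin predictions per time step) are assumptions chosen for illustration only.

```python
def predictor_loss(mu_hat, bins, rewards):
    """Mean squared error L = (1/T) * sum_t (r_t - mu_hat(b_t))^2.

    mu_hat  : list of T lists; mu_hat[t][b] is the predicted reward of bin b
              at step t (hypothetical layout, assumed for this sketch)
    bins    : list of T ints; bins[t] is the index b_t of the sampled bin
    rewards : list of T floats; rewards[t] is the observed reward r_t
    """
    if not bins or not (len(mu_hat) == len(bins) == len(rewards)):
        raise ValueError("mu_hat, bins, and rewards must be non-empty and equally long")
    # Only the prediction for the bin that was actually sampled enters the loss.
    squared_errors = [(r - mu[b]) ** 2 for mu, b, r in zip(mu_hat, bins, rewards)]
    return sum(squared_errors) / len(squared_errors)
```

For example, with predictions `[[0.5, 1.0], [0.2, 0.8]]`, sampled bins `[1, 0]`, and rewards `[1.0, 0.0]`, the squared errors are 0 and 0.04, so the loss is 0.02 up to floating-point rounding.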
Abstract
We address the problem of agile and rapid locomotion, a key characteristic of quadrupedal and bipedal robots. We present a new algorithm that maintains stability and generates high-speed trajectories by considering the temporal aspect of locomotion. Our formulation takes past information into account through a novel History-Aware Curriculum Learning (HACL) algorithm. We model the history of joint velocity commands with respect to the observed linear and angular rewards using a recurrent neural network (RNN). The hidden state helps the curriculum learn the relationship between the forward linear velocity and angular velocity commands and the rewards over a given time step. We validate our approach on the MIT Mini Cheetah, Unitree Go1, and Unitree Go2 robots in a simulated environment and on a Unitree Go1 robot in real-world scenarios. In practice, HACL achieves a peak forward velocity of 6.7 m/s for a command velocity of 7 m/s and outperforms prior locomotion algorithms by nearly 20%.
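The role of the hidden state can be illustrated with a small, untrained vanilla (Elman) RNN in plain Python. This is a sketch under stated assumptions, not the paper's architecture: the input encoding (a one-hot command bin plus the scalar reward), the tanh cell, the linear read-out, and all names (`make_rnn`, `rnn_step`, `predict_from_history`) are hypothetical.

```python
import math
import random

def make_rnn(num_bins, hidden_size, seed=0):
    """Create small random weights for a vanilla RNN (illustrative, untrained)."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    input_size = num_bins + 1  # assumed input: one-hot bin index plus scalar reward
    return {
        "W_x": mat(hidden_size, input_size),   # input-to-hidden weights
        "W_h": mat(hidden_size, hidden_size),  # hidden-to-hidden weights
        "W_o": mat(num_bins, hidden_size),     # hidden-to-prediction read-out
        "num_bins": num_bins,
        "hidden_size": hidden_size,
    }

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(params, h_prev, bin_id, reward):
    """One update: h_t = tanh(W_x x_t + W_h h_{t-1}), mu_hat = W_o h_t."""
    x = [0.0] * params["num_bins"] + [reward]
    x[bin_id] = 1.0
    pre = [a + b for a, b in zip(matvec(params["W_x"], x),
                                 matvec(params["W_h"], h_prev))]
    h = [math.tanh(p) for p in pre]
    return h, matvec(params["W_o"], h)

def predict_from_history(params, history):
    """Fold a list of (bin_id, reward) pairs into a hidden state and
    return it together with the latest per-bin reward predictions."""
    h = [0.0] * params["hidden_size"]
    mu_hat = [0.0] * params["num_bins"]
    for bin_id, reward in history:
        h, mu_hat = rnn_step(params, h, bin_id, reward)
    return h, mu_hat
```

Calling `predict_from_history(make_rnn(4, 8), [(0, 0.5), (2, 0.9)])` returns an 8-dimensional hidden state and one predicted reward per bin. Because the hidden state is carried across steps, the same (bin, reward) pairs presented in a different order generally produce different predictions, which is the sense in which such a model is history-aware.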
