REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering

Jialong Liu; Dehan Shen; Yanbo Wen; Zeyu Jiang; Changhao Chen

Abstract

Extreme legged parkour demands rapid terrain assessment and precise foot placement under highly dynamic conditions. While recent learning-based systems achieve impressive agility, they remain fundamentally fragile to perceptual degradation: even brief visual noise or latency can cause catastrophic failure. To overcome this, we propose Robust Extreme Agility Learning (REAL), an end-to-end framework for reliable parkour under sensory corruption. Instead of relying on perfectly clean perception, REAL tightly couples vision, proprioceptive history, and temporal memory. We distill a cross-modal teacher policy into a deployable student equipped with a FiLM-modulated Mamba backbone that actively filters visual noise and builds short-term terrain memory. Furthermore, a physics-guided Bayesian state estimator enforces rigid-body consistency during high-impact maneuvers. Validated on a Unitree Go2 quadruped, REAL successfully traverses extreme obstacles even with a 1-meter visual blind zone, while strictly satisfying real-time control constraints with a bounded inference time of 13.1 ms.
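
The abstract names a FiLM-modulated Mamba backbone but does not give its equations. As a hedged illustration only, the sketch below shows two generic ingredients: FiLM conditioning, where a conditioning vector (assumed here to be a proprioceptive embedding) produces a per-channel scale and shift for visual features, and a gated diagonal linear recurrence whose decay depends on the current input, loosely in the spirit of Mamba's selective state update but far simpler than a real Mamba block. The function names, weights, and the zero-masking of blind frames are hypothetical and not taken from the paper.

```python
import math


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM: scale and shift each feature channel by affine functions of `cond`.

    gamma_i = w_gamma[i] . cond + b_gamma[i];  beta_i = w_beta[i] . cond + b_beta[i]
    output_i = gamma_i * features_i + beta_i
    """
    gamma = [sum(w * c for w, c in zip(row, cond)) + b for row, b in zip(w_gamma, b_gamma)]
    beta = [sum(w * c for w, c in zip(row, cond)) + b for row, b in zip(w_beta, b_beta)]
    return [g * x + s for g, x, s in zip(gamma, features, beta)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_scan(sequence, w_gate, b_gate):
    """Diagonal linear recurrence with an input-dependent decay per channel.

    a_t = sigmoid(w * x_t + b);  h_t = a_t * h_{t-1} + (1 - a_t) * x_t
    A simplified stand-in for a selective state-space update, not a Mamba block.
    """
    h = [0.0] * len(sequence[0])
    outputs = []
    for x in sequence:
        a = [sigmoid(w * xi + b) for w, xi, b in zip(w_gate, x, b_gate)]
        h = [ai * hi + (1.0 - ai) * xi for ai, hi, xi in zip(a, h, x)]
        outputs.append(list(h))
    return outputs


# Hypothetical usage: one informative frame, then five blind (zero-masked) frames.
frames = [[1.0]] + [[0.0]] * 5
selective = gated_scan(frames, w_gate=[-20.0], b_gate=[10.0])  # gate closes on blank input
constant = gated_scan(frames, w_gate=[0.0], b_gate=[0.0])      # fixed decay a = 0.5
print(round(selective[-1][0], 3), constant[-1][0])  # prints: 1.0 0.015625
```

With the input-dependent gate, the state keeps the last informative value across the masked frames, whereas the fixed-decay recurrence forgets it geometrically. This is one simple way a recurrent backbone can carry short-term terrain memory through a visual blind zone; it is not a description of REAL's trained behavior.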

Paper Structure

This paper contains 15 sections, 13 equations, 8 figures, and 6 tables.

Introduction
Related Works
  Legged Locomotion
  Robotic Parkour
  Robust Sequence Modeling and Estimation
Method
  Spatio-Temporal Policy Learning
  Physics-Guided Filtering
  Consistency-Aware Loss Gating Strategy
Experiment
  Experimental Setup
  Main Experiments
  Real-Robot Deployment
  Ablation Study
Conclusion

Figures (8)

Figure 1: Robust extreme parkour with proposed REAL framework. The robot successfully chains highly dynamic maneuvers across complex terrains with nominal vision (green box), and maintains stable locomotion even under severe visual degradation (red box).
Figure 2: System architecture of REAL. Stage 1 (Privileged Teacher Policy Learning) trains a privileged teacher policy via Proprioception-Terrain Associated Reasoning. Stage 2 (Distillation Student Policy Learning) distills a deployable student policy using an onboard Mamba-FiLM spatio-temporal backbone and physics-guided filtering, stabilized by a consistency-aware loss gating strategy.
Figure 3: Left: Snapshots of the REAL policy executing dynamic maneuvers across extreme terrains. Right: Physical hardware setup. The REAL policy is deployed on a Unitree Go2 quadrupedal robot.
Figure 4: Simulation results on complex parkour terrains featuring a 1 m vision-masked blind zone (setup detailed in the blind-zone memory table). REAL successfully traverses all terrains while maintaining kinematic stability.
Figure 5: Real-world extreme blind test. Left: The baseline fails immediately upon losing visual input. Right: REAL uses proprioceptive history to maintain environmental memory, enabling robust blind traversal across unstructured obstacles.
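
Figure 2 mentions physics-guided filtering, and the abstract describes a physics-guided Bayesian state estimator that enforces rigid-body consistency; the paper's actual formulation is not reproduced here. The sketch below is a hypothetical per-axis Kalman filter under stated assumptions: the prediction step integrates IMU acceleration with gravity already removed, the measurement is a leg-kinematic velocity that is trusted only while the foot is in contact, and an innovation gate discards measurements that disagree with the motion model. The class name, noise variances, and gate threshold are illustrative choices, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class AxisVelocityKF:
    """Scalar Kalman filter for one axis of base velocity (hypothetical sketch)."""
    v: float = 0.0      # velocity estimate (m/s)
    p: float = 1.0      # estimate variance
    q: float = 0.01     # process-noise variance added per prediction
    r: float = 0.05     # leg-odometry measurement-noise variance
    gate: float = 9.0   # max normalized squared innovation (about 3 sigma)

    def predict(self, accel: float, dt: float) -> None:
        # Rigid-body motion model: integrate IMU acceleration (gravity removed).
        self.v += accel * dt
        self.p += self.q

    def update(self, z: float, in_contact: bool) -> bool:
        """Fuse a leg-kinematic velocity measurement; return True if it was applied."""
        if not in_contact:
            return False  # swing foot: kinematic velocity is not informative
        innovation = z - self.v
        s = self.p + self.r
        if innovation * innovation / s > self.gate:
            return False  # inconsistent with the motion model (e.g. slip at touchdown)
        k = self.p / s
        self.v += k * innovation
        self.p *= 1.0 - k
        return True


# Hypothetical usage: one prediction, a consistent stance update, then an outlier.
kf = AxisVelocityKF()
kf.predict(accel=1.0, dt=0.1)
applied_ok = kf.update(z=0.2, in_contact=True)
applied_outlier = kf.update(z=5.0, in_contact=True)
print(applied_ok, applied_outlier, round(kf.v, 3))  # prints: True False 0.195
```

A practical estimator would track 3D velocity (and usually position and orientation) with a joint covariance and fuse several stance feet; the scalar form only illustrates the predict, gate, and update cycle.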