KiRAS: Keyframe Guided Self-Imitation for Robust and Adaptive Skill Learning in Quadruped Robots

Xiaoyi Wei; Peng Zhai; Jiaxin Tu; Yueqi Zhang; Yuqi Li; Zonghao Zhang; Hu Zhou; Lihua Zhang

Abstract

With advances in reinforcement learning and imitation learning, quadruped robots can acquire diverse skills within a single policy by imitating multiple skill-specific datasets. However, the lack of datasets on complex terrains limits the ability of such multi-skill policies to generalize effectively in unstructured environments. Inspired by animation, we adopt keyframes as minimal and universal skill representations, relaxing dataset constraints and enabling the integration of terrain adaptability with skill diversity. We propose Keyframe Guided Self-Imitation for Robust and Adaptive Skill Learning (KiRAS), an end-to-end framework for acquiring and transitioning between diverse skill primitives on complex terrains. KiRAS first learns diverse skills on flat terrain through keyframe-guided self-imitation, eliminating the need for expert datasets; it then continues training the same policy network on rough terrains to enhance robustness. To eliminate catastrophic forgetting, we introduce a proficiency-based Skill Initialization Technique. Experiments on Solo-8 and Unitree Go1 robots show that KiRAS enables robust skill acquisition and smooth transitions across challenging terrains. The framework shows potential as a lightweight platform for multi-skill generation and dataset collection, and its flexible skill transitions enhance locomotion on challenging terrains.

Paper Structure (36 sections, 7 equations, 9 figures, 7 tables)

INTRODUCTION
RELATED WORK
Robust DRL Controller without External Sensors
Multi-skill Learning for Legged Robots
METHOD
End-to-end Adaptive Multi-skill Learning Framework
Keyframe-guided Skill Learning
Rewards and Multi-critic Architecture
Environment Context Estimator
Skill Initialization Technique
Training Details
Terrain Finetuning
State Initialization
EXPERIMENTS
Experimental Setup
...and 21 more sections

Figures (9)

Figure 1: Deployment of KiRAS on Solo-8 and Unitree Go1 robots across diverse environments. (a) The Solo-8 robot traverses various obstacles by flexibly switching between learned skills. (b) The Solo-8 robot exploits crawl for concealment and stilt for an expanded field of view. (c) In outdoor unstructured environments, the Solo-8 robot demonstrates strong robustness. (d) The Unitree Go1 robot acquires bipedal skills to climb a 10 cm step.
Figure 2: Training pipeline. KiRAS consists of an end-to-end adaptive multi-skill learning framework and a skill initialization module. The former includes a keyframe-guided skill learning module (used only in the skill learning stage), a proprioception estimator, and a PPO actor-critic architecture. The policy outputs joint position targets, which are converted to torques via a PD controller. The Skill Initialization Technique is employed at environment reset to prevent overfitting to simpler skills. The purple box illustrates different skills being trained on different terrains. During deployment, only the networks in the yellow boxes are used, and skill switching is controlled via joystick input.
Figure 3: Details of Premium Trajectory Selector.
Figure 4: Heatmap of ablation and comparison experiment results. The vertical axis "C" denotes the cosine similarity between each skill trajectory and its corresponding keyframe, with higher values indicating greater similarity. "S" and "D" represent the success rates of traversing steps and discrete footholds, respectively, where higher values indicate more robust policies.
Figure 5: Comparative results. (a) KiRAS w/o Skill Initialization learns 5 skills (walk, crawl, stilt, pitch-up, pitch-down) with little distinction. (b) DreamWaQ’s crawl and pitch-down use knee-ground contact (red circles). (c) KiRAS w/o MuC fails to lift legs over obstacles.
...and 4 more figures
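
Two operations named in the captions above can be illustrated with a minimal sketch: converting joint position targets to torques with a PD controller (Figure 2), and measuring cosine similarity between a pose and a keyframe (Figure 4). This is only an illustration, not the authors' implementation. The function names, the gain values, the zero target velocity in the PD law, and the choice to compare a single joint-angle vector against a single keyframe vector are all assumptions made here.

```python
import math
from typing import List, Sequence


def pd_torques(q_target: Sequence[float], q: Sequence[float],
               qd: Sequence[float], kp: float, kd: float) -> List[float]:
    """Per-joint PD law: tau = kp * (q_target - q) - kd * qd.

    Assumes a target joint velocity of zero; kp and kd are
    hypothetical gains, not values taken from the paper.
    """
    return [kp * (qt - qi) - kd * qdi
            for qt, qi, qdi in zip(q_target, q, qd)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For example, `pd_torques([0.5], [0.0], [0.0], kp=20.0, kd=0.5)` returns `[10.0]`. A pose whose joint-angle vector points in the same direction as the keyframe gives a similarity close to 1, and orthogonal vectors give 0.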
