Learning Quadruped Locomotion Policies using Logical Rules

David DeFazio, Yohei Hayamizu, Shiqi Zhang

TL;DR

This work introduces RM-based Locomotion Learning (RMLL), which uses Reward Machines (RMs) to specify quadruped gaits through simple foot-contact propositions. This enables easy gait specification, efficient policy learning, and runtime gait-frequency control without relying on motion priors. By incorporating the RM state into the RL state, the approach captures gait-level history and achieves improved sample efficiency across six gaits, including novel ones such as Three-One and Half-Bound. Simulation experiments show energy and stability trade-offs across terrains, and real-world experiments on a Unitree A1 confirm sim-to-real transfer and dynamic gait-frequency adjustment. The framework offers a flexible path toward rapid gait customization and deployment on legged robots, with future work on gait transitions, additional parameters, and language-driven gait extraction.

Abstract

Quadruped animals are capable of exhibiting a diverse range of locomotion gaits. While progress has been made in demonstrating such gaits on robots, current methods rely on motion priors, dynamics models, or other forms of extensive manual effort. People can use natural language to describe dance moves. Could one use a formal language to specify quadruped gaits? To this end, we aim to enable easy gait specification and efficient policy learning. Our approach, RM-based Locomotion Learning (RMLL), leverages Reward Machines (RMs) for high-level gait specification over foot contacts and supports adjusting gait frequency at execution time. Gait specification is enabled through the use of a few logical rules per gait (e.g., alternate between moving front feet and back feet) and does not require labor-intensive motion priors. Experimental results in simulation highlight the diversity of learned gaits (including two novel gaits), their energy consumption and stability across different terrains, and the superior sample efficiency when compared to baselines. We also demonstrate these learned policies with a real quadruped robot. Video and supplementary materials: https://sites.google.com/view/rm-locomotion-learning/home
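
To make the idea of "a few logical rules per gait" concrete, the following is a minimal sketch of a two-state reward machine over foot-contact propositions, modeled on the abstract's example of alternating between front feet and back feet. The class name `TwoStateGaitRM`, the exact transition conditions, and the reward values of 1.0 and 0.0 are all assumptions made for illustration; the paper's actual automata and reward functions are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class TwoStateGaitRM:
    """Hypothetical two-state reward machine over foot-contact propositions.

    State 0 waits for both front feet to be in the air while both back feet
    are on the ground; state 1 waits for the mirrored condition.
    """

    state: int = 0

    def step(self, contacts):
        """Advance the machine given foot contacts.

        `contacts` maps a foot name (FL, FR, BL, BR) to True when that foot
        touches the ground. Returns (new_state, reward). The reward values
        are placeholders, not the paper's reward function.
        """
        front_down = contacts["FL"] and contacts["FR"]
        back_down = contacts["BL"] and contacts["BR"]
        front_air = not contacts["FL"] and not contacts["FR"]
        back_air = not contacts["BL"] and not contacts["BR"]

        if self.state == 0 and front_air and back_down:
            self.state = 1
            return self.state, 1.0
        if self.state == 1 and back_air and front_down:
            self.state = 0
            return self.state, 1.0
        # No transition fired: stay in the current state with no gait reward.
        return self.state, 0.0
```

Because a reward is only issued when the expected next contact pattern appears, holding the same pose repeatedly earns nothing, which is one way such a machine can encode gait-level history.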

Paper Structure

This paper contains 30 sections, 1 equation, 12 figures, 2 tables.

Figures (12)

  • Figure 1: Snapshots of important poses of each of the six gaits learned with six different RMs. Specifying and learning each gait requires defining an automaton with no more than five states (only two for half of the gaits). Red circles mark feet in contact with the ground.
  • Figure 2: Overview of RM-based Locomotion Learning (RMLL). We consider propositional statements specifying foot contacts. We then construct an automaton via propositional logic formulas for each locomotion gait (left side). To train gait-specific locomotion policies, we use observations which contain information from the RM, proprioception, velocity and gait frequency commands, and variables from a state estimator (right side).
  • Figure 3: Reward Machine for Trot gait, where we want to synchronize lifting the FL leg with the BR leg, and the FR leg with the BL leg. Trot is one of the six gaits considered in this work.
  • Figure 4: Isaac Gym simulation environment.
  • Figure 5: Reward curves for all gaits. RMLL accumulates reward more efficiently for each gait, particularly for the gaits with more complex foot-contact sequences (Walk, Three-One, and Half-Bound).
  • ...and 7 more figures
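
The Figure 2 caption lists what a policy observation contains: information from the RM, proprioception, velocity and gait-frequency commands, and state-estimator variables. A minimal sketch of assembling such an observation follows. The function names, the one-hot encoding of the RM state, and the ordering of the components are assumptions made for illustration and are not taken from the paper.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range for one-hot encoding")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_observation(proprioception, velocity_command, gait_frequency,
                      estimator_vars, rm_state, num_rm_states):
    """Concatenate the observation components into one flat list.

    All argument names are hypothetical. The RM state is appended as a
    one-hot vector so the policy can condition on the current gait phase.
    """
    return (
        list(proprioception)
        + list(velocity_command)
        + [float(gait_frequency)]
        + list(estimator_vars)
        + one_hot(rm_state, num_rm_states)
    )
```

For example, with a two-state RM currently in state 1, the last two entries of the observation would be `0.0, 1.0`.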