Learning Emergent Gaits with Decentralized Phase Oscillators: on the role of Observations, Rewards, and Feedback
Jenny Zhang, Steve Heim, Se Hwan Jeon, Sangbae Kim
TL;DR
The paper introduces a minimal quadruped locomotion framework based on four decentralized phase oscillators, one per leg. Each oscillator receives local ground reaction force (GRF) feedback, which acts as an observer gain, so the oscillator serves as a latent estimator of its leg's stance/swing state. By incorporating phase observations and phase-based gait rewards, the approach enables emergent gait preferences without prescribing a fixed gait, and the coupling between oscillator dynamics and GRF further accelerates convergence and enables adaptation to perturbations. Comprehensive ablations show that combining all three components (phase observations, phase-based rewards, and local feedback) yields balanced leg use, robust disturbance rejection, and faster gait emergence, while rewards can strongly influence stability even when observations are non-Markovian. The method offers a scalable route toward gait emergence with potential benefits for sim-to-real transfer and hierarchical RL, where the phase oscillators serve as a latent cyclic state for temporal abstraction and coordination.
Abstract
We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and its corresponding leg through local feedback of the ground reaction force, which can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact state-estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards and without prescribing a specific gait. The code is open-source, and a video synopsis is available at https://youtu.be/1NKQ0rSV3jU.
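To make the decentralized structure concrete, the following is a minimal sketch of four independent phase oscillators, each driven only by its own phase and its own leg's ground reaction force. The abstract does not give the update rule, so the feedback term `-gain * grf * cos(phi)` is an assumed illustrative form, not the dynamics used in the paper. The function names (`step_phase`, `step_all`, `phase_observation`), the parameter values, and the sin/cos observation encoding are likewise hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def step_phase(phi, grf, omega=TWO_PI * 1.5, gain=0.05, dt=0.01):
    """Advance one leg's phase by a single explicit Euler step.

    phi   : current phase in [0, 2*pi)
    grf   : normal ground reaction force measured at this leg only
    omega : intrinsic angular frequency in rad/s (assumed value)
    gain  : local feedback gain (assumed value)
    """
    # Assumed feedback form: load on the leg slows the phase where
    # cos(phi) > 0 and speeds it up where cos(phi) < 0.
    dphi = omega - gain * grf * math.cos(phi)
    return (phi + dphi * dt) % TWO_PI


def step_all(phases, grfs, **kwargs):
    """Update all oscillators; there is no coupling term between legs."""
    return [step_phase(p, f, **kwargs) for p, f in zip(phases, grfs)]


def phase_observation(phases):
    """Encode each phase as (sin, cos) so the observation has no wrap-around jump."""
    return [v for p in phases for v in (math.sin(p), math.cos(p))]
```

Because `step_all` contains no inter-leg term, any coordination between legs in such a model can only arise indirectly, through the body dynamics and the forces each leg measures. This is the sense in which a gait would be emergent rather than prescribed.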
