SCDP: Learning Humanoid Locomotion from Partial Observations via Mixed-Observation Distillation

Milo Carroll; Tianhu Peng; Lingfan Bao; Chengxu Zhou; Zhibin Li

TL;DR

We present Sensor-Conditioned Diffusion Policies (SCDP), which enable humanoid locomotion using only onboard sensors, eliminating the need for explicit state estimation.

Abstract

Distilling humanoid locomotion control from offline datasets into deployable policies remains a challenge, as existing methods rely on privileged full-body states that require complex and often unreliable state estimation. We present Sensor-Conditioned Diffusion Policies (SCDP), which enable humanoid locomotion using only onboard sensors, eliminating the need for explicit state estimation. SCDP decouples sensing from supervision through mixed-observation training: a diffusion model conditions on sensor histories while being supervised to predict privileged future state-action trajectories, which forces the model to infer the motion dynamics under partial observability. We further develop restricted denoising, context distribution alignment, and context-aware attention masking to encourage implicit state estimation within the model and to prevent train-deploy mismatch. We validate SCDP on velocity-commanded locomotion and motion reference tracking tasks. In simulation, SCDP achieves near-perfect success on velocity control (99-100%) and 93% tracking success on the AMASS test set, performing comparably to privileged baselines while using only onboard sensors. Finally, we deploy the trained policy on a real G1 humanoid at 50 Hz, demonstrating robust real-robot locomotion without external sensing or state estimation.
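The mixed-observation idea in the abstract can be illustrated with a minimal sketch: the conditioning context is built only from sensor observations and past actions, while the denoising target is a future trajectory of privileged states and actions taken from the offline dataset. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the array dimensions, the history and horizon lengths, the noise-prediction (epsilon) objective, and the placeholder denoiser that stands in for the actual network.

```python
import numpy as np

def make_training_pair(obs, states, actions, t, history=4, horizon=8):
    """Build one (context, target) pair at timestep t.

    context: past sensor observations and past actions (available on the robot).
    target:  future privileged states and actions (available only offline).
    The history and horizon lengths are illustrative assumptions.
    """
    context = np.concatenate([obs[t - history:t], actions[t - history:t]], axis=-1)
    target = np.concatenate([states[t:t + horizon], actions[t:t + horizon]], axis=-1)
    return context, target

def add_noise(target, alpha_bar, rng):
    """Standard DDPM forward step: x_k = sqrt(ab) * x_0 + sqrt(1 - ab) * eps."""
    eps = rng.standard_normal(target.shape)
    noisy = np.sqrt(alpha_bar) * target + np.sqrt(1.0 - alpha_bar) * eps
    return noisy, eps

def denoising_loss(predict_noise, context, target, alpha_bar, rng):
    """MSE between predicted and true noise (an assumed epsilon-prediction loss)."""
    noisy, eps = add_noise(target, alpha_bar, rng)
    eps_hat = predict_noise(noisy, context, alpha_bar)
    return float(np.mean((eps_hat - eps) ** 2))

# Synthetic stand-in data; all dimensions are hypothetical.
rng = np.random.default_rng(0)
T, obs_dim, state_dim, act_dim = 32, 6, 10, 3
obs = rng.standard_normal((T, obs_dim))        # partial sensor readings
states = rng.standard_normal((T, state_dim))   # privileged full-body states
actions = rng.standard_normal((T, act_dim))

context, target = make_training_pair(obs, states, actions, t=10)

# Placeholder denoiser that always predicts zero noise; a real model would
# be a network that attends to the context.
def zero_denoiser(noisy, ctx, alpha_bar):
    return np.zeros_like(noisy)

loss = denoising_loss(zero_denoiser, context, target, alpha_bar=0.5, rng=rng)
```

The point of the sketch is the asymmetry between the two arrays: privileged states appear only in `target`, so at deployment the model can be queried from `context` alone.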

Paper Structure (22 sections, 6 equations, 8 figures, 5 tables)

INTRODUCTION
Method
Multi-Motion Tracking Policy
Mixed-Observation Distillation: Learning Global Dynamics from Partial Sensing
Problem Formulation
Diffusion Model Formulation
Restricted Denoising
Context Distribution Alignment
Context-Aware Attention Masking
Representation Alignment
Velocity Conditioning
Motion Reference Conditioning
Datasets
Training and Deployment Configuration
Simulation Results
...and 7 more sections

Figures (8)

Figure 1: Deployment of Sensor-Conditioned Diffusion Policies (SCDP) on Unitree G1, performing robust locomotion at 50 Hz using only proprioceptive sensors, without external motion capture or state estimation.
Figure 2: Sensor-Conditioned Diffusion Policies (SCDP) architecture and training framework. The state-action diffusion policy conditions on sensor observation history $\{o_t\}$, past actions $\{a_t\}$, and commands $c_t$, while predicting future trajectories containing privileged states $\{s_t\}$ and actions. This mixed-observation formulation enables learning global dynamics from partial sensing.
Figure 3: Waypoint navigation task. Sequential frames showing SCDP controlling the robot to navigate successfully to all five waypoints (green: current target, red: remaining targets).
Figure 4: Fast forward walking behaviour. Sequential frames showing the policy executing forward locomotion.
Figure 6: Velocity tracking performance. Comparison of realized velocities against commanded targets (black) for (a) linear velocity $v_x$ and (b) yaw rate $\dot{\psi}$. SCDP (orange) exhibits smoother tracking with reduced oscillations compared to the BeyondMimic baseline (teal), with slight lag during direction changes.
...and 3 more figures
