ECO: Energy-Constrained Optimization with Reinforcement Learning for Humanoid Walking

Weidong Huang; Jingwen Zhang; Jiongye Li; Shibowen Zhang; Jiayang Wu; Jiayi Wang; Hangxin Liu; Yaodong Yang; Yao Su

TL;DR

The paper addresses energy efficiency in humanoid locomotion by reframing energy usage as explicit constraints within a constrained RL framework. ECO employs a PPO-Lagrangian approach to enforce two key constraints—energy consumption and reference motion symmetry—while optimizing task-related rewards, enabling principled hyperparameter tuning and stable learning. Across extensive simulation and real-world BRUCE experiments, ECO achieves substantial energy reductions (approximately 6x vs MPC and 2.3x vs PPO) without sacrificing walking robustness, and demonstrates strong sim-to-real transfer and emergent energy-efficient gait patterns. This constrained-RL formulation offers a practical path toward sustainable, reliable humanoid locomotion in real-world environments, with potential extensions to multi-modal terrains and higher-fidelity models.

Abstract

Achieving stable and energy-efficient locomotion is essential for humanoid robots to operate continuously in real-world applications. Existing MPC and RL approaches often rely on energy-related metrics embedded within a multi-objective optimization framework, which require extensive hyperparameter tuning and often result in suboptimal policies. To address these challenges, we propose ECO (Energy-Constrained Optimization), a constrained RL framework that separates energy-related metrics from rewards, reformulating them as explicit inequality constraints. This method provides a clear and interpretable physical representation of energy costs, enabling more efficient and intuitive hyperparameter tuning for improved energy efficiency. ECO introduces dedicated constraints for energy consumption and reference motion, enforced by the Lagrangian method, to achieve stable, symmetric, and energy-efficient walking for humanoid robots. We evaluated ECO against MPC, standard RL with reward shaping, and four state-of-the-art constrained RL methods. Experiments, including sim-to-sim and sim-to-real transfers on the kid-sized humanoid robot BRUCE, demonstrate that ECO significantly reduces energy consumption compared to baselines while maintaining robust walking performance. These results highlight a substantial advancement in energy-efficient humanoid locomotion. All experimental demonstrations can be found on the project website: https://sites.google.com/view/eco-humanoid.
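The abstract describes enforcing energy and reference-motion constraints through the Lagrangian method. A minimal sketch of that general idea follows; it is not the authors' implementation. It assumes a projected dual-ascent update for each multiplier and a normalized combination of reward and cost advantages, both common choices in PPO-Lagrangian variants. The class `LagrangeMultiplier`, the function `lagrangian_advantage`, the step size, and the sample cost values are all hypothetical. The thresholds of 60 J and 0.05 are the ones quoted in the figure captions.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Dual variable for one inequality constraint J_c <= limit (illustrative)."""
    limit: float          # constraint threshold
    lr: float = 0.01      # dual step size (assumed value)
    value: float = 0.0    # current multiplier, kept non-negative

    def update(self, measured_cost: float) -> float:
        # Projected gradient ascent on the dual variable:
        # grow when the constraint is violated, shrink (down to zero) otherwise.
        self.value = max(0.0, self.value + self.lr * (measured_cost - self.limit))
        return self.value


def lagrangian_advantage(adv_reward, adv_costs, multipliers):
    """Combine reward and cost advantages into one signal for a PPO-style surrogate."""
    penalty = sum(m.value * a for m, a in zip(multipliers, adv_costs))
    # Dividing by (1 + sum of multipliers) keeps the scale bounded; assumed here.
    scale = 1.0 + sum(m.value for m in multipliers)
    return (adv_reward - penalty) / scale


energy = LagrangeMultiplier(limit=60.0)     # energy threshold in joules
symmetry = LagrangeMultiplier(limit=0.05)   # mirror reference motion threshold

energy.update(measured_cost=80.0)     # constraint violated: multiplier rises
symmetry.update(measured_cost=0.03)   # constraint satisfied: multiplier stays at zero
combined = lagrangian_advantage(1.0, [0.5, 0.2], [energy, symmetry])
```

In this sketch the only quantities a user sets are the thresholds, which carry physical units, rather than reward weights. That is the interpretability argument the abstract makes for treating energy as a constraint.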

Paper Structure (44 sections, 26 equations, 12 figures, 5 tables)

Introduction
Related Work
Energy Optimization in Legged Locomotion
Learning for Legged Locomotion
Constrained Reinforcement Learning
Preliminaries & Problem Statement
Markov Decision Processes
Constrained Markov Decision Processes
Constrained RL Methods
PPO-Lag
CRPO
IPO
P3O
Proposed ECO Framework
State Space
...and 29 more sections

Figures (12)

Figure 1: Comparison of the proposed constrained RL framework, ECO, with MPC and normal RL (PPO) baselines. ECO creates synergy between the energy consumption and walking stability of the humanoid robot without requiring an extensive parameter-tuning process, outperforming both the MPC and normal RL baselines.
Figure 2: Overview of the training and deployment process in the proposed ECO framework. The policy network takes velocity commands and proprioception data as inputs and outputs desired joint positions at $100\,\mathrm{Hz}$ to a PD controller, which updates torque commands at $1\,\mathrm{kHz}$. The reward critic is trained with privileged observations. The simulator provides the reward, energy cost, and symmetry cost, which are used to compute the reward and cost returns. The policy is then updated using the Lagrangian formulation in Eq. \ref{eq:lagobject} to balance rewards and costs. The trained policy is deployed directly to the real world.
Figure 3: Comparison of training metrics for ECO, P3O, IPO, and CRPO. The energy consumption and mirror reference motion thresholds are set at $60\,\mathrm{J}$ and 0.05, respectively, as indicated by the black dashed lines in (a) and (b). Results are averaged over 10 random seeds.
Figure 4: Comparison of training metrics for ECO and PPO. The energy consumption and mirror reference motion thresholds are set at $60\,\mathrm{J}$ and 0.05, respectively, as indicated by the black dashed lines in (a) and (b). Results are averaged over 10 random seeds.
Figure 5: Sim-to-sim transfer results. For visual clarity, single-run deployment curves are shown. (a) Visual comparison of walking stability across simulators; (b) Ankle height consistency across simulators; (c) Motor energy consumption at $0.1\,\mathrm{m/s}$ over $10\,\mathrm{s}$ in Gazebo; (d) Body velocity tracking in MuJoCo under different speed commands.
...and 7 more figures
