SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning

Anlun Huang; Zhenyu Wu; Soofiyan Atar; Yuheng Zhi; Michael Yip

TL;DR

ReST-RL is a hierarchical reinforcement learning architecture that explicitly decouples locomotion from payload stabilization. Evaluated on the SteadyTray benchmark, it demonstrates highly reliable zero-shot sim-to-real generalization across various objects and external force disturbances.

Abstract

Stabilizing unsecured payloads against the inherent oscillations of dynamic bipedal locomotion remains a critical engineering bottleneck for humanoids in unstructured environments. To solve this, we introduce ReST-RL, a hierarchical reinforcement learning architecture that explicitly decouples locomotion from payload stabilization, evaluated via the SteadyTray benchmark. Rather than relying on monolithic end-to-end learning, our framework integrates a robust base locomotion policy with a dynamic residual module engineered to actively cancel gait-induced perturbations at the end-effector. This architectural separation ensures steady tray transport without degrading the underlying bipedal stability. In simulation, the residual design significantly outperforms end-to-end baselines in gait smoothness and orientation accuracy, achieving a 96.9% success rate in variable velocity tracking and 74.5% robustness against external force disturbances. Successfully deployed on the Unitree G1 humanoid hardware, this modular approach demonstrates highly reliable zero-shot sim-to-real generalization across various objects and external force disturbances.

Paper Structure (28 sections, 4 equations, 9 figures, 7 tables)

INTRODUCTION
Related Work
Humanoid Loco-Manipulation
Humanoid Policy Architectures
Stability Control in Mobile Manipulation
METHODOLOGY
Problem Statement
Base Policy Training
Residual Module Learning
Residual Action Adapter
Residual FiLM Adapter
Residual Module Distillation
Domain Randomization and Environment
Reward Design
EXPERIMENTS
...and 13 more sections

Figures (9)

Figure 1: ReST-RL enables a Unitree G1 humanoid to perform the SteadyTray task in a real-world setting, with a fluid-filled wine glass as one of the payloads. The robot keeps the tray level to prevent fluid sloshing, glass tipping, and payload falling during transport.
Figure 2: Overview of the ReST-RL framework. Base Policy Training: A locomotion policy is first trained to carry a tray while maintaining a stable gait. Residual Module Training: Using privileged observations, a residual module learns whole-body corrective adjustments on top of the frozen base policy to stabilize the payload under disturbances. Two residual designs are considered: (a) Residual Action Adapter, which adds corrective residual actions to the base action, and (b) Residual FiLM Adapter, which modulates intermediate activations of the frozen base policy via layer-wise FiLM residuals. The student encoder distillation process is shown in Fig. 3.
Figure 3: Residual module distillation. The teacher encoder uses privileged observations, whereas the student encoder uses object-centric inputs; both feed into a frozen residual adapter for latent alignment.
Figure 4: Training reward comparison between End2End and ReST-RL.
Figure 5: Success rate of ReST-RL trained with and without observation delay under increasing perception latency in the Push Robot task.
...and 4 more figures
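
The two residual designs named in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny two-layer network, the weights, the `scale` parameter, and the choice to express the FiLM residual as `(1 + d_gamma) * h + d_beta` on post-activation features are all assumptions made for illustration. The only properties taken from the caption are that the base policy is frozen, that design (a) adds a correction to the base action, and that design (b) modulates intermediate activations.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine map w @ x + b on plain Python lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class BasePolicy:
    """Hypothetical frozen two-layer MLP standing in for the base locomotion policy."""

    def __init__(self, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> None:
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def hidden(self, obs: Vector) -> Vector:
        # Intermediate activations that a FiLM-style adapter could modulate.
        return [math.tanh(v) for v in linear(self.w1, self.b1, obs)]

    def head(self, h: Vector) -> Vector:
        return linear(self.w2, self.b2, h)

    def act(self, obs: Vector) -> Vector:
        return self.head(self.hidden(obs))


def residual_action_adapter(
    base: BasePolicy, obs: Vector, residual_action: Vector, scale: float = 1.0
) -> Vector:
    """Design (a): add a corrective action to the frozen base action.

    `scale` is an assumed knob, not something stated in the source.
    """
    base_action = base.act(obs)
    return [a + scale * r for a, r in zip(base_action, residual_action)]


def residual_film_adapter(
    base: BasePolicy, obs: Vector, d_gamma: Vector, d_beta: Vector
) -> Vector:
    """Design (b): feature-wise modulation of the base policy's hidden layer.

    Assumed residual form: h' = (1 + d_gamma) * h + d_beta, so zero residuals
    leave the base policy's output unchanged.
    """
    h = base.hidden(obs)
    h_mod = [(1.0 + g) * v + b for v, g, b in zip(h, d_gamma, d_beta)]
    return base.head(h_mod)


# Illustrative weights and observation; none of these values come from the paper.
base = BasePolicy(
    w1=[[0.5, -0.2], [0.1, 0.3]], b1=[0.0, 0.0],
    w2=[[1.0, -1.0]], b2=[0.1],
)
obs = [0.4, -0.6]

base_action = base.act(obs)
zero_action = residual_action_adapter(base, obs, [0.0])
zero_film = residual_film_adapter(base, obs, [0.0, 0.0], [0.0, 0.0])
shifted = residual_action_adapter(base, obs, [0.5])
```

In this sketch, both adapters reduce to the base policy when their residual outputs are zero, which is one plausible reason for layering a learned correction on a frozen policy: the starting behavior is the already-stable gait, and the residual module only has to learn the deviation needed to steady the payload.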
