Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model

Xiu Yuan; Tongzhou Mu; Stone Tao; Yunhao Fang; Mengke Zhang; Hao Su

TL;DR

This work tackles the challenge of improving offline-trained large policy models in robotics through online refinement with a model-agnostic residual policy. The residual policy is trained with SAC on top of a frozen base policy, using bounded actions and a controlled exploration schedule to keep learning stable and sample-efficient. Across ManiSkill and Adroit, with Behavior Transformer (BeT) and Diffusion Policy as base models, Policy Decorator yields near-perfect task performance while preserving the smooth motions characteristic of imitation learning, outperforming both fine-tuning and non-fine-tuning baselines. The approach is robust across observation modalities and base-policy architectures, and the results highlight the importance of component design and hyperparameters in practical online refinement.

Abstract

Recent advancements in robot learning have used imitation learning with large models and extensive demonstrations to develop effective policies. However, these models are often limited by the quantity, quality, and diversity of demonstrations. This paper explores improving offline-trained imitation learning models through online interactions with the environment. We introduce Policy Decorator, which uses a model-agnostic residual policy to refine large imitation learning models during online interactions. By implementing controlled exploration strategies, Policy Decorator enables stable, sample-efficient online learning. Our evaluation spans eight tasks across two benchmarks (ManiSkill and Adroit) and involves two state-of-the-art imitation learning models (Behavior Transformer and Diffusion Policy). The results show Policy Decorator effectively improves the offline-trained policies and preserves the smooth motion of imitation learning models, avoiding the erratic behaviors of pure RL policies. See our project page (https://policydecorator.github.io) for videos.
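
To make the idea of a bounded residual on top of a frozen base policy more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the residual is squashed with tanh and scaled by a hypothetical bound `alpha`, and that "controlled exploration" means applying the residual with a probability that ramps linearly from 0 to 1 over a hypothetical `warmup_steps`. The text above does not specify either detail, so both are illustrative assumptions, as are all function and parameter names.

```python
import numpy as np


def exploration_prob(step, warmup_steps):
    """Probability of applying the residual; ramps linearly from 0 to 1.

    Assumption: a linear ramp. The source only says the schedule is "controlled".
    """
    return min(1.0, step / float(warmup_steps))


def decorated_action(base_action, residual_raw, alpha, step, warmup_steps,
                     rng, low=-1.0, high=1.0):
    """Combine a frozen base policy's action with a bounded residual.

    base_action  : action proposed by the frozen base policy (e.g. BeT or Diffusion Policy)
    residual_raw : unbounded output of the residual policy (e.g. an SAC actor)
    alpha        : hypothetical bound on the residual magnitude per action dimension
    """
    base_action = np.asarray(base_action, dtype=float)
    # Squash the residual into [-alpha, alpha] so it can only nudge the base action.
    residual = alpha * np.tanh(np.asarray(residual_raw, dtype=float))
    # Early in training, mostly follow the base policy unchanged.
    use_residual = rng.random() < exploration_prob(step, warmup_steps)
    action = base_action + residual if use_residual else base_action
    # Keep the final action inside the environment's action limits.
    return np.clip(action, low, high), use_residual


rng = np.random.default_rng(0)
early_action, early_used = decorated_action([0.2, -0.3], [100.0, -100.0],
                                            alpha=0.1, step=0,
                                            warmup_steps=1000, rng=rng)
late_action, late_used = decorated_action([0.2, -0.3], [100.0, -100.0],
                                          alpha=0.1, step=1000,
                                          warmup_steps=1000, rng=rng)
```

In this sketch the base policy is never modified; only the small correction term would be learned online. A small `alpha` keeps the combined behavior close to the base policy's motion, at the cost of limiting how much the residual can correct.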
