Effective Tuning Strategies for Generalist Robot Manipulation Policies
Wenbo Zhang, Yang Li, Yanyuan Qiao, Siyuan Huang, Jiajun Liu, Feras Dayoub, Xiao Ma, Lingqiao Liu
TL;DR
The paper addresses the persistent generalization gap that generalist robot manipulation policies (GMPs) show on unseen tasks and embodiments, a gap that stems from the difficulty of collecting sufficiently diverse action data. It presents an extensive empirical study of fine-tuning strategies, examining the action space, policy head, supervision signal, and choice of tunable parameters, with 2,500 rollouts evaluated for a single configuration. The authors identify the key design choices and show that, in a low-data regime, a GMP with a carefully chosen fine-tuning strategy significantly outperforms state-of-the-art imitation learning algorithms. The results establish a new baseline for fine-tuned GMPs and provide practical guidelines for adapting GMPs efficiently to new devices and tasks.
Abstract
Generalist robot manipulation policies (GMPs) have the potential to generalize across a wide range of tasks, devices, and environments. However, existing policies continue to struggle with out-of-distribution scenarios due to the inherent difficulty of collecting sufficient action data to cover extensively diverse domains. While fine-tuning offers a practical way to quickly adapt a GMP to novel domains and tasks with limited samples, we observe that the performance of the resulting GMP differs significantly depending on the design choices of the fine-tuning strategy. In this work, we first conduct an in-depth empirical study to investigate the effect of key factors in GMP fine-tuning strategies, covering the action space, policy head, supervision signal, and the choice of tunable parameters, where 2,500 rollouts are evaluated for a single configuration. We systematically discuss and summarize our findings and identify the key design choices, which we believe provide a practical guideline for GMP fine-tuning. We observe that in a low-data regime, with carefully chosen fine-tuning strategies, a GMP significantly outperforms state-of-the-art imitation learning algorithms. The results presented in this work establish a new baseline for future studies on fine-tuned GMPs and provide a significant addition to the GMP toolbox for the community.
