Residual-MPPI: Online Policy Customization for Continuous Control
Pengcheng Wang, Chenran Li, Catherine Weaver, Kenta Kawamoto, Masayoshi Tomizuka, Chen Tang, Wei Zhan
TL;DR
Residual-MPPI adapts pretrained continuous-control policies at execution time, without retraining. By adding a log-likelihood term of the prior policy to MPPI’s online planning objective, it customizes the policy for new objectives in zero-shot or few-shot settings, using only the prior action distribution and an add-on reward. The approach is validated on MuJoCo benchmarks and on a high-fidelity autonomous racing task with GT Sophy in GT Sport, where it aligns behavior with the add-on objectives while preserving basic task performance and achieves data-efficient dynamics refinement. This work advances practical policy deployment by enabling flexible, data-efficient, real-time policy customization in continuous-control settings with limited access to the original training data or rewards.
Abstract
Policies developed through Reinforcement Learning (RL) and Imitation Learning (IL) have shown great potential in continuous control tasks, but real-world applications often require adapting trained policies to unforeseen requirements. While fine-tuning can address such needs, it typically requires additional data and access to the original training metrics and parameters. In contrast, an online planning algorithm, if capable of meeting the additional requirements, can eliminate the need for extensive training phases and customize the policy without knowledge of the original training scheme or task. In this work, we propose a generic online planning algorithm for customizing continuous-control policies at execution time, which we call Residual-MPPI. It can customize a given prior policy on new performance metrics in few-shot and even zero-shot online settings, given access to the prior action distribution alone. Through our experiments, we demonstrate that the proposed Residual-MPPI algorithm can accomplish the few-shot/zero-shot online policy customization task effectively, including customizing the champion-level racing agent Gran Turismo Sophy (GT Sophy) 1.0 in the challenging Gran Turismo Sport (GTS) car racing environment. Code for the MuJoCo experiments is included in the supplementary material and will be open-sourced upon acceptance. Demo videos and code are available on our website: https://sites.google.com/view/residual-mppi.
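To make the idea of planning with the prior's log-likelihood concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy 1D point mass, a hypothetical Gaussian prior policy (prior_mean, PRIOR_STD), a hypothetical add-on reward that penalizes large actions, and known dynamics. The names omega (weight on the log-likelihood term) and lam (sampling temperature) are illustrative, and the importance-sampling correction used in full MPPI (Model Predictive Path Integral control) is omitted for brevity.

```python
import numpy as np

PRIOR_STD = 0.5  # assumed standard deviation of the hypothetical Gaussian prior


def prior_mean(x):
    # Hypothetical prior policy: push the state toward the origin.
    return -x


def prior_log_prob(x, u):
    # Log-density of action u under the assumed Gaussian prior at state x.
    z = (u - prior_mean(x)) / PRIOR_STD
    return -0.5 * z**2 - np.log(PRIOR_STD * np.sqrt(2.0 * np.pi))


def dynamics(x, u, dt=0.1):
    # Assumed known toy dynamics: a single integrator.
    return x + dt * u


def addon_reward(x, u):
    # Hypothetical add-on objective: prefer small-magnitude actions.
    return -u**2


def residual_mppi_action(x0, horizon=8, num_samples=4096,
                         lam=1.0, omega=1.0, gamma=0.9, seed=0):
    """Return a customized first action via sampling-based planning."""
    rng = np.random.default_rng(seed)
    x = np.full(num_samples, float(x0))
    scores = np.zeros(num_samples)
    first_actions = None
    for t in range(horizon):
        # Sample actions around the prior's mean at each rolled-out state.
        u = prior_mean(x) + PRIOR_STD * rng.standard_normal(num_samples)
        if t == 0:
            first_actions = u.copy()
        # Score = add-on reward + weighted prior log-likelihood (discounted).
        scores += gamma**t * (addon_reward(x, u) + omega * prior_log_prob(x, u))
        x = dynamics(x, u)
    # Softmax weights over trajectories (shifted for numerical stability).
    weights = np.exp((scores - scores.max()) / lam)
    weights /= weights.sum()
    return float(np.sum(weights * first_actions))
```

In this toy setup, calling residual_mppi_action(2.0) should return an action of smaller magnitude than the prior mean of -2.0, because the add-on reward penalizes large actions while the log-likelihood term keeps the result close to the prior.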
