SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation
Noriaki Hirose, Dhruv Shah, Kyle Stachowicz, Ajay Sridhar, Sergey Levine
TL;DR
SELFI addresses the challenge of real-world robotic adaptation by fine-tuning a pre-trained model-based policy with online model-free RL. It introduces a hybrid value function $Q(s,\tau)=J(s,\tau)+\bar{Q}(s,a)$ that combines a model-based trajectory value with a learned residual, and optimizes it within a TD3-based framework while sharing representations. Real-world evaluations in vision-based social navigation show that SELFI learns faster, avoids collisions better, and improves social comfort, while requiring fewer human interventions than offline-only and other online fine-tuning baselines. The approach enables rapid, data-efficient online improvement of robotic policies that builds on existing model-based priors, with broad potential for safer, more autonomous adaptation in dynamic environments.
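To make the hybrid value concrete, here is a minimal sketch of a TD3-style critic target computed with $Q = J + \bar{Q}$. It is a toy illustration under stated assumptions, not the authors' implementation. The trajectory $\tau$ is assumed to be the rollout of the action sequence $a$ through a toy dynamics model. The functions `rollout`, `model_based_objective`, `residual_q`, and `target_actor`, as well as all constants, are hypothetical stand-ins. Only the TD3 pieces are standard: target-policy smoothing with clipped noise and the minimum over twin critics.

```python
import numpy as np

HORIZON = 3          # number of action steps in the planned trajectory (assumption)
GAMMA = 0.99         # discount factor
POLICY_NOISE = 0.2   # std of TD3 target-policy smoothing noise
NOISE_CLIP = 0.5     # clip range for that noise
ACTION_LIMIT = 1.0   # symmetric bound on each action component


def rollout(s, a):
    """Hypothetical dynamics model: integrate velocity commands into waypoints tau."""
    return s + np.cumsum(a)


def model_based_objective(s, tau, goal=5.0):
    """Hypothetical J(s, tau): negative distance from the final waypoint to a goal."""
    return -abs(tau[-1] - goal)


def residual_q(params, s, a):
    """Hypothetical residual Q-bar(s, a): linear in (s, a) for illustration."""
    w_s, w_a, b = params
    return w_s * s + np.dot(w_a, a) + b


def hybrid_q(params, s, a):
    """Q = J(s, tau) + Q-bar(s, a), where tau is the rollout of action sequence a."""
    return model_based_objective(s, rollout(s, a)) + residual_q(params, s, a)


def td3_target(r, s_next, done, target_actor, target_critics, rng):
    """TD3 Bellman target: smoothed target action, then the minimum over twin critics."""
    a_next = target_actor(s_next)
    noise = np.clip(rng.normal(0.0, POLICY_NOISE, size=a_next.shape), -NOISE_CLIP, NOISE_CLIP)
    a_next = np.clip(a_next + noise, -ACTION_LIMIT, ACTION_LIMIT)
    q_next = min(hybrid_q(p, s_next, a_next) for p in target_critics)
    return r + GAMMA * (1.0 - done) * q_next


def critic_loss(params, s, a, y):
    """Squared TD error; J has no parameters here, so only the residual would be trained."""
    return (hybrid_q(params, s, a) - y) ** 2


def target_actor(s):
    """Hypothetical target policy: split the remaining distance to the goal evenly."""
    return np.clip(np.full(HORIZON, (5.0 - s) / HORIZON), -ACTION_LIMIT, ACTION_LIMIT)


rng = np.random.default_rng(0)
critics = [(0.1, np.full(HORIZON, 0.05), 0.0), (0.2, np.zeros(HORIZON), 0.1)]
y = td3_target(r=-0.1, s_next=1.0, done=0.0, target_actor=target_actor,
               target_critics=critics, rng=rng)
loss = critic_loss(critics[0], s=0.5, a=np.full(HORIZON, 0.5), y=y)
print(f"target={y:.3f}, loss={loss:.3f}")
```

The sketch omits the delayed actor update and the Polyak-averaged target networks of full TD3. It only shows how the fixed model-based term $J$ enters both the bootstrapped target and the quantity being regressed, which leaves the residual $\bar{Q}$ to absorb what $J$ gets wrong.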
Abstract
Autonomous self-improving robots that interact and improve with experience are key to the real-world deployment of robotic systems. In this paper, we propose an online learning method, SELFI, that leverages online robot experience to rapidly and efficiently fine-tune pre-trained control policies. SELFI applies online model-free reinforcement learning on top of offline model-based learning to bring out the best parts of both learning paradigms. Specifically, SELFI stabilizes the online learning process by incorporating the same model-based learning objective from offline pre-training into the Q-values learned with online model-free reinforcement learning. We evaluate SELFI in multiple real-world environments and report improvements in collision avoidance, as well as more socially compliant behavior as measured by a human user study. With fewer human interventions, SELFI enables us to quickly learn useful robotic behaviors such as pre-emptive behavior around pedestrians, collision avoidance for small and transparent objects, and avoiding travel on uneven floor surfaces. We provide supplementary videos on our project page that demonstrate the performance of our fine-tuned policy.
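One way to see why folding the offline objective into the Q-values can stabilize online learning is a one-dimensional toy case. If the residual is zero, maximizing $Q = J + \bar{Q}$ picks exactly the action the pre-trained model-based objective prefers. A nonzero residual, learned from online experience, shifts that choice only where $J$ was wrong. In the sketch below, the steering candidates, `j_objective`, and the Gaussian obstacle penalty in `residual` are all hypothetical. The penalty is specified by hand rather than learned, and it only illustrates the decomposition.

```python
import numpy as np

# Hypothetical discrete steering candidates in [-1, 1].
CANDIDATES = np.linspace(-1.0, 1.0, 21)


def j_objective(a):
    """Hypothetical pre-trained objective J: prefers steering a = 0.3 toward the goal."""
    return -(a - 0.3) ** 2


def residual(a, obstacle_at=None, width=0.2, scale=1.0):
    """Hypothetical residual Q-bar: zero before any online data, later a penalty
    around steering angles that led to an obstacle the model-based objective missed."""
    if obstacle_at is None:
        return np.zeros_like(a)
    return -scale * np.exp(-(((a - obstacle_at) / width) ** 2))


def greedy_action(residual_fn):
    """Pick the candidate maximizing Q(a) = J(a) + Q-bar(a)."""
    q = j_objective(CANDIDATES) + residual_fn(CANDIDATES)
    return CANDIDATES[np.argmax(q)]


a_init = greedy_action(residual)                                  # residual is zero
a_tuned = greedy_action(lambda a: residual(a, obstacle_at=0.2))  # hand-set penalty
print(round(float(a_init), 2), round(float(a_tuned), 2))         # prints: 0.3 0.6
```

With a zero residual, the greedy action matches the pre-trained preference. With the penalty, it moves to the nearest candidate that trades a small loss in $J$ for avoiding the penalized region.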
