SELFI: Autonomous Self-Improvement with Reinforcement Learning for Social Navigation

Noriaki Hirose; Dhruv Shah; Kyle Stachowicz; Ajay Sridhar; Sergey Levine

TL;DR

SELFI addresses the challenge of real-world robotic adaptation by fine-tuning a pre-trained model-based policy with online model-free RL. It introduces a hybrid value function $Q(s,\tau)=J(s,\tau)+\bar{Q}(s,a)$ that combines a model-based trajectory value $J$ with a learned residual $\bar{Q}$, and optimizes it with a TD3-based framework while sharing representations. Real-world evaluations in vision-based social navigation show that SELFI achieves faster learning, improved collision avoidance, and enhanced social comfort, with fewer human interventions than offline-only and other online fine-tuning baselines. The approach enables rapid, data-efficient online improvement of robotic policies using existing model-based priors, with broad potential for safer, more autonomous adaptation in dynamic environments.

Abstract

Autonomous self-improving robots that interact and improve with experience are key to the real-world deployment of robotic systems. In this paper, we propose an online learning method, SELFI, that leverages online robot experience to rapidly fine-tune pre-trained control policies. SELFI applies online model-free reinforcement learning on top of offline model-based learning to bring out the best parts of both learning paradigms. Specifically, SELFI stabilizes the online learning process by incorporating the same model-based learning objective from offline pre-training into the Q-values learned with online model-free reinforcement learning. We evaluate SELFI in multiple real-world environments and report improvements in collision avoidance, as well as more socially compliant behavior, measured by a human user study. SELFI enables us to quickly learn useful robotic behaviors with fewer human interventions, such as pre-emptive behavior around pedestrians, collision avoidance for small and transparent objects, and avoidance of uneven floor surfaces. We provide supplementary videos demonstrating the performance of our fine-tuned policy on our project page.
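
To make the hybrid value concrete, the following is a minimal sketch of how a model-based trajectory score and a learned residual can be summed into one value that drives action selection. Everything in it is an illustrative assumption rather than the authors' implementation: the unicycle `rollout`, the goal-distance form of `model_based_value`, the linear `residual_q`, and the candidate actions are all hypothetical. SELFI trains an actor with a TD3-based framework; the sketch swaps that for a brute-force argmax over a few candidate actions so the effect of the residual is easy to see.

```python
import numpy as np

def rollout(action, steps=4, dt=0.5):
    """Hypothetical model: unroll a constant (v, w) command with unicycle kinematics."""
    v, w = action
    x, y, theta = 0.0, 0.0, 0.0
    points = []
    for _ in range(steps):
        theta += w * dt
        x += v * np.cos(theta) * dt
        y += v * np.sin(theta) * dt
        points.append((x, y))
    return np.array(points)

def model_based_value(goal, traj):
    """Assumed stand-in for J(s, tau): negative squared distance from the trajectory end to the goal."""
    return -float(np.sum((traj[-1] - goal) ** 2))

def residual_q(action, weights):
    """Assumed stand-in for the learned residual Q-bar(s, a): a linear critic on (v, w, 1)."""
    features = np.array([action[0], action[1], 1.0])
    return float(weights @ features)

def hybrid_q(goal, action, weights):
    """Hybrid value: model-based trajectory value plus learned residual."""
    traj = rollout(action)
    return model_based_value(goal, traj) + residual_q(action, weights)

def best_action(goal, candidates, weights):
    """Illustration only: argmax over candidates instead of training an actor."""
    scores = [hybrid_q(goal, a, weights) for a in candidates]
    return candidates[int(np.argmax(scores))]

goal = np.array([2.0, 0.0])
candidates = [(0.5, 0.0), (1.0, 0.0), (1.0, 0.5)]

# Before online learning the residual is zero, so the model-based term decides alone.
zero_weights = np.zeros(3)
# A residual that has learned to penalize speed (e.g. after interventions) shifts the choice.
slow_weights = np.array([-5.0, 0.0, 0.0])

print(best_action(goal, candidates, zero_weights))
print(best_action(goal, candidates, slow_weights))
```

With a zero residual the hybrid value reduces to the model-based objective, so the pre-trained behavior is the starting point. Online experience then only has to shape the residual term, which is one way to read the claim that the model-based objective stabilizes online learning.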

Paper Structure (28 sections, 16 equations, 17 figures, 3 tables)

Introduction
Related Work
Online learning
Navigation
Combining Model-based Control with Online Model-Free RL
Preliminaries
SELFI learning architecture
SELFI implementation
SELFI System Setup
Offline Learning with SACSoN
Online Learning with SELFI
Robotic system
Evaluation
Evaluation setup
Performance analysis of the fine-tuned control policy
...and 13 more sections

Figures (17)

Figure 1: Overview of our proposed online learning system, SELFI. Our method fine-tunes a control policy pre-trained with a model-based objective by incorporating this objective into the Q-value function that is maximized during online model-free RL.
Figure 2: System diagram of SELFI. SELFI, implemented in the workstation, trains the actors by maximizing the proposed hybrid objectives and sends the actor parameters to the robot controller. The robot actuates using the trained actors and sends new data to the workstation.
Figure 3: SELFI architecture overview. Before online learning, we train the encoder and the actor by maximizing the differentiable model-based objective. In the online phase, we combine the offline objective with the learned $Q$-value from model-free RL to fine-tune the actor.
Figure 4: Overview of the prototype robot [niwa2022spatio]. We use the Ricoh Theta S omnidirectional camera and run inference on an Nvidia Orin AGX. Yellow description boxes mark components used only in online learning.
Figure 5: Three environments for online training and evaluation. We conduct online training in three challenging environments: [a] Environment 1 is an open space facing restrooms, an elevator hall, and a café space; [b] Environment 2 is an entrance hall with many pedestrians; and [c] Environment 3 runs along an office area with narrow corridors. Environments 1 and 3 have many glass walls, which make collision avoidance difficult and cause changes in lighting conditions.
...and 12 more figures
