Integrating Offline Pre-Training with Online Fine-Tuning: A Reinforcement Learning Approach for Robot Social Navigation

Run Su, Hao Fu, Shuai Zhou, Yingao Fu

TL;DR

The paper tackles the problem of safe social robot navigation under pedestrian uncertainty by framing offline-to-online reinforcement learning as a distribution-shift challenge. It introduces OTOFRL, which combines a Return-to-Go predictor based on a spatio-temporal fusion transformer with a causal transformer policy, embedding RTG tokens into action prediction to align offline data with online dynamics. A hybrid offline-online replay buffer and a dual-timescale update scheme, along with prioritized sampling, stabilize fine-tuning and mitigate off-policy errors, enabling safer exploration. Experiments in simulation and real-world tests demonstrate improved success rates, reduced collisions, shorter navigation times, and higher sampling efficiency, indicating strong sim-to-real transfer and practical impact for robust, adaptive robot navigation in crowds.

Abstract

Offline reinforcement learning (RL) has emerged as a promising framework for addressing robot social navigation challenges. However, inherent uncertainties in pedestrian behavior and limited environmental interaction during training often lead to suboptimal exploration and distributional shifts between offline training and online deployment. To overcome these limitations, this paper proposes a novel offline-to-online fine-tuning RL algorithm for robot social navigation by integrating Return-to-Go (RTG) prediction into a causal Transformer architecture. Our algorithm features a spatiotemporal fusion model designed to precisely estimate RTG values in real time by jointly encoding temporal pedestrian motion patterns and spatial crowd dynamics. This RTG prediction framework mitigates distribution shift by aligning offline policy training with online environmental interactions. Furthermore, a hybrid offline-online experience sampling mechanism is built to stabilize policy updates during fine-tuning, ensuring balanced integration of pre-trained knowledge and real-time adaptation. Extensive experiments in simulated social navigation environments demonstrate that our method achieves a higher success rate and lower collision rate compared to state-of-the-art baselines. These results underscore the efficacy of our algorithm in enhancing navigation policy robustness and adaptability. This work paves the way for more reliable and adaptive robotic navigation systems in real-world applications.
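
For readers unfamiliar with the Return-to-Go quantity, the sketch below computes it from a logged reward sequence in the way it is commonly defined for Decision Transformer style training: the (optionally discounted) sum of rewards from the current step to the end of the episode. This is only an illustration of the target that an RTG predictor would be trained to estimate. The function name `returns_to_go` and the `gamma` parameter are assumptions for this example and are not taken from the paper.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go at each step t: sum over k >= t of gamma**(k - t) * rewards[k].

    Hypothetical helper for illustration; gamma = 1.0 gives the undiscounted
    sum of future rewards.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the tail sum.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg
```

In an offline dataset these values can be computed exactly because the full episode is known. During online deployment the future rewards are not yet available, which is why a learned estimate of this quantity is needed.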

Paper Structure

This paper contains 15 sections, 18 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: The OTOFRL architecture employs online fine-tuning to adapt offline DT and RTGP models through knowledge transfer from offline learning. A spatio-temporal fusion transformer predicts the RTG, which is then used as a token in the online DT. Both the online DT and RTGP models are subsequently updated using a hybrid offline-online sampling mechanism.
  • Figure 2: Robot trajectory comparison of the different methods in identical social formation navigation test scenarios.
  • Figure 3: Testing of robots in real situations.
  • Figure 4: The real-world radar map.
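
The hybrid offline-online sampling mentioned in the Figure 1 caption can be illustrated with a minimal sketch. It draws each training batch partly from the fixed offline dataset and partly from freshly collected online experience. The function name `sample_hybrid_batch`, the `online_fraction` parameter, and the use of uniform sampling are assumptions made for this example; the summary above mentions prioritized sampling, which this sketch does not implement.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_hybrid_batch(
    offline: Sequence[T],
    online: Sequence[T],
    batch_size: int,
    online_fraction: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Mix offline and online samples in one batch (hypothetical helper).

    Uniform sampling with replacement stands in for whatever priority
    scheme a real implementation would use.
    """
    if not offline:
        raise ValueError("offline dataset must not be empty")
    rng = rng or random.Random()
    # Cap the online share by the number of online samples collected so far,
    # so an almost empty online buffer is not heavily over-sampled.
    n_online = min(len(online), round(batch_size * online_fraction))
    n_offline = batch_size - n_online
    batch = [rng.choice(online) for _ in range(n_online)]
    batch += [rng.choice(offline) for _ in range(n_offline)]
    rng.shuffle(batch)
    return batch
```

With an empty online buffer the batch falls back to offline data only, so fine-tuning can start from the pre-trained behavior and shift toward online experience as it accumulates.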