Demonstration-Enhanced Adaptable Multi-Objective Robot Navigation
Jorge de Heuvel, Tharun Sethuraman, Maren Bennewitz
TL;DR
The paper addresses the problem of aligning robot navigation with evolving human preferences by integrating demonstration-driven learning into a multi-objective reinforcement learning framework. It proposes a TD3-based PD-MORL policy whose reward includes a D-REX–derived reward model that captures the demonstrated behavior. An input preference vector $\boldsymbol{\lambda}$ sets the strength of each tunable objective, so the policy can be adapted on the fly without retraining. The reward vector has four objectives: core navigation (goal progress and collision avoidance) plus three tunable objectives for demonstration-like behavior, proxemics, and efficiency. Evaluations show that the policy reflects the requested preferences, remains robust, and transfers from simulation to two real robots. The real-world experiments show safe, adaptable navigation in scenarios with both static and moving humans. Overall, the framework offers a principled post-training mechanism for personalizing robot navigation in human-centric environments, with real-time control over the trade-offs between objectives.
Abstract
Preference-aligned robot navigation in human environments is typically achieved through learning-based approaches that use user feedback or demonstrations for personalization. However, personal preferences are subject to change and may even be context-dependent. Traditional reinforcement learning (RL) approaches with static reward functions often fall short in adapting to evolving user preferences, because once training is completed the policy inevitably reflects the demonstrations it was trained on. This paper introduces a structured framework that combines demonstration-based learning with multi-objective reinforcement learning (MORL). To ensure real-world applicability, our approach allows the robot navigation policy to be adapted dynamically to changing user preferences without retraining. It smoothly modulates how strongly the policy reflects the demonstration data, along with other preference-related objectives. Through rigorous evaluations, including a baseline comparison and sim-to-real transfer on two robots, we demonstrate our framework's capability to adapt to user preferences accurately while achieving high navigational performance in terms of collision avoidance and goal pursuit.
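To make the idea of a preference vector $\boldsymbol{\lambda}$ concrete, the following is a minimal sketch of how a four-objective reward vector could be combined under a user-chosen preference, and how that preference could be appended to the policy input. It is an illustration under stated assumptions, not the paper's implementation: the linear weighting, the fixed weight of 1 on the core navigation objective, the range $[0, 1]$ for each entry of $\boldsymbol{\lambda}$, and the names `OBJECTIVES`, `scalarize`, and `policy_input` are all hypothetical.

```python
from typing import List, Sequence

# Objective order assumed for this sketch: one core objective followed by
# the three tunable objectives named in the summary.
OBJECTIVES = ("navigation", "demonstration", "proxemics", "efficiency")


def scalarize(reward_vector: Sequence[float], preference: Sequence[float]) -> float:
    """Combine a four-objective reward vector into one scalar.

    Assumption: linear scalarization in which the core navigation objective
    always has weight 1 and each tunable objective is weighted by its entry
    of the preference vector.
    """
    if len(reward_vector) != len(OBJECTIVES):
        raise ValueError("expected one reward per objective")
    if len(preference) != len(OBJECTIVES) - 1:
        raise ValueError("expected one preference weight per tunable objective")
    if any(not 0.0 <= w <= 1.0 for w in preference):
        raise ValueError("preference weights are assumed to lie in [0, 1]")
    weights = (1.0, *preference)
    return sum(w * r for w, r in zip(weights, reward_vector))


def policy_input(observation: Sequence[float], preference: Sequence[float]) -> List[float]:
    """Append the preference vector to the observation.

    Feeding the preference to the policy as an input is what lets behavior
    change at run time without retraining.
    """
    return [*observation, *preference]


if __name__ == "__main__":
    rewards = [1.0, 0.5, -0.5, 0.25]  # made-up per-objective rewards
    print(scalarize(rewards, [0.0, 0.0, 0.0]))  # core navigation only
    print(scalarize(rewards, [1.0, 1.0, 1.0]))  # all tunable objectives at full strength
    print(policy_input([0.1, 0.2], [1.0, 0.0, 0.5]))
```

Setting every tunable weight to zero leaves only the core navigation term, which matches the summary's description of goal progress and collision avoidance as the base behavior that the tunable objectives are balanced against.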
