
Interpreting Context-Aware Human Preferences for Multi-Objective Robot Navigation

Tharun Sethuraman, Subham Agrawal, Nils Dengler, Jorge de Heuvel, Teena Hassan, Maren Bennewitz

Abstract

Robots operating in human-shared environments must not only achieve task-level navigation objectives such as safety and efficiency, but also adapt their behavior to human preferences. However, because human preferences are typically expressed in natural language and depend on environmental context, they are difficult to integrate directly into low-level robot control policies. In this work, we present a pipeline that enables robots to understand and apply context-dependent navigation preferences by combining foundation models with a Multi-Objective Reinforcement Learning (MORL) navigation policy. Our approach thus integrates high-level semantic reasoning with low-level motion control. A Vision-Language Model (VLM) extracts structured environmental context from onboard visual observations, while Large Language Models (LLMs) convert natural language user feedback into interpretable, context-dependent behavioral rules stored in a persistent but updatable rule memory. A preference translation module then maps contextual information and stored rules into numerical preference vectors that parameterize a pretrained MORL policy for real-time navigation adaptation. We evaluate the proposed framework through quantitative component-level evaluations, a user study, and real-world robot deployments in various indoor environments. Our results demonstrate that the system reliably captures user intent, generates consistent preference vectors, and enables controllable behavior adaptation across diverse contexts. Overall, the proposed pipeline improves the adaptability, transparency, and usability of robots operating in shared human environments, while maintaining safe and responsive real-time control.

Paper Structure (19 sections, 9 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Preference-conditioned navigation behavior in a human-shared supermarket environment. Given a natural language user instruction requesting increased personal clearance, the robot adapts its trajectory to maintain a larger distance from nearby humans while progressing toward the goal location. As can be seen from the baseline shortest-path trajectory (orange) and the preference-adapted trajectory (green), the expressed preference influences the navigation behavior.
  • Figure 2: Overview of the proposed preference-aware navigation architecture. The reasoning module integrates VLM-based context prediction, LLM-based rule updating, and preference translation to convert visual context and natural language user feedback into structured rules and preference vectors usable by a MORL agent for online low-level navigation control. The preference vectors condition the navigation policy, enabling the robot to adapt its behavior to context-dependent user preferences while providing interpretable explanations of its decisions.
  • Figure 3: Example output of the context predictor. Given an input RGB scene image and a structured prompt, the VLM-based context predictor extracts semantic and spatial environmental information, including room type, detected objects, approximate object distances, human presence, lighting conditions, and a natural language scene description. This structured context representation is used as grounding input for downstream rule generation and preference translation modules.
  • Figure 4: Example rule generation for the context-aware preference pipeline. Given an input scene image and a natural language user preference, the VLM-based context predictor extracts structured environmental context, which is combined with user input by the LLM-based rule updater to generate a context-dependent navigation rule and explanation while preserving baseline safety objectives.
  • Figure 5: Example output of the preference translator. Given the predicted scene context (living room, no humans present) and the stored rule set, the LLM-based preference translator selects relevant context-based rules and generates a four-dimensional MORL preference vector. The resulting vector modulates navigation efficiency, obstacle distance, human distance, and velocity, while also providing the applied rule and a natural language explanation to ensure transparency in preference-conditioned navigation behavior.
  • ...and 1 more figures
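
The preference translation step described in the abstract and in the Figure 5 caption (scene context plus stored rules in, four-dimensional preference vector out) can be illustrated with a minimal sketch. In the paper this step is performed by an LLM; the code below swaps in a plain deterministic rule lookup purely to show the data flow. The `Rule` class, the `translate` function, the additive weight adjustments, the uniform baseline, and the normalization to a sum of one are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

# Objective order follows the Figure 5 caption; the identifiers are illustrative.
OBJECTIVES = ("efficiency", "obstacle_distance", "human_distance", "velocity")


@dataclass
class Rule:
    """Hypothetical entry in the rule memory."""
    room_type: str        # context in which the rule applies
    humans_present: bool  # whether the rule targets scenes with humans
    adjustments: dict     # objective name -> additive weight change (assumed form)
    explanation: str      # natural language rationale kept for transparency


def translate(context, rules, baseline=None):
    """Map scene context and stored rules to a preference vector.

    Returns the vector (ordered as OBJECTIVES) and the list of applied rules.
    This is a deterministic stand-in for the LLM-based translator.
    """
    weights = dict(baseline or {name: 1.0 for name in OBJECTIVES})
    applied = []
    for rule in rules:
        matches = (
            rule.room_type == context["room_type"]
            and rule.humans_present == context["humans_present"]
        )
        if not matches:
            continue
        for name, delta in rule.adjustments.items():
            # Clamp at zero so a weight never becomes negative.
            weights[name] = max(0.0, weights[name] + delta)
        applied.append(rule)

    total = sum(weights.values())
    if total == 0.0:
        # Degenerate case: fall back to a uniform vector.
        return [1.0 / len(OBJECTIVES)] * len(OBJECTIVES), applied
    # Assumed convention: normalize so the weights sum to one.
    return [weights[name] / total for name in OBJECTIVES], applied


if __name__ == "__main__":
    memory = [
        Rule("living room", False, {"efficiency": 1.0},
             "No humans nearby, so favor reaching the goal quickly."),
        Rule("supermarket", True, {"human_distance": 2.0},
             "User asked for more personal clearance around shoppers."),
    ]
    scene = {"room_type": "living room", "humans_present": False}
    vector, used = translate(scene, memory)
    print(vector, [r.explanation for r in used])
```

Returning the applied rules alongside the vector mirrors the transparency goal stated in the captions: the explanation attached to each rule can be shown to the user together with the resulting behavior.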