Hey Robot! Personalizing Robot Navigation through Model Predictive Control with a Large Language Model

Diego Martinez-Baselga, Oscar de Groot, Luzia Knoedler, Javier Alonso-Mora, Luis Riazuelo, Luis Montano

TL;DR

This work tackles the problem of end-user customization of robot navigation in dynamic, human-centered environments. It introduces Hey Robot!, a zero-shot, LLM-enabled architecture that interprets natural language queries (and optionally camera input) to generate and continuously reconfigure the MPC cost function that governs robot motion, while preserving safety via collision constraints. The approach comprises four specialized assistants (Capability, Cost Generation, Camera, Weight Retrieval) that collaborate to produce a suitable $J_{\boldsymbol{q}_j}$ and reparameterize the controller in real time, with CasADi/Acados powering the MPC and topology-inspired global search to avoid local minima. Extensive simulations and real-robot experiments demonstrate that user-specified tasks (e.g., following a path, staying distant from humans) are realized with appropriate trade-offs between speed, smoothness, and safety, indicating practical potential for adaptable, user-centric robotic navigation.

Abstract

Robot navigation methods allow mobile robots to operate in applications such as warehouses or hospitals. While the environment in which the robot operates imposes requirements on its navigation behavior, most existing methods do not allow the end-user to configure the robot's behavior and priorities, possibly leading to undesirable behavior (e.g., fast driving in a hospital). We propose a novel approach to adapt robot motion behavior based on natural language instructions provided by the end-user. Our zero-shot method uses an existing Visual Language Model to interpret a user text query or an image of the environment. This information is used to generate the cost function and reconfigure the parameters of a Model Predictive Controller, translating the user's instruction to the robot's motion behavior. This allows our method to safely and effectively navigate in dynamic and challenging environments. We extensively evaluate our method's individual components and demonstrate the effectiveness of our method on a ground robot in simulation and real-world experiments, and across a variety of environments and user specifications.
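
The idea of reconfiguring a controller's cost while leaving its safety constraint untouched can be illustrated with a minimal sketch. This is not the paper's implementation: the cost terms, weight names, the `Step` container, and the 0.5 m safety radius are all hypothetical, and plain Python stands in for the CasADi/Acados formulation. The sketch only shows a weighted-sum stage cost whose weights can be swapped at runtime, next to a collision check that the weights cannot influence.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Step:
    """Hypothetical snapshot of one prediction step (not from the paper)."""
    x: float        # robot position
    y: float
    v: float        # forward speed
    w: float        # angular speed
    ref_x: float    # closest point on the reference path
    ref_y: float
    human_x: float  # predicted human position
    human_y: float


# Hypothetical cost terms; each maps a step to a non-negative scalar.
TERMS: Dict[str, Callable[[Step], float]] = {
    "path_error": lambda s: (s.x - s.ref_x) ** 2 + (s.y - s.ref_y) ** 2,
    "speed": lambda s: s.v ** 2,
    "rotation": lambda s: s.w ** 2,
    "human_proximity": lambda s: 1.0
    / (1e-3 + (s.x - s.human_x) ** 2 + (s.y - s.human_y) ** 2),
}


def update_weights(current: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of `current` with `new` applied; reject unknown term names."""
    unknown = set(new) - set(TERMS)
    if unknown:
        raise KeyError(f"unknown cost terms: {sorted(unknown)}")
    return {**current, **new}


def stage_cost(step: Step, weights: Dict[str, float]) -> float:
    """Weighted sum of the cost terms; terms without a weight contribute zero."""
    return sum(weights.get(name, 0.0) * term(step) for name, term in TERMS.items())


def is_collision_free(step: Step, radius: float = 0.5) -> bool:
    """Safety check kept separate from the cost, so reweighting cannot relax it."""
    return math.hypot(step.x - step.human_x, step.y - step.human_y) >= radius
```

For example, starting from `{"path_error": 1.0, "speed": 0.1}` and applying `update_weights(..., {"speed": 1.0, "human_proximity": 2.0})` raises the cost of driving fast near a person, while `is_collision_free` returns the same answer for the same step regardless of the weights.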

Paper Structure (22 sections, 3 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: A user asks the robot for a motion behavior, which is fulfilled in real-time.
  • Figure 2: Flow diagram of our proposed LLM module and its connection to the MPC. Inputs are in blue and outputs in red. Dashed lines represent updates regarding the state for the following query.
  • Figure 3: Number of times (rate) each response was selected for the Capability Assistant experiment.
  • Figure 4: Images of the simulator taken with the robot camera.
  • Figure 5: Five overlapping experiments with trajectories of the robot (blue) and human (green) and a reference path (dashed).
  • ...and 1 more figure