NARRATE: Versatile Language Architecture for Optimal Control in Robotics

Seif Ismail, Antonio Arbues, Ryan Cotterell, René Zurbrügg, Carmen Amo Alonso

TL;DR

NARRATE presents a modular framework that uses a pre-trained LLM to translate natural language goals into hard-constraint MPC formulations, enabling safe and flexible robotic control. By dividing the Language Module into a Task Planner and an Optimization Designer and pairing it with an MPC-based Trajectory Generator and impedance-based Trajectory Tracker, the method handles long-horizon, contact-rich tasks and transfers from simulation to real robots. Empirical results show superior performance and efficiency compared to state-of-the-art language-to-action baselines, with notable gains when constraints and human feedback are incorporated. The work demonstrates practical viability for natural-language interfaces in manipulation, while highlighting avenues for safety guarantees and improved perception-driven feedback.

Abstract

The impressive capabilities of Large Language Models (LLMs) have led to various efforts to enable robots to be controlled through natural language instructions, opening exciting possibilities for human-robot interaction. The goal is for the motor-control task to be performed accurately, efficiently, and safely while also enjoying the flexibility imparted by LLMs to specify and adjust the task through natural language. In this work, we demonstrate how a careful layering of an LLM in combination with a Model Predictive Control (MPC) formulation allows for accurate and flexible robotic control via natural language while taking into consideration safety constraints. In particular, we rely on the LLM to effectively frame constraints and objective functions as mathematical expressions, which are later used in the motor-control module via MPC. The transparency of the optimization formulation allows for interpretability of the task and enables adjustments through human feedback. We demonstrate the validity of our method through extensive experiments on long-horizon reasoning, contact-rich, and multi-object interaction tasks. Our evaluations show that NARRATE outperforms existing methods on these benchmarks and effectively transfers to the real world on two different embodiments. Videos, code, and prompts are available at narrate-mpc.github.io.
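To make the idea of "constraints and objective functions as mathematical expressions" concrete, here is a minimal toy sketch, not NARRATE's actual code. It assumes a hypothetical Optimization Designer output (`od_output`) given as expression strings, a 2D point gripper with single-integrator dynamics, and a brute-force search over short action sequences standing in for a real MPC solver. All names (`od_output`, `scene`, `compile_expr`, `mpc_step`, `run`) are illustrative assumptions.

```python
import itertools
import math

# Hypothetical Optimization Designer output (an assumption, not NARRATE's real
# format): a cost to minimize and hard constraints of the form expr <= 0,
# written as expressions over the gripper position x and named objects.
od_output = {
    "cost": "(x[0] - cube[0])**2 + (x[1] - cube[1])**2",
    "constraints": ["0.25 - math.hypot(x[0] - obstacle[0], x[1] - obstacle[1])"],
}
scene = {"cube": (1.0, 0.0), "obstacle": (0.5, 0.0)}

# Candidate per-step displacements (assumed single-integrator dynamics).
ACTIONS = [(dx, dy) for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)]


def compile_expr(expr, scene):
    """Turn an expression string into a function of the position x."""
    code = compile(expr, "<od>", "eval")
    env = {"__builtins__": {}, "math": math, **scene}
    return lambda x: eval(code, env, {"x": x})


def mpc_step(x0, cost, constraints, horizon=3):
    """Return the first action of the cheapest feasible action sequence."""
    best_seq, best_cost = None, math.inf
    for seq in itertools.product(ACTIONS, repeat=horizon):
        x, total, feasible = x0, 0.0, True
        for dx, dy in seq:
            x = (x[0] + dx, x[1] + dy)
            if any(h(x) > 0 for h in constraints):  # hard constraint violated
                feasible = False
                break
            total += cost(x)
        if feasible and total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq[0] if best_seq else (0.0, 0.0)


def run(x0, scene, od, steps=15):
    """Receding-horizon loop: re-plan every step, apply only the first action."""
    cost = compile_expr(od["cost"], scene)
    constraints = [compile_expr(c, scene) for c in od["constraints"]]
    path = [x0]
    for _ in range(steps):
        dx, dy = mpc_step(path[-1], cost, constraints)
        path.append((path[-1][0] + dx, path[-1][1] + dy))
    return path


path = run((0.0, 0.0), scene, od_output)
print(path[-1])
```

In this toy scene the gripper detours around the obstacle and settles at the cube, and every visited state satisfies the hard constraint because infeasible sequences are discarded rather than penalized. Because the cost and constraints stay as readable expressions, a human can inspect or edit them directly, which is the kind of interpretability the abstract refers to. Evaluating strings with `eval` is acceptable only for a sketch like this; a real system would parse the expressions into a solver's symbolic form.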

Paper Structure (17 sections, 11 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Benchmark Tasks: Given an instruction in natural language, the robotic manipulator is able to autonomously perform a variety of different tasks, both in simulation and on different real robots.
  • Figure 2: Proposed Architecture: Overall architecture for controlling a robot manipulator via natural language. The user provides a task in natural language, $\ell$, which the language module translates through two layered language-model blocks: the Task Planner (TP) produces a series of steps, and the Optimization Designer (OD) produces the objective and constraints $c,h,g$. These are the inputs to the control module, which generates a trajectory via MPC in the Trajectory Generator (TG) and the low-level control commands in the Trajectory Tracker (TT) that are applied to the system, in this case a robotic arm.
  • Figure 3: Evaluation setup: We use PandaGym [gallouedec2021pandagym] to build our custom simulation environments. For the quantitative, real-world evaluation, we use a Franka Emika Panda with two RGB-D cameras (D435i, D455) and ArUco markers to extract object poses.
  • Figure 4: TP-OD responses for the Cook Steak task: the robots have to collaborate in order to move the frying pan above the stove using its two handles. This OD response requires the robot grippers to remain at a constant distance to prevent the pan from falling.
  • Figure 5: Real World Examples: Visualization of two real-world evaluations. The respective queries for the tasks were (a) "make a stack of cubes on top of the yellow cube" and (b) "build a pyramid with the green and blue cubes at the base and red cube at the top. keep the green cube at its original position". The ArUco markers encode the color of each cube.
  • ...and 1 more figure