Enabling Waypoint Generation for Collaborative Robots using LLMs and Mixed Reality

Cathy Mengying Fang, Krzysztof Zieliński, Pattie Maes, Joe Paradiso, Bruce Blumberg, Mikkel Baun Kjærgaard

TL;DR

This paper addresses the difficulty that small and medium-sized enterprises (SMEs) face in programming collaborative robots by replacing hand-coded scripts with a natural-language interface powered by large language models (LLMs). The authors combine LLMs with a Mixed Reality workflow in Unity, a 3D scene model, and URScript to generate and visualize robot waypoints in real time, covering the full path from language to execution. The main contribution is an end-to-end framework that previews the planned trajectory in an AR headset and then streams the validated waypoint sequence to a UR robot, demonstrated on a pick-and-place task. The authors also present an initial exploration of expressive robot skills via few-shot animations. The work aims to lower the barrier to automation and to make human-robot collaboration more intuitive in real-world settings.

Abstract

Programming a robot is a complex task, as it requires the user to have a good command of specific programming languages and an awareness of the robot's physical constraints. We propose a framework that simplifies robot deployment by allowing direct communication using natural language. It uses large language models (LLMs) for prompt processing, workspace understanding, and waypoint generation. It also employs Augmented Reality (AR) to provide visual feedback of the planned outcome. We showcase the effectiveness of our framework with a simple pick-and-place task, which we implement on a real robot. Moreover, we present an early concept of expressive robot behavior and skill generation that can be used to communicate with the user and learn new skills (e.g., object grasping).

Paper Structure (14 sections, 4 figures)

Figures (4)

  • Figure 1: Overview of our framework. Starting from the top left, a 3D scene (optionally scanned, if it does not exist already) and a user prompt are fed to an adapted LLMR framework, which is an orchestration of prompt-engineered GPT modules. The adapted framework outputs the trajectory based on the user prompt, which is converted to a Universal Robots script (URScript) readable by the robot arm. The user also sees the rendered trajectory in the AR headset.
  • Figure 2: Example interaction between the user and the collaborative robot arm enabled by our framework. Mixed Reality views are outlined in yellow. A-B: The user is wearing a HoloLens 2 AR headset and instructs the robot to create a pick-and-place program between two stools. C: Our framework (running within Unity on a separate laptop) generates a series of waypoints (shown as red spheres), which are streamed to the AR headset for preview. D: Once the user is satisfied with the waypoints, the robot receives the command from Unity and then follows the waypoints.
  • Figure 3: A pre-scanned scene and a model of the robot arm are loaded in the Unity environment. A translucent reachability sphere is shown to indicate a conservative estimate of the maximum reach of the robot arm.
  • Figure 4: An example of an expressive response from the robot back to the user. A: The user asks the robot if it is happy with the generated program. B-C: The robot responds by nodding back at the user, where the nodding animation is generated by our framework.
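
The captions for Figures 1 and 3 describe two steps that lend themselves to a short illustration: checking waypoints against a reachability sphere, and converting a trajectory to a Universal Robots script. The following is a minimal sketch of how those steps might look, not the authors' implementation. The function names, the 0.85 m reach radius, the fixed tool orientation, and the acceleration and velocity values are all placeholder assumptions, and waypoints are assumed to be expressed in metres in the robot base frame. The only URScript constructs used are `movel` with a `p[x, y, z, rx, ry, rz]` pose and a `def ... end` program wrapper.

```python
import math
from typing import Iterable, Tuple

# A waypoint is (x, y, z) in metres, relative to the robot base (assumption).
Waypoint = Tuple[float, float, float]


def within_reach(point: Waypoint, max_reach: float = 0.85) -> bool:
    """Return True if the point lies inside a sphere centred on the robot base.

    max_reach is a hypothetical placeholder radius; a conservative value
    would be chosen below the arm's nominal reach.
    """
    x, y, z = point
    return math.sqrt(x * x + y * y + z * z) <= max_reach


def to_urscript(
    waypoints: Iterable[Waypoint],
    orientation: Tuple[float, float, float] = (0.0, 3.14, 0.0),  # placeholder rotation vector
    a: float = 0.5,   # placeholder tool acceleration in m/s^2
    v: float = 0.25,  # placeholder tool speed in m/s
    max_reach: float = 0.85,
) -> str:
    """Build a URScript program that visits each waypoint with a linear move.

    Raises ValueError if any waypoint falls outside the reachability sphere,
    so an unreachable plan is rejected before anything is sent to the robot.
    """
    rx, ry, rz = orientation
    lines = ["def follow_waypoints():"]
    for index, (x, y, z) in enumerate(waypoints):
        if not within_reach((x, y, z), max_reach):
            raise ValueError(f"waypoint {index} is outside the reachability sphere")
        lines.append(
            f"  movel(p[{x:.3f}, {y:.3f}, {z:.3f}, {rx:.3f}, {ry:.3f}, {rz:.3f}], a={a}, v={v})"
        )
    lines.append("end")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical pick-and-place path: above pick, pick, above place, place.
    path = [(0.30, 0.20, 0.40), (0.30, 0.20, 0.20), (0.30, -0.20, 0.40), (0.30, -0.20, 0.20)]
    print(to_urscript(path))
```

Rejecting the whole plan when a single waypoint is out of reach is one possible design choice; a real system might instead ask the LLM to regenerate the trajectory or highlight the offending waypoint in the AR preview.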