Enabling Waypoint Generation for Collaborative Robots using LLMs and Mixed Reality
Cathy Mengying Fang, Krzysztof Zieliński, Pattie Maes, Joe Paradiso, Bruce Blumberg, Mikkel Baun Kjærgaard
TL;DR
This paper tackles a key barrier to adopting collaborative robots in small and medium-sized enterprises (SMEs): programming them usually requires hand-coded scripts. The authors replace these scripts with a natural-language interface powered by large language models (LLMs). They combine LLMs with a Mixed Reality workflow built in Unity, a 3D model of the scene, and URScript to generate robot waypoints and visualize them in real time, covering the path from a natural-language instruction to execution on the robot. The key contribution is an end-to-end framework that previews the planned trajectory in AR and then streams the validated waypoint sequence to a Universal Robots (UR) arm, demonstrated on a pick-and-place task. The authors also present an initial exploration of expressive robot skills generated via few-shot animations. By lowering the barrier to automation, the work aims to make human-robot collaboration more intuitive in real-world settings.
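The TL;DR says the validated waypoint sequence is streamed to the UR robot as URScript. Below is a minimal sketch of that serialization step. It assumes each waypoint is a tool pose with position in meters and rotation as an axis-angle vector in radians, which is the convention URScript `p[...]` poses use. The waypoints are wrapped in a URScript program of linear `movel` moves. The `Waypoint` class, the `to_urscript` function, the program name, and the default acceleration and velocity values are hypothetical illustrations, not the authors' code. How the resulting string reaches the controller depends on the setup and is not shown; one option is a TCP socket to one of the controller's client interfaces.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Waypoint:
    # Hypothetical waypoint type (not from the paper): TCP position in meters
    # and rotation as an axis-angle vector in radians, as in URScript poses.
    x: float
    y: float
    z: float
    rx: float
    ry: float
    rz: float


def to_urscript(waypoints: List[Waypoint], name: str = "llm_path",
                accel: float = 0.5, vel: float = 0.1) -> str:
    """Wrap waypoints in a URScript program made of linear moves (movel).

    accel (m/s^2) and vel (m/s) are illustrative defaults, not tuned values.
    """
    lines = [f"def {name}():"]
    for wp in waypoints:
        # Format the pose literal p[x, y, z, rx, ry, rz] with fixed precision.
        pose = ", ".join(f"{c:.4f}" for c in (wp.x, wp.y, wp.z, wp.rx, wp.ry, wp.rz))
        lines.append(f"  movel(p[{pose}], a={accel}, v={vel})")
    lines.append("end")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: approach above a hypothetical object, then descend 15 cm.
    path = [Waypoint(0.30, -0.20, 0.25, 0.0, 3.1416, 0.0),
            Waypoint(0.30, -0.20, 0.10, 0.0, 3.1416, 0.0)]
    print(to_urscript(path))
```

Keeping the conversion this simple means the same waypoint list could drive both the AR preview and the program sent to the robot, so what the user approves matches what is executed.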
Abstract
Programming a robot is a complex task, as it requires the user to have a good command of specific programming languages and an awareness of the robot's physical constraints. We propose a framework that simplifies robot deployment by allowing users to communicate with the robot directly in natural language. It uses large language models (LLMs) for prompt processing, workspace understanding, and waypoint generation, and it employs Augmented Reality (AR) to provide visual feedback on the planned outcome. We showcase the effectiveness of our framework with a simple pick-and-place task implemented on a real robot. Moreover, we present an early concept of expressive robot behavior and skill generation that can be used to communicate with the user and to learn new skills (e.g., object grasping).
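The abstract assigns waypoint generation to the LLM and stresses the robot's physical constraints, so model output needs checking before it reaches the robot. Below is a minimal sketch of such a check. It assumes the LLM is prompted to answer with JSON of the form `{"waypoints": [[x, y, z], ...]}` in meters in the robot base frame. The reply is parsed, and any waypoint outside a fixed workspace box is rejected. The JSON schema, the `parse_waypoints` function, and the `WORKSPACE_MIN`/`WORKSPACE_MAX` limits are hypothetical and do not describe the authors' implementation. An axis-aligned box is also a deliberate simplification of a real reachability and collision check against the scene model.

```python
import json
from typing import List, Tuple

Point = Tuple[float, float, float]

# Hypothetical workspace limits in meters (robot base frame). A real system
# would derive these from the robot's reach and the 3D scene model.
WORKSPACE_MIN: Point = (-0.6, -0.6, 0.02)
WORKSPACE_MAX: Point = (0.6, 0.6, 0.8)


def parse_waypoints(llm_reply: str) -> List[Point]:
    """Parse a JSON reply like {"waypoints": [[x, y, z], ...]} and validate it.

    Raises ValueError if a waypoint has the wrong arity or leaves the box.
    """
    data = json.loads(llm_reply)
    points: List[Point] = []
    for i, wp in enumerate(data["waypoints"]):
        if len(wp) != 3:
            raise ValueError(f"waypoint {i} must have 3 coordinates, got {len(wp)}")
        x, y, z = (float(c) for c in wp)
        # Check each coordinate against its axis limits.
        for value, lo, hi, axis in zip((x, y, z), WORKSPACE_MIN, WORKSPACE_MAX, "xyz"):
            if not lo <= value <= hi:
                raise ValueError(f"waypoint {i}: {axis}={value} outside [{lo}, {hi}]")
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    reply = '{"waypoints": [[0.3, -0.2, 0.25], [0.3, -0.2, 0.05]]}'
    print(parse_waypoints(reply))  # [(0.3, -0.2, 0.25), (0.3, -0.2, 0.05)]
```

Rejecting the whole reply rather than clipping points gives the system a clear signal to re-prompt the model or ask the user, instead of silently executing a modified path.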
