Enhancing Human-Robot Collaborative Assembly in Manufacturing Systems Using Large Language Models
Jonghan Lim, Sujani Patel, Alex Evans, John Pimley, Yifei Li, Ilya Kovalenko
TL;DR
This work addresses the communication bottleneck in human–robot collaborative manufacturing by proposing an LLM-driven framework that translates natural-language voice commands into robot actions via a two-layer architecture: a physical layer that handles workspace data and an event-driven natural-language alert system, and a virtual layer with a human command interpreter and a robot task executor. The system uses GPT-4 function calling, OpenAI speech tools (Whisper for transcription and TTS for responses), and a vision stage (YOLOv5) to perform assembly tasks with real-time error handling and re-entry from the last completed subtask $t_{i\text{c}}$. A cable shark assembly case study demonstrates end-to-end coordination guided by natural language: success is high when instructions are specific, while vague commands expose weaknesses, highlighting the need for clearer initialization and potential knowledge-base support. The results indicate that LLMs can enhance human–robot interaction in collaborative manufacturing by enabling adaptive responses to language variation and environmental changes, with practical implications for reducing robotics training needs and improving assembly efficiency. In the notation used here, $T$ denotes the task set, $C$ the robot capabilities, $\{t_{i1},\ldots,t_{ik}\}$ the subtasks of a task $t_i$, and $t_{i\text{c}}$ the last completed subtask, while $M_{e_i}(t_i)$ and $M_c(t_i)$ represent error and completion messages, respectively.
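The re-entry behavior described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Task` class, the `run_task` function, the simulated executor, and the subtask names are all hypothetical, chosen only to show how a task $t_i$ with subtasks $t_{i1},\ldots,t_{ik}$ might resume after the last completed subtask $t_{i\text{c}}$ and return either an error message $M_{e_i}(t_i)$ or a completion message $M_c(t_i)$.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """A task t_i with ordered subtasks t_i1..t_ik (hypothetical structure)."""
    name: str
    subtasks: List[str]
    last_completed: int = 0  # number of subtasks finished so far (index c)


def run_task(task: Task, execute: Callable[[str], bool]) -> str:
    """Run the remaining subtasks, starting after the last completed one.

    Returns a completion message (M_c) or an error message (M_e).
    """
    for index in range(task.last_completed, len(task.subtasks)):
        subtask = task.subtasks[index]
        if not execute(subtask):
            # Stop here; progress is kept so a later call resumes at this subtask.
            return f"Error in {task.name}: could not complete '{subtask}'."
        task.last_completed = index + 1
    return f"Task {task.name} completed."


# Simulated robot executor (assumption): "place wedge" fails once, then
# succeeds after the operator has resolved the problem.
executed: List[str] = []
pending_faults = {"place wedge"}


def simulated_execute(subtask: str) -> bool:
    if subtask in pending_faults:
        pending_faults.remove(subtask)
        return False
    executed.append(subtask)
    return True


task = Task("assembly", ["pick housing", "place wedge", "insert spring"])
first_message = run_task(task, simulated_execute)   # stops at "place wedge"
second_message = run_task(task, simulated_execute)  # resumes at "place wedge"
```

In this sketch the second call does not repeat "pick housing", because `last_completed` records progress across calls. In the framework itself, the messages would be passed to the operator through the speech interface rather than returned as strings.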
Abstract
Human-robot collaboration can improve manufacturing system performance by leveraging the unique strengths of both humans and robots. On the shop floor, human operators contribute adaptability and flexibility in dynamic situations, while robots provide precision and the ability to perform repetitive tasks. However, the communication gap between human operators and robots limits the collaboration and coordination of human-robot teams in manufacturing systems. Our research presents a human-robot collaborative assembly framework that uses a large language model to enhance communication in manufacturing environments. The framework facilitates human-robot communication by integrating natural-language voice commands for task management. A case study of an assembly task demonstrates the framework's ability to process natural language inputs and address real-time assembly challenges, emphasizing adaptability to language variation and efficiency in error resolution. The results suggest that large language models have the potential to improve human-robot interaction for collaborative manufacturing assembly applications.
