CoViLLM: An Adaptive Human-Robot Collaborative Assembly Framework Using Large Language Models for Manufacturing

Jiabao Zhao, Jonghan Lim, Hongliang Li, Ilya Kovalenko

Abstract

With increasing demand for mass customization, traditional manufacturing robots that rely on rule-based operations lack the flexibility to accommodate customized or new product variants. Human-Robot Collaboration (HRC) has demonstrated potential to improve system adaptability by leveraging human versatility and decision-making capabilities. However, existing HRC frameworks typically depend on predefined perception-manipulation pipelines, which limits their ability to autonomously generate task plans for new product assembly. In this work, we propose CoViLLM, an adaptive human-robot collaborative assembly framework that supports the assembly of customized and previously unseen products. CoViLLM combines depth-camera-based localization for object position estimation, human operator classification for identifying new components, and a Large Language Model (LLM) for assembly task planning based on natural language instructions. The framework is validated on the NIST Assembly Task Board for known, customized, and new product cases. Experimental results show that the proposed framework enables flexible collaborative assembly by extending HRC beyond predefined product and task settings.
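
As an illustration of what depth-camera-based position estimation can involve, the sketch below back-projects a pixel and its depth reading into a 3D point in the camera frame using the standard pinhole camera model. This is not the paper's Algorithm 1, which is not reproduced here. The `Intrinsics` class, the `deproject` function, and the numeric intrinsics are hypothetical names and values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics (hypothetical container for this sketch)."""
    fx: float  # focal length along x, in pixels
    fy: float  # focal length along y, in pixels
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def deproject(u: float, v: float, depth_m: float, k: Intrinsics) -> tuple:
    """Map pixel (u, v) with depth in meters to (x, y, z) in the camera frame.

    Assumes an ideal pinhole model with no lens distortion.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive; the pixel has no valid reading")
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Illustrative intrinsics, not taken from the paper.
k = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
center_point = deproject(320, 240, 0.5, k)  # pixel at the principal point
offset_point = deproject(620, 240, 0.5, k)  # 300 pixels to the right of it
```

A real pipeline would additionally transform this camera-frame point into the robot base frame using a calibrated extrinsic transform, which is outside the scope of this sketch.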

Paper Structure

This paper contains 11 sections, 3 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 2: Manufacturing Collaborative Assembly Example. (Note: illustration is for conceptual purposes only and does not reflect actual mechanical scale.)
  • Figure 3: Architecture of LLM-enabled human-robot collaborative assembly framework.
  • Figure 4: Object Localization through Algorithm 1
  • Figure 5: Overview of the experimental manufacturing setup: (a) Top-down view of the physical workspace, showing the arrangement of robot and the assembly board. (b) Detailed view of the assembly board components: gears (red arrows), circular pins (green arrows), and rectangular pins (blue arrows).
  • Figure 6: Valid camera height for effective object localization
  • ...and 1 more figure