AssemMate: Graph-Based LLM for Robotic Assembly Assistance

Qi Zheng, Chaoran Zhang, Zijian Liang, EnTe Lin, Shubo Cui, Qinghongbing Xie, Zhaobo Xu, Long Zeng

TL;DR

This work presents AssemMate, which takes as input a graph, a concise and accurate form of knowledge representation. AssemMate outperforms existing methods, achieving 6.4% higher accuracy, 3 times faster inference, and 28 times shorter context length, while demonstrating strong generalization to random graphs.

Abstract

Large Language Model (LLM)-based robotic assembly assistance has attracted significant research attention. It requires injecting domain-specific knowledge to guide the assembly process through natural-language interaction with humans. Despite some progress, existing methods represent this knowledge as natural-language text. Because such text is long and redundant, these methods struggle to meet robots' requirements for real-time, precise reasoning. To bridge this gap, we present AssemMate, which takes as input a graph, a concise and accurate form of knowledge representation. This graph-based LLM enables knowledge graph question answering (KGQA), supporting human-robot interaction and assembly task planning for specific products. Beyond interactive QA, AssemMate also perceives stacked scenes and executes grasps to assist with assembly. Specifically, a self-supervised Graph Convolutional Network (GCN) encodes knowledge graph entities and relations into a latent space and aligns them with the LLM's representation space, enabling the LLM to understand graph information. In addition, a vision-enhanced strategy handles stacked scenes during grasping. In training and evaluation, AssemMate outperforms existing methods, achieving 6.4% higher accuracy, 3 times faster inference, and 28 times shorter context length, while demonstrating strong generalization to random graphs. Robotic grasping experiments in both simulated and real-world settings further demonstrate the approach's superiority. More details can be found on the project page: https://github.com/cristina304/AssemMate.git
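
The abstract describes the graph encoder only at a high level. The sketch below illustrates one plausible reading under explicit assumptions; it is not the paper's architecture or loss. It assumes a standard GCN propagation rule with symmetric normalization over entity nodes, learnable relation vectors scored with a DistMult-style link-prediction objective as the self-supervised signal, and a linear projection that maps entity embeddings into an LLM hidden size so they can be fed to the LLM as "graph tokens" (a common pattern for graph-augmented LLMs, assumed here). The toy graph, `d_llm`, the tail-corruption scheme, and all function names are hypothetical. Only the forward pass and the loss value are shown; training would minimize the loss over `weights`, `rel_emb`, and `w_proj` with an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy assembly knowledge graph as (head, relation, tail) index triples.
entities = ["base_plate", "bracket", "bolt_M4", "motor"]
relations = ["connects_to", "fastened_by"]
triples = [(0, 0, 1), (1, 1, 2), (3, 0, 1)]


def normalized_adjacency(n, edges):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2, ignoring relation types."""
    a = np.eye(n)
    for h, _, t in edges:
        a[h, t] = a[t, h] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


def gcn_encode(a_hat, x, weights):
    """Stacked GCN layers H' = A_hat H W, with ReLU between (not after) layers."""
    h = x
    for i, w in enumerate(weights):
        h = a_hat @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h


def distmult_score(ent, rel, edges):
    """Assumed self-supervised signal: DistMult plausibility of each triple."""
    h, r, t = np.array(edges).T
    return np.sum(ent[h] * rel[r] * ent[t], axis=1)


def margin_loss(pos, neg, margin=1.0):
    """Hinge loss pushing true triples above corrupted ones by `margin`."""
    return float(np.mean(np.maximum(0.0, margin - pos + neg)))


d_graph, d_llm = 16, 64  # d_llm is a placeholder for the LLM's hidden size
x = rng.normal(size=(len(entities), d_graph))  # initial entity features
weights = [rng.normal(scale=0.3, size=(d_graph, d_graph)) for _ in range(2)]
rel_emb = rng.normal(size=(len(relations), d_graph))

a_hat = normalized_adjacency(len(entities), triples)
ent_emb = gcn_encode(a_hat, x, weights)

# Simplification: corrupt each tail deterministically to build negative triples.
negatives = [(h, r, (t + 1) % len(entities)) for h, r, t in triples]
loss = margin_loss(distmult_score(ent_emb, rel_emb, triples),
                   distmult_score(ent_emb, rel_emb, negatives))

# Alignment step (assumed): project entity embeddings into the LLM's
# embedding space so they can sit alongside text-token embeddings.
w_proj = rng.normal(scale=0.1, size=(d_graph, d_llm))
graph_tokens = ent_emb @ w_proj  # shape: (num_entities, d_llm)
```

Passing one projected vector per entity, rather than a sentence per fact, is one way a graph input can keep the context short; the actual context-length savings reported above come from the paper's own design, not from this sketch.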

Paper Structure

This paper contains 15 sections, 11 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Our method, AssemMate, efficiently injects concise graph-structured data into an LLM as external knowledge to enable interactive QA during the assembly process. It further supports grasping in stacked scenes, serving as an intelligent mate for robotic assembly assistance.
  • Figure 2: Framework of AssemMate. A self-supervised GCN encodes knowledge graph entities and relations into a latent space and aligns them with the LLM's representation. Building on KGQA, VEGE leverages MLLMs with vision enhancement to perceive stacked scenes and generate grasping plans, followed by segmentation and grasp pose generation.
  • Figure 3: In real-world experiments, AssemMate dynamically determines the optimal object to grasp based on the current scene; the yellow triangle indicates the object currently under consideration, and the red star marks the target object.
  • Figure 4: Heavily stacked scenarios in the simulation environment; the target object is marked with a red star.
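
Figures 3 and 4 concern choosing which object to grasp next when the target is buried in a stack. AssemMate does this with MLLMs and vision enhancement (VEGE); the sketch below is not that method. It substitutes a classic graph heuristic to make the ordering problem concrete, assuming a perception step has already produced pairwise "A rests on B" relations. It collects every object transitively above the target, then applies a topological sort (Kahn's algorithm) so each object is removed only once nothing remains on top of it. The function name `removal_order` and the example scene are hypothetical.

```python
import heapq
from collections import defaultdict


def removal_order(on_top_of, target):
    """Objects to remove (in order) before grasping `target`, ending with `target`.

    on_top_of: list of (upper, lower) pairs meaning `upper` rests on `lower`.
    Ties are broken alphabetically so the result is deterministic.
    """
    above = defaultdict(set)  # lower -> objects resting directly on it
    below = defaultdict(set)  # upper -> objects it rests directly on
    for upper, lower in on_top_of:
        above[lower].add(upper)
        below[upper].add(lower)

    # Every object transitively above the target blocks the grasp.
    blockers, stack = set(), [target]
    while stack:
        for upper in above[stack.pop()]:
            if upper not in blockers:
                blockers.add(upper)
                stack.append(upper)
    if target in blockers:
        raise ValueError("stacking relations contain a cycle through the target")

    # Kahn's algorithm: an object becomes removable once nothing is on top of it.
    pending = {b: len(above[b]) for b in blockers}
    ready = [b for b, n in pending.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        obj = heapq.heappop(ready)
        order.append(obj)
        for lower in below[obj]:
            if lower in pending:
                pending[lower] -= 1
                if pending[lower] == 0:
                    heapq.heappush(ready, lower)
    if len(order) != len(blockers):
        raise ValueError("stacking relations contain a cycle")
    return order + [target]


scene = [
    ("gear", "bracket"),
    ("bolt", "bracket"),
    ("bracket", "base"),
    ("cover", "gear"),
    ("washer", "tray"),  # unrelated pile, ignored when the target is "base"
]
print(removal_order(scene, "base"))  # ['bolt', 'cover', 'gear', 'bracket', 'base']
```

Unlike this fixed rule, a learned or MLLM-based planner can weigh factors the stacking relations omit, such as partial occlusion or grasp feasibility, which is presumably why the paper re-evaluates the scene dynamically as in Figure 3.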