MICA: Multi-Agent Industrial Coordination Assistant

Di Wen, Kunyu Peng, Junwei Zheng, Yufan Chen, Yitian Shi, Jiale Wei, Ruiping Liu, Kailun Yang, Rainer Stiefelhagen

TL;DR

This work presents MICA (Multi-Agent Industrial Coordination Assistant), a perception-grounded, speech-interactive system that delivers real-time guidance for assembly, troubleshooting, part queries, and maintenance. It also introduces Adaptive Step Fusion (ASF), which dynamically blends expert reasoning with online adaptation from natural speech feedback.
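The summary does not give the ASF equations, so the following is only a minimal sketch of the general idea of "blending expert reasoning with online adaptation from speech feedback", under loudly stated assumptions. It assumes ASF can be approximated as a convex combination of an expert distribution over assembly steps (standing in for expert reasoning, e.g. the state-graph priors mentioned in the Figure 1 caption) and an online distribution built from user confirmations, with a mixing weight `alpha` that drifts toward whichever source proved more accurate. The class `StepFusionSketch`, the parameters `alpha` and `lr`, and the step names are all hypothetical. This is not the paper's formulation.

```python
from dataclasses import dataclass, field


@dataclass
class StepFusionSketch:
    """Hypothetical convex blend of an expert step prior with an online estimate.

    Illustrative only; this is NOT the ASF formulation from the paper.
    """
    steps: list[str]
    alpha: float = 0.8  # hypothetical initial weight on the expert distribution
    lr: float = 0.1     # hypothetical adaptation rate for alpha
    counts: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Add-one pseudo-counts so the online estimate is defined before any feedback.
        for s in self.steps:
            self.counts.setdefault(s, 1.0)

    def online(self) -> dict[str, float]:
        # Online estimate: normalized feedback counts per step.
        total = sum(self.counts.values())
        return {s: self.counts[s] / total for s in self.steps}

    def fuse(self, expert: dict[str, float]) -> dict[str, float]:
        # Convex combination: the result is a distribution whenever both inputs are.
        on = self.online()
        return {s: self.alpha * expert.get(s, 0.0) + (1.0 - self.alpha) * on[s]
                for s in self.steps}

    def feedback(self, expert: dict[str, float], confirmed: str) -> None:
        # Assumes speech feedback has already been parsed into a confirmed step name.
        on = self.online()
        # Move alpha toward the source that gave the confirmed step more probability.
        delta = expert.get(confirmed, 0.0) - on[confirmed]
        self.alpha = min(1.0, max(0.0, self.alpha + self.lr * delta))
        self.counts[confirmed] += 1.0


# Usage with made-up step names and expert probabilities.
asf = StepFusionSketch(steps=["insert_pin", "tighten_bolt", "attach_cover"])
expert = {"insert_pin": 0.6, "tighten_bolt": 0.3, "attach_cover": 0.1}
fused = asf.fuse(expert)
best = max(fused, key=fused.get)  # "insert_pin" under these numbers
asf.feedback(expert, confirmed="tighten_bolt")
```

A convex blend is used here because it keeps the fused scores a valid probability distribution without renormalization. Clamping `alpha` to [0, 1] prevents a run of feedback from pushing the weight outside that range.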

Abstract

Industrial workflows demand adaptive and trustworthy assistance that can operate under limited compute, limited connectivity, and strict privacy constraints. In this work, we present MICA (Multi-Agent Industrial Coordination Assistant), a perception-grounded and speech-interactive system that delivers real-time guidance for assembly, troubleshooting, part queries, and maintenance. MICA coordinates five role-specialized language agents, audited by a safety checker, to ensure accurate and compliant support. To achieve robust step understanding, we introduce Adaptive Step Fusion (ASF), which dynamically blends expert reasoning with online adaptation from natural speech feedback. Furthermore, we establish a new multi-agent coordination benchmark across representative task categories and propose evaluation metrics tailored to industrial assistance, enabling systematic comparison of different coordination topologies. Our experiments demonstrate that MICA consistently improves task success, reliability, and responsiveness over baseline coordination topologies, while remaining deployable on practical offline hardware. Together, these contributions highlight MICA as a step toward deployable, privacy-preserving multi-agent assistants for dynamic factory environments. The source code will be made publicly available at https://github.com/Kratos-Wen/MICA.
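The abstract says MICA coordinates five role-specialized agents whose output is audited by a safety checker, but this summary does not name the roles or describe the audit rules. The sketch below illustrates only the general route-then-audit pattern. It assumes one stub agent per task category named in the abstract; the fifth role is not listed here and is omitted. The keyword router, the `BLOCKED` phrase list, and every agent reply are hypothetical placeholders. A real deployment would call on-device language models rather than return fixed strings.

```python
from typing import Callable

# Hypothetical stub agents, one per task category named in the abstract.
Agent = Callable[[str], str]


def assembly_agent(query: str) -> str:
    return "Align the bracket with the two guide pins, then insert both screws."


def troubleshooting_agent(query: str) -> str:
    # Deliberately unsafe draft so the audit below has something to catch.
    return "Try to bypass the guard and run the motor to test it."


def part_query_agent(query: str) -> str:
    return "The M4 screws are in bin B2."


def maintenance_agent(query: str) -> str:
    return "Lubricate the linear rail every 500 operating hours."


AGENTS: dict[str, Agent] = {
    "assembly": assembly_agent,
    "troubleshooting": troubleshooting_agent,
    "part_query": part_query_agent,
    "maintenance": maintenance_agent,
}

# Hypothetical keyword router; checked in insertion order, first match wins.
KEYWORDS: dict[str, tuple[str, ...]] = {
    "troubleshooting": ("error", "fault", "not working", "stuck"),
    "part_query": ("where", "which part", "screw"),
    "maintenance": ("lubricate", "service", "maintenance"),
}

# Hypothetical phrases the rule-based audit refuses to pass through.
BLOCKED = ("bypass the guard", "disable the interlock", "remove the e-stop")


def route(query: str) -> str:
    q = query.lower()
    for role, words in KEYWORDS.items():
        if any(w in q for w in words):
            return role
    return "assembly"  # fallback role (assumption)


def safety_audit(draft: str) -> tuple[bool, str]:
    lowered = draft.lower()
    for phrase in BLOCKED:
        if phrase in lowered:
            return False, f"blocked: contains '{phrase}'"
    return True, "ok"


def answer(query: str) -> str:
    # Route to a role agent, then audit its draft before it reaches the worker.
    draft = AGENTS[route(query)](query)
    ok, _reason = safety_audit(draft)
    if not ok:
        return "I can't recommend that. Please contact a qualified technician."
    return draft
```

Placing the audit after every agent, rather than inside each one, means a single checker gates all outgoing guidance. This matches the abstract's description of agents "audited by a safety checker", although the actual checker is presumably far richer than a phrase list.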

Paper Structure

This paper contains 16 sections, 11 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Overview of the proposed MICA system. Egocentric vision and speech queries are processed into structured object contexts via YOLO-based detection and depth estimation. These contexts, together with state-graph priors and knowledge base information, support Adaptive Step Fusion (ASF) for robust step recognition. The MICA-core then integrates perception and reasoning to deliver safety-audited, speech-based guidance in real time.
  • Figure 2: An overview of the multi-agent LLM baseline architectures for comparison: (1) SharedMemory: decentralized peer-to-peer with a shared memory and evaluator; (2) CentralizedBroadcast: hub-and-spoke publish–subscribe with an aggregator; (3) HierarchicalPipeline: fixed sequential relay across specialists; (4) DebateVoting: peer debate followed by consensus voting.
  • Figure 3: Qualitative comparison of four representative multi-agent topologies (SharedMemory, CentralizedBroadcast, HierarchicalPipeline, DebateVoting) against MICA on three representative queries.
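The four baseline topologies in Figure 2 differ mainly in who can see which messages. The following sketch shows only that message-flow difference, using toy stub agents in place of LLMs. Several details are assumptions not stated in the caption: peers in SharedMemory take turns in a fixed order, DebateVoting runs one debate round and then a plurality vote, and the `evaluator` and `aggregator` are caller-supplied placeholders. All function names are hypothetical. MICA's own coordination scheme is not described in this summary and is not modeled.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical agent signature: (query, messages visible to this agent) -> reply.
Agent = Callable[[str, list[str]], str]


def shared_memory(query: str, agents: Sequence[Agent],
                  evaluator: Callable[[list[str]], str]) -> str:
    # Peers take turns reading and appending to one shared store; an evaluator picks the output.
    memory: list[str] = []
    for agent in agents:
        memory.append(agent(query, list(memory)))
    return evaluator(memory)


def centralized_broadcast(query: str, agents: Sequence[Agent],
                          aggregator: Callable[[list[str]], str]) -> str:
    # The hub publishes the query; each spoke replies without seeing the others.
    replies = [agent(query, []) for agent in agents]
    return aggregator(replies)


def hierarchical_pipeline(query: str, agents: Sequence[Agent]) -> str:
    # Fixed sequential relay: each specialist sees only its predecessor's output.
    context: list[str] = []
    for agent in agents:
        context = [agent(query, context)]
    return context[0]


def debate_voting(query: str, agents: Sequence[Agent], rounds: int = 1) -> str:
    # Independent proposals, then debate rounds in which every peer sees all proposals.
    proposals = [agent(query, []) for agent in agents]
    for _ in range(rounds):
        proposals = [agent(query, proposals) for agent in agents]
    # Consensus by plurality vote; Counter orders ties by first occurrence.
    return Counter(proposals).most_common(1)[0][0]


# Toy agents for illustration only.
def follower(default: str) -> Agent:
    # Gives its own answer when it sees nothing, otherwise follows the plurality it sees.
    def agent(query: str, seen: list[str]) -> str:
        return Counter(seen).most_common(1)[0][0] if seen else default
    return agent


def relay(query: str, seen: list[str]) -> str:
    # Appends a marker to whatever it received, so the relay path is visible.
    return (seen[0] if seen else query) + "+step"


peers = [follower("A"), follower("B"), follower("A")]
print(shared_memory("q", peers, evaluator=" | ".join))           # A | A | A
print(centralized_broadcast("q", peers, aggregator=" | ".join))  # A | B | A
print(hierarchical_pipeline("q", [relay, relay, relay]))         # q+step+step+step
print(debate_voting("q", peers))                                 # A
```

With the same three toy peers, the second peer answers "A" under SharedMemory because it reads the first peer's entry, but answers "B" under CentralizedBroadcast because spokes reply independently. This visibility difference is what distinguishes the topologies, independent of the underlying models.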