LLMind 2.0: Distributed IoT Automation with Natural Language M2M Communication and Lightweight LLM Agents
Yuyang Du, Qun Yang, Liujianfu Wang, Jingqi Lin, Hongwei Cui, Soung Chang Liew
TL;DR
The paper addresses IoT automation scalability in heterogeneous deployments by moving executable code generation from a centralized LLM coordinator to on-device, lightweight LLM-powered agents that communicate with humans and devices via natural language. It proposes LLMind 2.0, a distributed framework that uses a natural-language M2M interface, a RAG-based API mapping module, and a fault-tolerant coordination protocol to enable parallel, reliable task execution across devices. Key contributions include a three-step on-device code generation pipeline, a JSON-based device API description standard, and empirical validation in warehouse robotics and Wi-Fi networking contexts, demonstrating improvements in scalability, latency, and task success rates, along with profiling of error sources. The work highlights privacy benefits from localized code generation and provides open-source resources and datasets to foster further research on language-driven, distributed IoT automation. The results indicate that natural language can effectively mediate both human-to-machine and machine-to-machine interactions in large-scale IoT ecosystems, enabling better adaptability and collaboration among diverse devices.
Abstract
Recent advances in large language models (LLMs) have generated great interest in their applications for IoT automation and device management. However, centralized approaches struggle to scale across heterogeneous, large-scale systems. We present LLMind 2.0, a distributed framework that embeds lightweight LLM-empowered device agents and adopts natural language for machine-to-machine (M2M) communication. In LLMind 2.0, a central coordinator translates human instructions into natural-language subtask descriptions, which instruct distributed device agents to generate device-specific code locally based on their proprietary APIs. Using natural language as a unified medium overcomes device heterogeneity and enables seamless device collaboration. LLMind 2.0 integrates: 1) a timeout-based deadlock avoidance protocol that coordinates distributed subtask executions, 2) a retrieval-augmented generation (RAG) mechanism for precise subtask-to-API mapping, and 3) fine-tuned lightweight LLMs for reliable, device-specific code generation. Experiments in multi-robot warehouse operations and Wi-Fi network deployments show that LLMind 2.0 improves scalability, reliability, and responsiveness compared to centralized baselines.
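
To make the idea of timeout-based deadlock avoidance concrete, the following is a minimal sketch and not the protocol used in LLMind 2.0, whose details are not given in the abstract. It assumes each shared device or resource is guarded by a lock, and that a subtask that cannot obtain every resource it needs within a time budget releases what it holds and reports an abort so a coordinator could retry or reschedule it. The names `acquire_all`, `run_subtask`, and the resource labels are hypothetical.

```python
import threading


def acquire_all(locks, timeout_s):
    """Try to take every lock in order.

    If any single acquisition times out, release the locks already held
    and return False, so no subtask waits forever while holding resources.
    """
    held = []
    for lock in locks:
        if lock.acquire(timeout=timeout_s):
            held.append(lock)
        else:
            # Back off: give up everything taken so far.
            for taken in reversed(held):
                taken.release()
            return False
    return True


def run_subtask(locks, action, timeout_s=0.05):
    """Run `action` only if all required resources were obtained in time."""
    if not acquire_all(locks, timeout_s):
        return "aborted"  # a coordinator could retry or reschedule here
    try:
        action()
        return "done"
    finally:
        for lock in reversed(locks):
            lock.release()


# Hypothetical resources: a shared aisle and a charging dock.
aisle = threading.Lock()
dock = threading.Lock()
log = []

dock.acquire()  # simulate another agent currently holding the dock
first = run_subtask([aisle, dock], lambda: log.append("moved pallet"))
dock.release()  # the other agent finishes
second = run_subtask([aisle, dock], lambda: log.append("moved pallet"))
```

In this sketch the first attempt gives up after the timeout and leaves the aisle free for other agents, while the second attempt succeeds once the dock is available. The design choice illustrated is that waiting is always bounded, so a cycle of agents each holding one resource and waiting on another cannot persist indefinitely.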
