DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models
Yongdong Wang, Runze Xiao, Jun Younes Louhi Kasahara, Ryosuke Yajima, Keiji Nagatani, Atsushi Yamashita, Hajime Asama
TL;DR
The paper addresses multi-robot coordination under natural language commands by introducing DART-LLM, a dependency-aware framework that uses Directed Acyclic Graphs (DAGs) to model task relationships. It couples a QA LLM for instruction decomposition with a Breakdown Function, an Actuation module, and a VLM-based perception system. Task relationships are formalized as a graph $G=(\mathbb{T},\mathbb{D})$ over the subtask set $\mathbb{T}$, with dependency edges $\mathbb{D}\subseteq \mathbb{T}\times\mathbb{T}$ that enforce the correct execution order. Key contributions include DAG-based dependency-aware task decomposition, an end-to-end real-time execution pipeline, and a 102-task construction dataset spanning three complexity levels. Ablation studies further show that explicit DAG modeling improves performance for smaller models. Real-world deployment demonstrates practical applicability and highlights the trade-off between model size and responsiveness, suggesting strong deployability on resource-constrained platforms.
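Under the DAG formalization $G=(\mathbb{T},\mathbb{D})$, a subtask may start only after all of its predecessors have finished, while subtasks with no path between them can run concurrently on different robots. The sketch below groups subtasks into such concurrent "levels" using Kahn's topological sort. It assumes each edge $(u, v) \in \mathbb{D}$ means "u must finish before v". The function name `execution_levels` and the construction task names are hypothetical, and this is an illustration of the idea, not the authors' implementation.

```python
from collections import defaultdict


def execution_levels(tasks, deps):
    """Group tasks into levels; every task's prerequisites lie in earlier levels.

    tasks: iterable of task ids (the node set T).
    deps: iterable of (u, v) pairs (the edge set D): u must finish before v.
    Raises ValueError if an edge names an unknown task or the graph has a cycle.
    """
    tasks = list(dict.fromkeys(tasks))  # drop duplicate ids, keep order
    indegree = {t: 0 for t in tasks}
    successors = defaultdict(list)
    for u, v in deps:
        if u not in indegree or v not in indegree:
            raise ValueError(f"unknown task in dependency ({u}, {v})")
        successors[u].append(v)
        indegree[v] += 1

    levels = []
    # Tasks with no unfinished prerequisites can start immediately.
    ready = [t for t in tasks if indegree[t] == 0]
    while ready:
        levels.append(ready)
        next_ready = []
        for u in ready:
            # Finishing u releases one prerequisite of each successor.
            for v in successors[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    next_ready.append(v)
        ready = next_ready

    # Tasks never released are stuck on a cycle, so G is not a DAG.
    if sum(len(level) for level in levels) != len(tasks):
        raise ValueError("dependencies contain a cycle; not a DAG")
    return levels


# Hypothetical construction scenario.
subtasks = ["excavate", "clear_path", "load_truck", "transport"]
dependencies = [
    ("excavate", "load_truck"),
    ("clear_path", "transport"),
    ("load_truck", "transport"),
]
print(execution_levels(subtasks, dependencies))
# [['excavate', 'clear_path'], ['load_truck'], ['transport']]
```

Here `excavate` and `clear_path` share the first level because neither depends on the other, so two robots could carry them out in parallel. A cyclic dependency set is rejected rather than silently producing a partial schedule.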
Abstract
Large Language Models (LLMs) have demonstrated promising reasoning capabilities in robotics; however, their application in multi-robot systems remains limited, particularly in handling task dependencies. This paper introduces DART-LLM, a novel framework that employs Directed Acyclic Graphs (DAGs) to model task dependencies, enabling the decomposition of natural language instructions into well-coordinated subtasks for multi-robot execution. DART-LLM comprises four key components: a Question-Answering (QA) LLM module for dependency-aware task decomposition, a Breakdown Function module for robot assignment, an Actuation module for execution, and a Vision-Language Model (VLM)-based object detector for environmental perception; together, these components enable end-to-end task execution. Experimental results across three task complexity levels demonstrate that DART-LLM achieves state-of-the-art performance, significantly outperforming the baseline across all evaluation metrics. Among the tested models, DeepSeek-r1-671B achieves the highest success rate, whereas Llama-3.1-8B exhibits superior response time reliability. Ablation studies further confirm that explicit dependency modeling notably enhances the performance of smaller models, facilitating efficient deployment on resource-constrained platforms. Please refer to the project website https://wyd0817.github.io/project-dart-llm/ for videos and code.
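To make the hand-off between decomposition and robot assignment concrete, the sketch below imitates the role of the Breakdown Function. It parses a structured decomposition, checks that every dependency refers to a known subtask, and binds each subtask to a concrete robot of the required type. The JSON fields (`subtasks`, `id`, `robot_type`, `depends_on`), the fleet layout, the round-robin assignment policy, and the name `break_down` are all hypothetical. DART-LLM's actual output format and assignment logic may differ.

```python
import json
from itertools import cycle


def break_down(llm_json, fleet):
    """Assign each subtask to a robot and validate its dependencies.

    llm_json: JSON text in a hypothetical {"subtasks": [...]} format.
    fleet: mapping from robot type to a list of robot ids.
    Returns a list of (subtask_id, robot_id, prerequisites) tuples.
    """
    subtasks = json.loads(llm_json)["subtasks"]
    known_ids = {s["id"] for s in subtasks}
    # One round-robin iterator per robot type that has at least one robot.
    pools = {rtype: cycle(ids) for rtype, ids in fleet.items() if ids}
    plan = []
    for s in subtasks:
        missing = [d for d in s["depends_on"] if d not in known_ids]
        if missing:
            raise ValueError(f"{s['id']} depends on unknown subtasks {missing}")
        if s["robot_type"] not in pools:
            raise ValueError(f"no available robot of type {s['robot_type']!r}")
        robot = next(pools[s["robot_type"]])
        plan.append((s["id"], robot, list(s["depends_on"])))
    return plan


# Hypothetical decomposition, standing in for the QA LLM's text output.
decomposition = json.dumps({
    "subtasks": [
        {"id": "dig_area_a", "robot_type": "excavator", "depends_on": []},
        {"id": "dig_area_b", "robot_type": "excavator", "depends_on": []},
        {"id": "haul_a", "robot_type": "dump_truck", "depends_on": ["dig_area_a"]},
        {"id": "haul_b", "robot_type": "dump_truck", "depends_on": ["dig_area_b"]},
    ]
})
fleet = {"excavator": ["exc_1"], "dump_truck": ["truck_1", "truck_2"]}

plan = break_down(decomposition, fleet)
for subtask, robot, prereqs in plan:
    print(f"{subtask} -> {robot} after {prereqs}")
# dig_area_a -> exc_1 after []
# dig_area_b -> exc_1 after []
# haul_a -> truck_1 after ['dig_area_a']
# haul_b -> truck_2 after ['dig_area_b']
```

Validating dependency references before actuation means a malformed LLM response fails early, before any robot moves. Round-robin is only a placeholder policy; a real system would likely consider robot location and current workload.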
