
DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models

Yongdong Wang, Runze Xiao, Jun Younes Louhi Kasahara, Ryosuke Yajima, Keiji Nagatani, Atsushi Yamashita, Hajime Asama

TL;DR

The paper addresses multi-robot coordination under natural language commands by introducing DART-LLM, a dependency-aware framework that uses DAGs to model task relationships. It couples a QA LLM for instruction decomposition with a Breakdown Function, an Actuation module, and a VLM-based perception system. Task relationships are formalized as a graph $G=(\mathbb{T},\mathbb{D})$ over the subtask set $\mathbb{T}$, with dependency edges $\mathbb{D}\subseteq \mathbb{T}\times\mathbb{T}$, which fixes a correct execution order. Key contributions include DAG-based dependency-aware task decomposition, an end-to-end real-time execution pipeline, a 102-task construction dataset spanning three complexity levels, and ablation studies showing that DAGs improve the performance of smaller models. Real-world deployment demonstrates practical applicability and highlights the trade-off between model size and responsiveness, suggesting that the approach is deployable on resource-constrained platforms.
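The formalization $G=(\mathbb{T},\mathbb{D})$ can be made concrete with a short validity check. The sketch below is illustrative only and is not the authors' code. It assumes each subtask is identified by a string and that the decomposition supplies, for each subtask, a list of the subtasks it depends on; the function name `build_dag` and the edge convention (prerequisite → dependent) are hypothetical choices. It builds $\mathbb{D}$, checks that every edge endpoint lies in $\mathbb{T}$, and rejects cyclic plans with a three-color depth-first search.

```python
from typing import Dict, List, Set, Tuple


def build_dag(deps: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Build the edge set D from per-task dependency lists (hypothetical helper).

    An edge (a, b) means task a must finish before task b starts.
    Raises ValueError if a dependency names a task outside T
    (violating D ⊆ T × T) or if the graph contains a cycle.
    """
    tasks = set(deps)  # the task set T
    edges: Set[Tuple[str, str]] = set()
    for task, prereqs in deps.items():
        for p in prereqs:
            if p not in tasks:
                raise ValueError(f"{task} depends on unknown task {p}")
            edges.add((p, task))

    # Successor lists used by the depth-first search.
    succ: Dict[str, List[str]] = {t: [] for t in tasks}
    for a, b in edges:
        succ[a].append(b)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in tasks}

    def visit(u: str) -> None:
        color[u] = GRAY  # u is on the current DFS path
        for v in succ[u]:
            if color[v] == GRAY:  # back edge, so the graph is not acyclic
                raise ValueError(f"cycle detected through {u} -> {v}")
            if color[v] == WHITE:
                visit(v)
        color[u] = BLACK  # u and all its descendants are explored

    for t in sorted(tasks):
        if color[t] == WHITE:
            visit(t)
    return edges


if __name__ == "__main__":
    # Illustrative plan: T2 and T3 need T1; T4 needs T2 and T3.
    plan = {"T1": [], "T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"]}
    print(sorted(build_dag(plan)))
    # [('T1', 'T2'), ('T1', 'T3'), ('T2', 'T4'), ('T3', 'T4')]
```

Rejecting a plan at this stage is cheaper than discovering a deadlock or an unknown subtask at execution time.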

Abstract

Large Language Models (LLMs) have demonstrated promising reasoning capabilities in robotics; however, their application in multi-robot systems remains limited, particularly in handling task dependencies. This paper introduces DART-LLM, a novel framework that employs Directed Acyclic Graphs (DAGs) to model task dependencies, enabling the decomposition of natural language instructions into well-coordinated subtasks for multi-robot execution. DART-LLM comprises four key components: a Question-Answering (QA) LLM module for dependency-aware task decomposition, a Breakdown Function module for robot assignment, an Actuation module for execution, and a Vision-Language Model (VLM)-based object detector for environmental perception, achieving end-to-end task execution. Experimental results across three task complexity levels demonstrate that DART-LLM achieves state-of-the-art performance, significantly outperforming the baseline across all evaluation metrics. Among the tested models, DeepSeek-r1-671B achieves the highest success rate, whereas Llama-3.1-8B exhibits superior response time reliability. Ablation studies further confirm that explicit dependency modeling notably enhances the performance of smaller models, facilitating efficient deployment on resource-constrained platforms. Please refer to the project website https://wyd0817.github.io/project-dart-llm/ for videos and code.
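To illustrate what dependency-aware execution means at run time, here is a minimal asyncio sketch. It is not DART-LLM's Actuation module, which the paper builds on the ROS Navigation stack. It assumes each subtask record carries a hypothetical `robots` list (standing in for the Breakdown Function's robot assignment) and a `deps` list, and it replaces real robot actions with `asyncio.sleep`. The subtask names loosely follow the excavator and dump-truck scenario but are invented for illustration. Each subtask waits on an `asyncio.Event` per dependency, so independent subtasks run concurrently.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Subtask:
    name: str
    robots: List[str]                              # hypothetical robot assignment
    deps: List[str] = field(default_factory=list)  # names of prerequisite subtasks
    duration: float = 0.01                         # stand-in for real actuation time


async def run_plan(subtasks: List[Subtask]) -> List[str]:
    """Start each subtask once all its dependencies have finished.

    Returns the subtask names in completion order.
    """
    done: Dict[str, asyncio.Event] = {s.name: asyncio.Event() for s in subtasks}
    finished: List[str] = []

    async def worker(s: Subtask) -> None:
        # Block until every prerequisite has signalled completion.
        for d in s.deps:
            await done[d].wait()
        print(f"start {s.name} on {', '.join(s.robots)}")
        await asyncio.sleep(s.duration)  # placeholder for robot motion
        finished.append(s.name)
        done[s.name].set()  # release any subtasks waiting on this one

    await asyncio.gather(*(worker(s) for s in subtasks))
    return finished


if __name__ == "__main__":
    plan = [
        Subtask("dig_soil", ["excavator"]),
        Subtask("move_truck", ["dump_truck"]),
        Subtask("load_soil", ["excavator", "dump_truck"], ["dig_soil", "move_truck"]),
        Subtask("unload_soil", ["dump_truck"], ["load_soil"]),
    ]
    print(asyncio.run(run_plan(plan)))
```

Because every subtask simply waits on its prerequisites, a plan containing a cycle would never finish, so the dependency graph should be confirmed acyclic before dispatch.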

Paper Structure

This paper contains 21 sections, 7 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: An overview of the DART-LLM system architecture. The system is divided into three main modules: the Sensor Module, the Intelligent Command Interface Module, and the Actuation Module. The Sensor Module captures and processes data using a Vision-Language Model (VLM)-based object detector, updating the object map with detected items. The Intelligent Command Interface Module interprets user instructions via a Question-Answering Large Language Model (QA LLM), decomposing tasks into subtasks with dependencies through the Breakdown Function Parser and Handler. This allows the establishment of complex task dependencies and coordination between multiple robots. Finally, the Actuation Module executes real-time operations using the ROS Navigation stack, guiding each robot according to the parsed and dependency-aware instructions.
  • Figure 2: Dependency-aware task decomposition in DART-LLM. Left: QA LLM decomposes high-level instruction into subtasks with explicit dependency lists ($d_k$) and assigned workers. Right: Construction of the Directed Acyclic Graph (DAG) based on dependency relationships, establishing the execution order where $T^1_{t_1}$ is executed first, followed by parallel execution of $T^2_{t_2}$ and $T^3_{t_2}$, and finally $T^4_{t_3}$ after its dependencies are satisfied.
  • Figure 3: Success Rate (SR) comparison between "With Dependencies" (using DAG) and "Without Dependencies" (not using DAG) for L3 complexity tasks across different models.
  • Figure 4: DART-LLM (Llama3.1) in the L2-T1-001 Task Using Real Robots: The sequence begins at 0 [s] with the initial positioning of the Dump Truck at the starting location. At 8 [s], the Excavator prepares to dig soil. By 24 [s], the Excavator transfers the excavated soil into the Dump Truck, completing the loading operation. At 48 [s], the loaded Dump Truck navigates towards the designated dumping area. At 56 [s], the Dump Truck unloads the soil at the puddle location. Finally, at 64 [s], the Dump Truck completes the task by retracting its vessel.
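The execution order described for Figure 2 ($T^1$ first, then $T^2$ and $T^3$ in parallel, then $T^4$) corresponds to grouping the DAG into topological levels. The sketch below computes such levels with Kahn's algorithm, processing one frontier at a time. It is an illustration, not a claim about how DART-LLM schedules tasks: the function name `execution_stages` is hypothetical, and the example assumes $T^4$ depends on both $T^2$ and $T^3$, which the caption does not state explicitly.

```python
from typing import Dict, List


def execution_stages(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group tasks into stages; tasks within one stage can run in parallel.

    deps maps each task to the tasks it depends on (every dependency must
    itself be a key). A task enters a stage once all of its dependencies
    appear in earlier stages. Raises ValueError if the graph has a cycle.
    """
    indegree = {t: len(set(ds)) for t, ds in deps.items()}
    dependents: Dict[str, List[str]] = {t: [] for t in deps}
    for t, ds in deps.items():
        for d in set(ds):
            dependents[d].append(t)

    stages: List[List[str]] = []
    frontier = sorted(t for t, n in indegree.items() if n == 0)
    scheduled = 0
    while frontier:
        stages.append(frontier)
        scheduled += len(frontier)
        nxt: List[str] = []
        for t in frontier:
            for u in dependents[t]:
                indegree[u] -= 1
                if indegree[u] == 0:  # all prerequisites of u are now scheduled
                    nxt.append(u)
        frontier = sorted(nxt)

    if scheduled != len(deps):  # some tasks never reached in-degree zero
        raise ValueError("dependency graph contains a cycle")
    return stages


if __name__ == "__main__":
    # Assumed dependency lists mirroring Figure 2.
    fig2 = {"T1": [], "T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"]}
    print(execution_stages(fig2))  # [['T1'], ['T2', 'T3'], ['T4']]
```

Each returned stage lines up with one time step in the figure ($t_1$, $t_2$, $t_3$); the cycle check doubles as a confirmation that the graph is a DAG.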