An LLM-based Framework for Human-Swarm Teaming Cognition in Disaster Search and Rescue

Kailun Ji, Xiaoyu Hu, Xinyu Zhang, Jun Chen

TL;DR

The paper tackles the intention-to-action gap in disaster SAR by introducing the LLM-CRF, a cognitive reasoning framework that uses multi-modal operator input and an LLM-based engine to ground intent, decompose tasks, and plan UAV swarm actions. It couples Intent Grounding, In-Context Learning–driven swarm task planning, and Closed-Loop Verification to produce auditable, executable plans while keeping a human-in-the-loop for safety. In simulations, LLM-CRF outperforms manual and baseline LLM approaches in mission success, search coverage, and survivor detection, while significantly reducing operator workload. The work demonstrates a viable pathway toward intuitive, safe, and scalable human-swarm collaboration in high-stakes SAR scenarios and outlines directions to address sensor-noise and real-world deployment challenges.

Abstract

Large-scale disaster Search And Rescue (SAR) operations are persistently challenged by complex terrain and disrupted communications. Unmanned Aerial Vehicle (UAV) swarms offer a promising solution for tasks like wide-area search and supply delivery, yet coordinating them effectively places a significant cognitive burden on human operators. The core human-machine collaboration bottleneck lies in the "intention-to-action gap": the error-prone process of translating a high-level rescue objective into low-level swarm commands under intense time pressure. To bridge this gap, this study proposes LLM-CRF, a system that leverages Large Language Models (LLMs) to model and augment human-swarm teaming cognition. The framework first captures the operator's intention through natural, multi-modal interaction, using voice or graphical annotations. It then employs the LLM as a cognitive engine to perform intention comprehension, hierarchical task decomposition, and mission planning for the UAV swarm. This closed-loop framework enables the swarm to act as a proactive partner that provides real-time feedback while reducing the need for manual monitoring and control, which considerably improves the efficacy of the SAR task. We evaluate the proposed framework in a simulated SAR scenario. Experimental results demonstrate that, compared to traditional command-based interfaces, the proposed LLM-driven approach reduced task completion time by approximately $64.2\%$ and improved task success rate by $7\%$. It also considerably reduced subjective cognitive workload, with NASA-TLX scores dropping by $42.9\%$. This work establishes the potential of LLMs to create more intuitive and effective human-swarm collaboration in high-stakes scenarios.

Paper Structure

This paper contains 11 sections, 2 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: The UAV Swarm Disaster SAR Workflow. The traditional approach (above) creates a significant "intention-to-action gap", imposing a heavy cognitive workload on human operators. Our proposed framework (below) bridges this gap by leveraging an LLM-based core to intelligently decompose high-level multi-modal intention into an executable swarm plan.
  • Figure 2: The proposed LLM-based Cognitive Reasoning Framework (LLM-CRF). The system translates raw multi-modal inputs into executable actions through a three-stage process, including intent grounding, swarm task planning, and feedback and execution.
  • Figure 3: The proposed LLM-CRF interface demonstration. Left: Real-time UAV swarm site scenario (video stream and thermal feedback); Middle: LLM-CRF dialogue with generated plans; Right: Dashboard for UAV and task parameters.
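
The three-stage flow named in Figure 2 (intent grounding, swarm task planning, feedback and execution) can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`Intent`, `Task`, `ground_intent`, `plan_tasks`, `verify_plan`, `run_pipeline`) is hypothetical, the LLM is replaced by simple rule-based stand-ins, and the strip-based region split is an assumed planning strategy chosen only to keep the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Region = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

@dataclass
class Intent:
    goal: str       # e.g. "search" or "deliver"
    region: Region  # area taken from the operator's map annotation

@dataclass
class Task:
    uav_id: int
    action: str
    region: Region

def ground_intent(utterance: str, annotation: Region) -> Intent:
    """Stage 1 stand-in: keyword matching replaces LLM intent grounding."""
    goal = "deliver" if "deliver" in utterance.lower() else "search"
    return Intent(goal=goal, region=annotation)

def plan_tasks(intent: Intent, n_uavs: int) -> List[Task]:
    """Stage 2 stand-in: split the region into equal vertical strips, one per UAV."""
    x0, y0, x1, y1 = intent.region
    width = (x1 - x0) / n_uavs
    return [
        Task(i, intent.goal, (x0 + i * width, y0, x0 + (i + 1) * width, y1))
        for i in range(n_uavs)
    ]

def verify_plan(tasks: List[Task], intent: Intent) -> bool:
    """Stage 3 stand-in: check that strip widths sum to the region width
    and that every task carries the grounded goal."""
    covered = sum(t.region[2] - t.region[0] for t in tasks)
    full_width = intent.region[2] - intent.region[0]
    return abs(covered - full_width) < 1e-6 and all(
        t.action == intent.goal for t in tasks
    )

def run_pipeline(
    utterance: str,
    annotation: Region,
    n_uavs: int,
    operator_approves: Callable[[List[Task]], bool] = lambda tasks: True,
) -> List[Task]:
    """Run the three stages; return an empty plan if verification fails
    or the human operator rejects the plan."""
    intent = ground_intent(utterance, annotation)
    tasks = plan_tasks(intent, n_uavs)
    if not verify_plan(tasks, intent):
        return []
    if not operator_approves(tasks):  # human-in-the-loop gate
        return []
    return tasks

plan = run_pipeline("Search the collapsed area", (0.0, 0.0, 90.0, 60.0), 3)
for task in plan:
    print(task)
```

The `operator_approves` callback reflects the human-in-the-loop safeguard described in the TL;DR: a plan is dispatched only after it passes the automated check and the operator confirms it.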