Selective Exploration and Information Gathering in Search and Rescue Using Hierarchical Learning Guided by Natural Language Input

Dimitrios Panagopoulos; Adolfo Perrusquia; Weisi Guo

TL;DR

The paper addresses the need for rapid, context-aware decision making in search-and-rescue (SAR) operations by combining large language models (LLMs) with hierarchical reinforcement learning (HRL). It introduces a conceptual architecture with four components (a Context Extractor, an Information Space, a Strategic Decision Engine, and an Attention Space) that converts verbal human input into actionable multi-level policies, formalized within an extended Markov decision process (MDP) framework. Using Retrieval-Augmented Generation (RAG) to inject domain knowledge and evaluating in a simulated 2D SAR environment, the study shows that domain-informed LLMs and attention-guided HRL can improve learning efficiency, safety, and information gathering under sparse rewards. The findings suggest that human-in-the-loop, language-enabled planning can substantially enhance autonomous SAR performance, with practical benefits for real-world disaster response, while identifying scalability and robustness in continuous domains as areas for further work.
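RAG, as mentioned above, works by retrieving relevant domain knowledge and placing it in the LLM prompt alongside the human report. The sketch below illustrates only that retrieve-then-prompt step. It uses bag-of-words cosine similarity in place of the embedding model a real RAG pipeline would typically use, and the knowledge-base sentences, prompt wording, and function names (`retrieve`, `build_prompt`) are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter
from typing import List

# Hypothetical domain-knowledge snippets; the paper's actual knowledge source is not shown here.
KNOWLEDGE_BASE: List[str] = [
    "Victims trapped under debris should be prioritised over hazard mapping.",
    "Fire hazards spread quickly; keep a safe distance from burning areas.",
    "Survivor reports of noises near collapsed walls often indicate trapped victims.",
]


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts with surrounding punctuation stripped."""
    return Counter(w.strip(".,;:!?").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> List[str]:
    """Return the k knowledge-base snippets most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(q, bag_of_words(doc)), reverse=True)
    return ranked[:k]


def build_prompt(verbal_input: str, k: int = 2) -> str:
    """Prepend retrieved domain knowledge to the human report before it reaches the LLM."""
    knowledge = "\n".join(f"- {doc}" for doc in retrieve(verbal_input, k))
    return (
        f"Domain knowledge:\n{knowledge}\n\n"
        f"Human report:\n{verbal_input}\n\n"
        "Extract the locations and priorities relevant to the search."
    )


print(build_prompt("I heard noises near the collapsed wall in the east building."))
```

For this report the snippet about noises near collapsed walls ranks first, since it shares the most words with the query. Swapping `bag_of_words`/`cosine` for a learned embedding and vector index would leave the overall structure unchanged.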

Abstract

In recent years, robots and autonomous systems have become increasingly integral to our daily lives, offering solutions to complex problems across various domains. Their application in search and rescue (SAR) operations, however, presents unique challenges. Comprehensively exploring a disaster-stricken area is often infeasible because of the vastness of the terrain, the transformed environment, and the time constraints involved. Traditional robotic systems typically follow predefined search patterns and cannot incorporate and exploit ground truths provided by human stakeholders, which can be key to speeding up the learning process and enhancing triage. To address this gap, we introduce a system that integrates social interaction via large language models (LLMs) with a hierarchical reinforcement learning (HRL) framework. The proposed system translates verbal inputs from human stakeholders into actionable RL insights and adjusts its search strategy accordingly. By leveraging human-provided information through LLMs and structuring task execution through HRL, our approach not only bridges the gap between autonomous capabilities and human intelligence but also significantly improves the agent's learning efficiency and decision-making in environments characterised by long horizons and sparse rewards.
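To make the two-level structure concrete, the sketch below shows one way a high-level Strategic Decision Engine and a low-level Worker could interact on a 2D grid. It assumes the verbal input has already been reduced to target cells per category; the category names, priority values, scoring rule `priority / (1 + distance)`, and helpers (`run_episode`, `manhattan`) are all hypothetical. In the paper both levels are learned with HRL, whereas here both policies are hand-written placeholders that only show the data flow: strategy $\omega$ (a subgoal cell) from the SDE, then primitive actions $\alpha$ (grid moves) from the Worker.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]

# Hypothetical primitive actions (alpha) available to the Worker: unit moves on a 2D grid.
ACTIONS: Dict[str, Cell] = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class StrategicDecisionEngine:
    # Hypothetical information priorities M per target category. Categories with
    # priority <= 0 are masked out, a crude stand-in for the Attention Space.
    priorities: Dict[str, float]

    def select_strategy(self, pos: Cell, context: Dict[str, List[Cell]]) -> Optional[Cell]:
        """Return the next subgoal cell (strategy omega), or None if nothing is left."""
        best: Optional[Cell] = None
        best_score = 0.0
        for kind, cells in context.items():
            weight = self.priorities.get(kind, 0.0)
            if weight <= 0:
                continue  # e.g. hazards are never chosen as subgoals
            for cell in cells:
                score = weight / (1 + manhattan(pos, cell))  # nearer, higher-priority targets win
                if score > best_score:
                    best, best_score = cell, score
        return best


class Worker:
    def act(self, pos: Cell, subgoal: Cell) -> str:
        """Greedy low-level policy on an obstacle-free grid: close the x gap, then the y gap."""
        dx, dy = subgoal[0] - pos[0], subgoal[1] - pos[1]
        if dx != 0:
            return "right" if dx > 0 else "left"
        return "down" if dy > 0 else "up"


def run_episode(start: Cell, context: Dict[str, List[Cell]], priorities: Dict[str, float]) -> List[Cell]:
    """Alternate SDE decisions and Worker execution until no attended target remains."""
    sde, worker = StrategicDecisionEngine(priorities), Worker()
    remaining = {kind: list(cells) for kind, cells in context.items()}
    pos, reached = start, []
    while (subgoal := sde.select_strategy(pos, remaining)) is not None:
        while pos != subgoal:  # the Worker executes primitive actions until omega is reached
            step = ACTIONS[worker.act(pos, subgoal)]
            pos = (pos[0] + step[0], pos[1] + step[1])
        reached.append(subgoal)
        for cells in remaining.values():  # a reached target is no longer pending
            if subgoal in cells:
                cells.remove(subgoal)
    return reached


context = {"victim": [(4, 4)], "info": [(1, 0)], "hazard": [(2, 2)]}
print(run_episode((0, 0), context, {"victim": 3.0, "info": 1.0, "hazard": 0.0}))  # [(1, 0), (4, 4)]
```

The nearby information cell is visited first because its distance-discounted score (0.5) beats the distant victim's (about 0.33); the hazard is never selected because its priority is zero.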

Paper Structure (17 sections, 1 equation, 2 figures, 1 table)

Introduction
  State-of-the-Art and Gaps
  Opportunities in LLMs and HRL
  Novelty
Problem Statement and Modelling
  Conceptual Architecture
  Modelling
Experiments
  Simulated Environment Setup
  Implementation Details
  Hypotheses
Results & Discussion
  Hypothesis 1: Domain-Knowledge Infused LLMs
  Hypothesis 2: RL with Attention Space
  Hypothesis 3 & 4: HRL (with Attention Space) in Sparse Reward Environment
...and 2 more sections

Figures (2)

Figure 1: Left: The proposed pipeline within a hierarchical decision-making framework. The Environment provides observations $s$ to both the Strategic Decision Engine (SDE) and the Worker module. When an observation $s$ contains verbal input $v$, that input is routed to the Context Extractor, which generates contextual outputs $c$. These outputs $c$, together with the observations $s$ from the Environment and the information priorities $M$ specified by the Information Space, are fed into the SDE. Within the SDE, strategies $\omega$ are refined, taking the Attention Space into account. The Worker module, informed by these refined strategies $\omega$, executes primitive actions $\alpha$ within the Environment. Through these interactions the system continually updates its policies, forming a feedback loop that evolves over time. Right: Comparison of LLM outputs with and without RAG integration in simulated SAR operations, showing more task-specific detail and notation in the RAG-augmented outputs.
Figure 2: Comparative analysis of learning agents in SAR scenarios. Left: Performance of flat RL agents, with and without attention guidance, receiving intrinsic rewards. Middle: Comparison of hierarchical (HRL) and flat RL agents, with and without attention guidance, under sparse-reward conditions, where reward is given only upon successful task completion. Right: SAR environment configuration showing information locations ('INFO'), obstacles ('D'), victim locations ('VIC'), hazards ('F'), and points of interest ('P').
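Figure 2 contrasts agents that receive intrinsic rewards with agents trained under sparse rewards, where reward arrives only at task completion. The sketch below computes both signals along one trajectory so the difference in feedback density is visible. The intrinsic reward here is a hypothetical first-visit novelty bonus scaled by an attention weight per cell; the bonus size, the weights, and all function names are assumptions, since the paper's exact intrinsic-reward definition is not given in this section.

```python
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]


def sparse_reward(collected: Set[Cell], required: Set[Cell]) -> float:
    """1.0 only once every required item has been gathered (task completion), else 0.0."""
    return 1.0 if required <= collected else 0.0


def intrinsic_reward(cell: Cell, visited: Set[Cell], attention: Dict[Cell, float], bonus: float = 0.1) -> float:
    """Hypothetical dense signal: a first-visit novelty bonus scaled by an attention weight (default 1.0)."""
    if cell in visited:
        return 0.0
    return bonus * attention.get(cell, 1.0)


def episode_returns(trajectory: Iterable[Cell], required: Set[Cell], attention: Dict[Cell, float]) -> Tuple[float, float]:
    """Return (sparse return, intrinsic return) accumulated along one trajectory."""
    visited: Set[Cell] = set()
    collected: Set[Cell] = set()
    sparse_total = intrinsic_total = 0.0
    for cell in trajectory:
        intrinsic_total += intrinsic_reward(cell, visited, attention)
        visited.add(cell)
        if cell in required:
            collected.add(cell)
        sparse_total += sparse_reward(collected, required)
        if sparse_total > 0:
            break  # the episode terminates on task completion
    return sparse_total, intrinsic_total


path = [(0, 0), (1, 0), (1, 0), (2, 0)]
print(episode_returns(path, {(2, 0)}, attention={(1, 0): 2.0, (2, 0): 5.0}))  # attention-guided
print(episode_returns(path, {(2, 0)}, attention={}))  # no attention guidance
```

Under the sparse signal, every step before the final one returns zero, which is what makes long-horizon credit assignment hard; the attention weights only reshape the dense bonus toward cells the human input marked as relevant.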