Coarse-to-Fine Grounded Memory for LLM Agent Planning

Wei Yang, Jinwei Xiao, Hongming Zhang, Qingyang Zhang, Yanna Wang, Bo Xu

TL;DR

The paper tackles the challenge of memory quality and adaptability in LLM-based planning by introducing Coarse-to-Fine Grounded Memory (CFGM), which grounds memories at coarse, hybrid, and fine granularities using the LLM's internal knowledge. The approach combines three components: coarse-grained focus points that steer experience collection, hybrid-grained tips distilled from trajectories, and fine-grained key-information reflection during online planning to handle anomalies. Retrieved memories relevant to the current task guide the agent's decisions. Empirical evaluations on ALFWorld, WebShop, and ScienceWorld show CFGM achieving state-of-the-art performance, and ablations demonstrate the contribution of each memory-grounding component. The work presents an integrated memory grounding framework that improves exploration, memory diversity, and adaptive planning for complex interactive tasks, with evidence of cross-model generalization and practical efficiency gains.

Abstract

Recent advancements in Large Language Models (LLMs) have driven growing interest in LLM-based agents for complex planning tasks. To avoid costly agent training, many studies adopt memory mechanisms that enhance the LLM with offline experiences or online trajectory analysis. However, existing works focus on single-granularity memory derived from dynamic environmental interactions, which is inherently constrained by the quality of the collected experiences. This limitation, in turn, constrains the diversity of knowledge and the flexibility of planning. We propose Coarse-to-Fine Grounded Memory (CFGM), a novel framework that grounds coarse-to-fine memories with the LLM, thereby fully leveraging them for flexible adaptation to diverse scenarios. CFGM grounds environmental information into coarse-grained focus points to guide experience collection in training tasks, followed by grounding of actionable hybrid-grained tips from each experience. At inference, CFGM retrieves task-relevant experiences and tips to support planning. When facing environmental anomalies, the LLM grounds the current situation into fine-grained key information, enabling flexible self-QA reflection and plan correction.
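
The inference-time retrieval step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `MemoryEntry` schema, the token-overlap (Jaccard) similarity, and the prompt layout are all assumptions made for illustration, and the paper may use a different retriever and memory format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryEntry:
    """One stored experience and the tips distilled from it (hypothetical schema)."""
    task: str
    trajectory: list[str]
    tips: list[str]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the actual retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(memory: list[MemoryEntry], task: str, k: int = 2) -> list[MemoryEntry]:
    """Return the k entries whose task description is most similar to `task`."""
    ranked = sorted(memory, key=lambda e: jaccard(e.task, task), reverse=True)
    return ranked[:k]


def build_prompt(task: str, retrieved: list[MemoryEntry]) -> str:
    """Assemble a planning prompt from retrieved tips and tasks (assumed layout)."""
    lines = [f"Task: {task}", "Relevant tips:"]
    for entry in retrieved:
        lines.extend(f"- {tip}" for tip in entry.tips)
    lines.append("Similar past tasks:")
    lines.extend(f"- {entry.task}" for entry in retrieved)
    return "\n".join(lines)


if __name__ == "__main__":
    memory = [
        MemoryEntry("put a clean plate in the cabinet",
                    ["go to sinkbasin 1", "clean plate 1 with sinkbasin 1"],
                    ["Clean the object at the sink before placing it."]),
        MemoryEntry("heat an egg and put it in the fridge",
                    ["go to microwave 1", "heat egg 1 with microwave 1"],
                    ["Use the microwave to heat objects."]),
    ]
    task = "put a clean mug in the cabinet"
    print(build_prompt(task, retrieve(memory, task, k=1)))
```

Keeping the tips alongside the experience they came from means a single lookup returns both kinds of memory. Whether CFGM stores them this way is not stated in the abstract.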

Paper Structure

This paper contains 29 sections, 2 equations, 15 figures, 7 tables, 3 algorithms.

Figures (15)

  • Figure 1: Conceptual overview of CFGM. During offline training, our method first extracts coarse-grained focus points to guide experience collection, then distills hybrid-grained tips from these experiences. At online inference time, it retrieves relevant experiences and tips for planning. When encountering anomalies, the system identifies fine-grained key details for adaptive self-QA and plan adjustment.
  • Figure 2: Framework of CFGM. CFGM collects experiences offline using coarse-grained focus points grounded from environmental information, then extracts hybrid-grained tips grounded from the trajectories of each experience to construct a tips dictionary. The agent's online planning is then enhanced by retrieved experiences and tips; when an anomaly is observed, fine-grained self-QA reflection is activated by key information grounded from the current situation and relevant history. The orange arrow represents the memory grounding process.
  • Figure 3: The success rate (SR) achieved by different methods using various models on ALFWorld. CFGM demonstrates strong generalization across different models and consistently outperforms the baselines.
  • Figure 4: The prompt template for the focus-point generation model in ALFWorld, WebShop, and ScienceWorld.
  • Figure 5: The prompt template for generating tips from compared trajectories.
  • ...and 10 more figures