H$^2$R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents

Shicheng Ye, Chao Yu, Kaiqiang Ke, Chengdong Xu, Yinqi Wei

TL;DR

This work tackles coarse-grained knowledge transfer in multi-task LLM agents by introducing a hierarchical memory system that separates high-level planning memory from low-level execution memory. The central mechanism, Hierarchical Hindsight Reflection (H$^2$R), distills task-level strategies and subgoal-specific execution patterns from past interactions into structured memory units, which are retrieved at test time to support hierarchical decision making. Empirical results on AlfWorld and PDDLGame show that H$^2$R outperforms strong baselines such as ReAct and ExpeL, with notable improvements in complex planning scenarios. The findings highlight the value of modular, level-specific memories and reflection-driven memory construction for robust, scalable multi-task reasoning with LLM agents.

Abstract

Large language model (LLM)-based agents have shown strong potential in multi-task scenarios, owing to their ability to transfer knowledge across diverse tasks. However, existing approaches often treat prior experiences and knowledge as monolithic units, leading to inefficient and coarse-grained knowledge transfer. In this work, we propose a novel hierarchical memory architecture that enables fine-grained knowledge transfer by decoupling high-level planning memory from low-level execution memory. To construct and refine these hierarchical memories, we introduce Hierarchical Hindsight Reflection (H$^2$R), a mechanism that distills reusable and hierarchical knowledge from past agent-environment interactions. At test time, H$^2$R retrieves high-level and low-level memories separately, allowing LLM-based agents to efficiently access and utilize task-relevant knowledge for new tasks. Experimental results across two benchmarks demonstrate that H$^2$R can improve generalization and decision-making performance, outperforming prior baselines such as ExpeL.

Paper Structure

This paper contains 17 sections, 6 equations, 2 figures, 2 tables, 1 algorithm.

Figures (2)

  • Figure 1: Overview of the Hierarchical Hindsight Reflection (H$^2$R) framework, which consists of four key processes: (1) Subgoal Inference, which infers the subgoals achieved in each task from the task and its trajectory; (2) Subtrajectory Inference, which segments trajectories by subgoal to extract subtrajectory sequences; (3) Insight Extraction, performed at both the high level (from tasks, subgoals, and trajectories) and the low level (from individual subgoals and their subtrajectories) to derive reusable and beneficial rules; and (4) Memory Organization, where relevant insights are attached to the corresponding memory units. This architecture enables efficient knowledge transfer through level-specific retrieval mechanisms that decouple high-level planning from low-level execution in multi-task scenarios.
  • Figure 2: Overview of how the memory components are used. The system comprises three core components: (1) a Memory Module with two specialized components: (a) the High-Level Memory Component, whose memory units ($m_i^{\text{high}}$) store a task description, subgoal sequence, and planning insights, and (b) the Low-Level Memory Component, whose memory units ($m_i^{\text{low}}$) store a subgoal description, execution trajectory, and execution insights. For any given task, relevant memory units from both components are retrieved to inform decision making. (2) a Planner that decomposes tasks into subgoals using the task description, planning history, current trajectory, and retrieved high-level memory, outputting structured subgoals such as "shot1 contains cocktail2". (3) an Executor that translates subgoals into actionable steps using the task context, current subgoal, ongoing trajectory, and retrieved low-level memory, generating either actions (e.g., "left grasp shot1") or termination signals.
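
The two memory components and their level-specific retrieval, as described in the Figure 2 caption, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `HighLevelUnit` and `LowLevelUnit`, the `retrieve` helper, and the token-overlap (Jaccard) similarity are all hypothetical stand-ins, since this section does not specify the retriever. Apart from the subgoal "shot1 contains cocktail2" and the action "left grasp shot1", which come from the caption, the memory contents are invented toy data.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HighLevelUnit:
    """Hypothetical high-level unit: task, subgoal sequence, planning insights."""
    task: str
    subgoals: List[str]
    insights: List[str] = field(default_factory=list)


@dataclass
class LowLevelUnit:
    """Hypothetical low-level unit: subgoal, execution trajectory, execution insights."""
    subgoal: str
    trajectory: List[str]
    insights: List[str] = field(default_factory=list)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the actual retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query: str, units: list, key: Callable, k: int = 1) -> list:
    """Return the k units whose key text is most similar to the query."""
    return sorted(units, key=lambda u: jaccard(query, key(u)), reverse=True)[:k]


# Toy memories (invented for illustration, except the quoted subgoal and action).
high_memory = [
    HighLevelUnit("serve cocktail2 in shot1",
                  ["shot1 contains cocktail2"],
                  ["fill the shot glass before serving"]),
    HighLevelUnit("put a clean mug on the desk",
                  ["mug is clean", "mug is on desk"],
                  ["clean objects before placing them"]),
]
low_memory = [
    LowLevelUnit("shot1 contains cocktail2",
                 ["left grasp shot1"],
                 ["grasp the container before filling it"]),
    LowLevelUnit("mug is clean",
                 ["go to sinkbasin", "clean mug"],
                 ["cleaning requires being at the sink"]),
]

# The planner queries high-level memory by task description ...
planner_hits = retrieve("serve cocktail2 in shot2", high_memory, key=lambda u: u.task)
# ... while the executor queries low-level memory by the current subgoal.
executor_hits = retrieve("shot1 contains cocktail2", low_memory, key=lambda u: u.subgoal)

print(planner_hits[0].subgoals)
print(executor_hits[0].trajectory)
```

Keeping the two stores separate means a planning insight learned on one task can be reused without dragging along its low-level trajectory, and an execution pattern for a subgoal can be reused by any task that shares that subgoal.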