H$^2$R: Hierarchical Hindsight Reflection for Multi-Task LLM Agents
Shicheng Ye, Chao Yu, Kaiqiang Ke, Chengdong Xu, Yinqi Wei
TL;DR
This work tackles coarse-grained knowledge transfer in multi-task LLM agents by introducing a hierarchical memory system that separates high-level planning memory from low-level execution memory. The central mechanism, Hierarchical Hindsight Reflection (H$^2$R), distills task-level strategies and subgoal-specific execution patterns from past interactions into structured memory units, enabling test-time retrieval that supports hierarchical decision making. Empirical results on AlfWorld and PDDLGame show that H$^2$R outperforms strong baselines such as ReAct and ExpeL, with notable improvements in complex planning scenarios. The findings highlight the value of modular, level-specific memories and reflection-driven memory construction for robust, scalable multi-task reasoning with LLM agents.
Abstract
Large language model (LLM)-based agents have shown strong potential in multi-task scenarios, owing to their ability to transfer knowledge across diverse tasks. However, existing approaches often treat prior experiences and knowledge as monolithic units, leading to inefficient and coarse-grained knowledge transfer. In this work, we propose a novel hierarchical memory architecture that enables fine-grained knowledge transfer by decoupling high-level planning memory from low-level execution memory. To construct and refine these hierarchical memories, we introduce Hierarchical Hindsight Reflection (H$^2$R), a mechanism that distills reusable and hierarchical knowledge from past agent-environment interactions. At test time, H$^2$R retrieves high-level and low-level memories separately, allowing LLM-based agents to efficiently access and utilize task-relevant knowledge for new tasks. Experimental results across two benchmarks demonstrate that H$^2$R can improve generalization and decision-making performance, outperforming prior baselines such as ExpeL.
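To make the decoupled retrieval idea concrete, the following is a minimal sketch of a two-level memory with separate high-level and low-level lookups. It is an illustration under stated assumptions, not the authors' implementation: the names `MemoryUnit`, `HierarchicalMemory`, `retrieve_high`, and `retrieve_low` are hypothetical, the stored insights are made-up examples, and the token-overlap (Jaccard) similarity is only a stand-in for whatever retriever the method actually uses.

```python
from dataclasses import dataclass, field
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (assumed stand-in for a real retriever)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class MemoryUnit:
    key: str      # task description (high level) or subgoal (low level)
    insight: str  # distilled knowledge attached to that key


@dataclass
class HierarchicalMemory:
    """Hypothetical container keeping planning and execution memories apart."""
    high: List[MemoryUnit] = field(default_factory=list)  # planning memory
    low: List[MemoryUnit] = field(default_factory=list)   # execution memory

    def add_high(self, task: str, insight: str) -> None:
        self.high.append(MemoryUnit(task, insight))

    def add_low(self, subgoal: str, insight: str) -> None:
        self.low.append(MemoryUnit(subgoal, insight))

    def retrieve_high(self, task: str, k: int = 1) -> List[MemoryUnit]:
        # Rank planning memories by similarity to the new task description.
        ranked = sorted(self.high, key=lambda m: jaccard(m.key, task), reverse=True)
        return ranked[:k]

    def retrieve_low(self, subgoal: str, k: int = 1) -> List[MemoryUnit]:
        # Rank execution memories by similarity to the current subgoal.
        ranked = sorted(self.low, key=lambda m: jaccard(m.key, subgoal), reverse=True)
        return ranked[:k]


# Illustrative (invented) memory contents in the style of a household task.
memory = HierarchicalMemory()
memory.add_high("put a clean mug in the cabinet",
                "plan: find object, clean it, then place it")
memory.add_high("heat an egg and put it on the table",
                "plan: find object, heat it, then place it")
memory.add_low("find mug", "check countertops and shelves first")
memory.add_low("clean mug with sink", "go to the sink before cleaning")
memory.add_low("open cabinet", "go to the cabinet before opening it")

# Planning knowledge is queried with the task; execution knowledge with a subgoal.
plan_hints = memory.retrieve_high("put a clean plate in the cabinet", k=1)
step_hints = memory.retrieve_low("clean plate with sink", k=1)
```

Keeping the two stores separate means a new task can reuse a plan from one past task and execution hints from another, which is the kind of fine-grained transfer the abstract describes.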
