Memory Intelligence Agent

Jingyang Qiao, Weicheng Meng, Yu Cheng, Zhihang Lin, Zhizhong Zhang, Xin Tan, Jingyu Gong, Kun Shao, Yuan Xie

Abstract

Deep research agents (DRAs) integrate LLM reasoning with external tools. Memory systems enable DRAs to leverage historical experiences, which are essential for efficient reasoning and autonomous evolution. Existing methods retrieve similar trajectories from memory to aid reasoning, but suffer from two key limitations: ineffective memory evolution and growing storage and retrieval costs. To address these problems, we propose a novel Memory Intelligence Agent (MIA) framework built on a Manager-Planner-Executor architecture. The Memory Manager is a non-parametric memory system that stores compressed historical search trajectories. The Planner is a parametric memory agent that produces search plans for questions. The Executor is an agent that searches for and analyzes information guided by the search plan. To build the MIA framework, we first adopt an alternating reinforcement learning paradigm to enhance cooperation between the Planner and the Executor. Furthermore, we enable the Planner to continuously evolve during test-time learning, with updates performed on the fly alongside inference without interrupting the reasoning process. Additionally, we establish a bidirectional conversion loop between parametric and non-parametric memories to achieve efficient memory evolution. Finally, we incorporate reflection and unsupervised judgment mechanisms to boost reasoning and self-evolution in the open world. Extensive experiments across eleven benchmarks demonstrate the superiority of MIA.
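To make the Manager-Planner-Executor division of labor concrete, below is a minimal, hypothetical sketch of the reasoning loop the abstract describes (retrieve similar context, plan, execute, store a compressed trajectory). All class and method names here are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of the MIA reasoning loop (retrieval -> plan ->
# execute -> store). Names are illustrative, not from the paper's code.
from dataclasses import dataclass, field

@dataclass
class MemoryManager:
    """Non-parametric memory: stores compressed search trajectories."""
    workflows: list = field(default_factory=list)

    def retrieve(self, question: str, k: int = 3) -> list:
        # Placeholder similarity search over stored workflows.
        return self.workflows[:k]

    def store(self, trajectory: dict) -> None:
        # Compress the raw trajectory into a structured workflow entry.
        self.workflows.append({"question": trajectory["question"],
                               "steps": trajectory["steps"][:5]})

class Planner:
    """Parametric memory agent: produces a search plan for a question."""
    def plan(self, question: str, context: list) -> list:
        return [f"search: {question}", "analyze results", "answer"]

class Executor:
    """Executes the plan: searches for and analyzes information."""
    def execute(self, plan: list) -> dict:
        steps = [{"action": step, "observation": "..."} for step in plan]
        return {"steps": steps, "answer": "draft answer"}

def research(question: str, manager: MemoryManager,
             planner: Planner, executor: Executor) -> str:
    context = manager.retrieve(question)      # Inputs & Retrieval
    plan = planner.plan(question, context)    # Planning (parametric memory)
    trajectory = executor.execute(plan)       # Execution (tool use)
    trajectory["question"] = question
    manager.store(trajectory)                 # Outputs & Storage
    return trajectory["answer"]
```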

Paper Structure

This paper contains 32 sections, 13 equations, 13 figures, 8 tables, and 3 algorithms.

Figures (13)

  • Figure 1: (a) Comparisons between frontier LLMs and their MIA-enhanced counterparts on the LiveVQA (multimodal) dataset. (b) Comparisons between frontier LLMs and their MIA-enhanced counterparts on the HotpotQA (text-only) dataset. (c) Comparisons between MIA based on a Qwen2.5-VL-7B Executor and larger LLMs (in non-tool-calling settings) across seven diverse datasets. (d) Comparisons between MIA and SOTA memory frameworks based on a Qwen2.5-VL-7B Executor across seven diverse datasets.
  • Figure 2: The deep research process of MIA tackling a complex, multi-hop question.
  • Figure 3: The reasoning process of MIA consists of three parts: Inputs & Retrieval retrieves memory context similar to the inputs; Research Process drives Planner-Executor collaboration via a planning-execution-reflection loop; Outputs & Storage compresses search trajectories into structured memory.
  • Figure 4: Prompt for the Memory Manager to extract the image caption and workflow.
  • Figure 5: The Executor is activated during the first-stage RL training and frozen in the test-time RL process, while the Planner is activated during both the second-stage RL training and the test-time RL process. The memory framework of MIA during exploration: (1) generating multiple plan rollouts; (2) executing the inference pipeline, where a router selects the optimal plan based on prior experience to interact with the environment, strictly ensuring no label leakage; (3) receiving final feedback from the environment; and (4) completing the training pipeline by calculating rewards and advantages for all rollouts. These evaluations are then used to update both the parametric memory (updating the Planner's parameters via GRPO) and the non-parametric memory (extracting workflows into the Memory Manager). (See the GRPO sketch after this figure list.)
  • ...and 8 more figures
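The training pipeline in Figure 5 updates the Planner with GRPO, which scores each plan rollout by its reward relative to the other rollouts in its group. As a rough illustration (not the paper's implementation), group-relative advantages and a clipped policy loss might look like the sketch below; all function and variable names are hypothetical.

```python
# Hypothetical sketch of the GRPO update step described in Figure 5:
# rewards for G plan rollouts are normalized within the group before
# updating the Planner's parameters with a clipped surrogate loss.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each rollout's reward
    against the mean/std of its rollout group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def grpo_policy_loss(logp_new: torch.Tensor, logp_old: torch.Tensor,
                     advantages: torch.Tensor, clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate objective applied per rollout."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

if __name__ == "__main__":
    # Final environment feedback for G = 4 plan rollouts (step 3 in Figure 5).
    rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
    adv = grpo_advantages(rewards)
    # Plan log-probabilities under the current and rollout-time Planner.
    logp_old = torch.log(torch.tensor([0.5, 0.4, 0.6, 0.3]))
    logp_new = torch.log(torch.tensor([0.55, 0.35, 0.65, 0.25]))
    print(grpo_policy_loss(logp_new, logp_old, adv))
```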