
Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory

Sizhe Yuen, Francisco Gomez Medina, Ting Su, Yali Du, Adam J. Sobey

TL;DR

This paper addresses memory challenges in multi-agent LLM systems caused by fixed context windows. It introduces Intrinsic Memory Agents, where each agent maintains an agent-specific memory that evolves intrinsically from outputs using a generic memory template, enabling heterogeneous perspectives and task-aligned memory. The authors demonstrate state-of-the-art or competitive performance on PDDL, FEVER, and ALFWorld benchmarks, with especially strong consistency, and show a data pipeline design case where intrinsic memory improves scalability, reliability, usability, cost-effectiveness, and documentation at the cost of higher token usage. The results suggest that intrinsic memory mechanisms can substantially enhance memory, coordination, and planning in multi-agent LLM systems across structured tasks.

Abstract

Multi-agent systems built on Large Language Models (LLMs) show exceptional promise for complex collaborative problem-solving, yet they face fundamental challenges stemming from context window limitations that impair memory consistency, role adherence, and procedural integrity. This paper introduces Intrinsic Memory Agents, a novel framework that addresses these limitations through agent-specific memories that evolve intrinsically with agent outputs. Specifically, our method maintains role-aligned memory that preserves specialized perspectives while focusing on task-relevant information. Our approach utilises a generic memory template applicable to new problems without the need to hand-craft specific memory prompts. We benchmark our approach on the PDDL, FEVER, and ALFWorld datasets, comparing its performance to existing state-of-the-art multi-agentic memory approaches and showing state-of-the-art or comparable performance across all three, with the highest consistency. An additional evaluation is performed on a complex data pipeline design task, and we demonstrate that our approach produces higher quality designs across 5 metrics: scalability, reliability, usability, cost-effectiveness, and documentation, plus additional qualitative evidence of the improvements. Our findings suggest that addressing memory limitations through intrinsic approaches can improve the capabilities of multi-agent LLM systems on structured planning tasks.

Paper Structure

This paper contains 25 sections, 3 equations, 13 figures, 4 tables, 2 algorithms.

Figures (13)

  • Figure 1: Intrinsic Memory Agents Framework. For $n$ agents and $m$ conversation turns, each agent $A_n$ contains its own role description $R_n$ and language model $L_n$. Its memory $M_{n,m}$ is updated based on the input context $C_{n,m}$ and output $O_{n,m}$.
  • Figure 2: Intrinsic Memory performance across the three benchmarks; the blue bars show our Intrinsic Memory approach.
  • Figure 3: LLM-as-a-Judge metrics for the Data Pipeline design case study.
  • Figure 4: Snippets of one component within the data pipeline design from both systems. The full outputs can be found in the appendix in Figures \ref{fig:intrinsic_memory_output_pipeline_full} and \ref{fig:baseline_autogen_output_pipeline_full}.
  • Figure 5: Manually generated prompt for the LLM agent in the PDDL task.
  • ...and 8 more figures
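
The update loop described in the Figure 1 caption can be illustrated with a minimal Python sketch. It only mirrors the caption's notation: each agent holds a role $R_n$, a language model $L_n$, and a memory $M_{n,m}$ that is refreshed from the input context $C_{n,m}$ and the agent's own output $O_{n,m}$. Everything else is an assumption made for illustration: the names `Agent`, `update_memory`, and `run`, the round-robin turn order, the stub language models, and the toy update rule, which simply logs each context/output pair. None of this is the authors' implementation or their memory template.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for the language model L_n: maps a prompt to text.
LanguageModel = Callable[[str], str]


def update_memory(memory: Dict[str, str], context: str, output: str) -> Dict[str, str]:
    """Toy update rule (an assumption, not the paper's template):
    the new memory is the old memory plus the latest context/output pair."""
    updated = dict(memory)
    turn = len(updated) + 1
    updated[f"turn_{turn}"] = f"saw: {context} | said: {output}"
    return updated


@dataclass
class Agent:
    role: str                                              # R_n
    llm: LanguageModel                                     # L_n
    memory: Dict[str, str] = field(default_factory=dict)   # M_{n,m}

    def step(self, context: str) -> str:
        # The agent conditions on its own role and memory plus the shared context C_{n,m}.
        prompt = f"Role: {self.role}\nMemory: {self.memory}\nContext: {context}"
        output = self.llm(prompt)                           # O_{n,m}
        # Memory evolves from the agent's own output, so it stays agent-specific.
        self.memory = update_memory(self.memory, context, output)
        return output


def run(agents: List[Agent], task: str, turns: int) -> List[str]:
    """Round-robin conversation (assumed ordering); returns the shared transcript."""
    transcript = [task]
    for m in range(turns):
        agent = agents[m % len(agents)]
        transcript.append(agent.step(transcript[-1]))
    return transcript


# Stub models that always return a fixed string, so the example runs offline.
planner = Agent(role="planner", llm=lambda prompt: "plan")
critic = Agent(role="critic", llm=lambda prompt: "critique")
transcript = run([planner, critic], task="design a pipeline", turns=4)
```

After four turns the two agents hold different memories even though they shared one conversation, which is the heterogeneity the framework is named for. In a real system the update step would presumably be model-driven and structured rather than an append-only log.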