Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory
Sizhe Yuen, Francisco Gomez Medina, Ting Su, Yali Du, Adam J. Sobey
TL;DR
This paper addresses memory challenges in multi-agent LLM systems caused by fixed context windows. It introduces Intrinsic Memory Agents, in which each agent maintains its own memory that evolves intrinsically with that agent's outputs and is structured by a generic memory template, preserving heterogeneous, role-aligned perspectives and keeping memory focused on the task. The authors report state-of-the-art or comparable performance on the PDDL, FEVER, and ALFWorld benchmarks, with the highest consistency among the compared approaches. In a data pipeline design case study, intrinsic memory improves design quality on scalability, reliability, usability, cost-effectiveness, and documentation, at the cost of higher token usage. The results suggest that intrinsic memory mechanisms can improve the capabilities of multi-agent LLM systems on structured planning tasks.
Abstract
Multi-agent systems built on Large Language Models (LLMs) show exceptional promise for complex collaborative problem-solving, yet they face fundamental challenges stemming from context window limitations that impair memory consistency, role adherence, and procedural integrity. This paper introduces Intrinsic Memory Agents, a novel framework that addresses these limitations through agent-specific memories that evolve intrinsically with agent outputs. Specifically, our method maintains role-aligned memory that preserves specialized perspectives while focusing on task-relevant information. Our approach utilises a generic memory template that applies to new problems without the need to hand-craft task-specific memory prompts. We benchmark our approach on the PDDL, FEVER, and ALFWorld datasets against existing state-of-the-art multi-agent memory approaches, and it achieves the best or comparable performance on all three, with the highest consistency. We additionally evaluate on a complex data pipeline design task, where our approach produces higher-quality designs across five metrics (scalability, reliability, usability, cost-effectiveness, and documentation), and we provide qualitative evidence of the improvements. Our findings suggest that addressing memory limitations through intrinsic approaches can improve the capabilities of multi-agent LLM systems on structured planning tasks.
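
To make the idea of agent-specific memory concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each agent starts from one shared template and rewrites its own memory after each of its outputs. The slot names in `GENERIC_TEMPLATE`, the `MemoryAgent` class, and the `keep_latest_updater` function are all hypothetical; in a real system the update step would presumably be an LLM call rather than the plain function used here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical generic template: slot names are illustrative, not from the paper.
GENERIC_TEMPLATE = ("role", "task_state", "decisions", "open_questions")

# An updater maps (previous memory, latest output) to a new memory.
# Assumption: a real system would call an LLM here.
MemoryUpdater = Callable[[Dict[str, str], str], Dict[str, str]]


def keep_latest_updater(memory: Dict[str, str], output: str) -> Dict[str, str]:
    """Toy stand-in for the update step: store the latest output in 'task_state'."""
    updated = dict(memory)
    updated["task_state"] = output
    return updated


@dataclass
class MemoryAgent:
    name: str
    role: str
    updater: MemoryUpdater = keep_latest_updater
    memory: Dict[str, str] = field(init=False)

    def __post_init__(self) -> None:
        # Every agent uses the same template but owns a separate copy.
        self.memory = {slot: "" for slot in GENERIC_TEMPLATE}
        self.memory["role"] = self.role

    def build_context(self, task: str, recent_turns: List[str]) -> str:
        """Assemble a prompt from the task, this agent's memory, and recent turns."""
        memory_text = "\n".join(f"{k}: {v}" for k, v in self.memory.items())
        recent_text = "\n".join(recent_turns)
        return f"TASK: {task}\nMEMORY:\n{memory_text}\nRECENT:\n{recent_text}"

    def record_output(self, output: str) -> None:
        """Update this agent's memory from its own output only."""
        self.memory = self.updater(self.memory, output)


planner = MemoryAgent(name="planner", role="plan steps")
critic = MemoryAgent(name="critic", role="check plans")
planner.record_output("move block A onto B")
print(planner.build_context("stack blocks", ["critic: looks ok"]))
```

Because each agent owns its memory and updates it only from its own outputs, the planner's memory changes here while the critic's stays untouched, which is one simple way to keep perspectives heterogeneous.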
