Nemori: Self-Organizing Agent Memory Inspired by Cognitive Science

Jiayan Nan, Wenquan Ma, Wenlong Wu, Yize Chen

TL;DR

Nemori tackles the long-standing amnesia of LLM-based agents with a cognitively grounded memory architecture built on the Two-Step Alignment and Predict-Calibrate principles. It defines an end-to-end flow that autonomously segments conversations into episodes, generates episodic and semantic memories, and proactively distills new knowledge from prediction gaps. Empirical results on LoCoMo and LongMemEval_S show that Nemori surpasses state-of-the-art baselines while using substantially less context, especially in long-context settings, and ablation and hyperparameter analyses indicate that these results are robust. The work demonstrates a viable path toward self-evolving autonomous agents whose memory learns from its own predictions rather than relying on static or passive memory mechanisms.

Abstract

Large Language Models (LLMs) demonstrate remarkable capabilities, yet their inability to maintain persistent memory in long contexts limits their effectiveness as autonomous agents in long-term interactions. While existing memory systems have made progress, their reliance on arbitrary granularity for defining the basic memory unit and passive, rule-based mechanisms for knowledge extraction limits their capacity for genuine learning and evolution. To address these foundational limitations, we present Nemori, a novel self-organizing memory architecture inspired by human cognitive principles. Nemori's core innovation is twofold: First, its Two-Step Alignment Principle, inspired by Event Segmentation Theory, provides a principled, top-down method for autonomously organizing the raw conversational stream into semantically coherent episodes, solving the critical issue of memory granularity. Second, its Predict-Calibrate Principle, inspired by the Free-energy Principle, enables the agent to proactively learn from prediction gaps, moving beyond pre-defined heuristics to achieve adaptive knowledge evolution. This offers a viable path toward handling the long-term, dynamic workflows of autonomous agents. Extensive experiments on the LoCoMo and LongMemEval benchmarks demonstrate that Nemori significantly outperforms prior state-of-the-art systems, with its advantage being particularly pronounced in longer contexts.
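As a rough illustration of the Predict-Calibrate idea described above, the following minimal sketch treats semantic memory as a set of short fact strings. It first "predicts" what an episode should contain from what is already stored, then keeps only the part of the episode that the prediction missed. The class `SemanticMemory`, its methods, and the substring-based recall are all hypothetical simplifications chosen for this example; they are not Nemori's actual implementation, which the abstract describes only at the level of principle.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticMemory:
    """Toy knowledge store: a set of short fact strings (hypothetical structure)."""
    facts: set = field(default_factory=set)

    def predict(self, topic: str) -> set:
        # "Predict": recall every stored fact that mentions the episode's topic.
        # Substring matching is an assumption standing in for real retrieval.
        return {f for f in self.facts if topic.lower() in f.lower()}

    def calibrate(self, topic: str, observed: set) -> set:
        # "Calibrate": the prediction gap is whatever the episode contained
        # that the prediction did not cover; only that gap is stored.
        gap = observed - self.predict(topic)
        self.facts |= gap
        return gap


memory = SemanticMemory({"Alice likes hiking"})

# First episode: one statement is already known, one is new.
first_gap = memory.calibrate("Alice", {"Alice likes hiking", "Alice moved to Berlin"})

# Second episode repeats known content, so the gap is empty and nothing is added.
second_gap = memory.calibrate("Alice", {"Alice moved to Berlin"})
```

In this toy setting, `first_gap` contains only the previously unknown statement and `second_gap` is empty, which mirrors the idea that the agent learns from what it failed to predict rather than re-storing what it already knows.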

Paper Structure

This paper contains 27 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The conceptual framework of Nemori, illustrating the mapping from problem to principle to computation. The framework addresses two core challenges: defining appropriate input chunks (x) and designing an effective organizing function (f). The Two-Step Alignment Principle (comprising Boundary Alignment and Representation Alignment) solves the input chunking and initial representation problem. Concurrently, the Predict-Calibrate Principle provides a proactive mechanism for the organizing function. Nemori operationalizes both principles via three core modules: Topic Segmentation, Episodic Memory Generation, and Semantic Memory Generation.
  • Figure 2: An illustration of different conversation segmentation methods. Standard RAG (left) often relies on arbitrary, fixed-size chunking, which can break the semantic integrity of a dialogue (as shown by the split in the apple discussion). The Interaction Pair model (middle) groups user-assistant turns but can still separate related user messages. In contrast, our proposed Episodic segmentation (right), guided by semantic boundary detection, correctly groups the entire conversation about the apple into a single, coherent episode, preserving the interaction's logical flow.
  • Figure 3: The Nemori system features three modules: Topic Segmentation, Episodic Memory Generation, and Semantic Memory Generation. It segments conversations into Episodic Memory, then uses a Predict-Calibrate cycle to distill new Semantic Memory from prediction gaps against original conversations.
  • Figure 4: Impact of top-k episodes on LLM score across different models. For both models, performance rises sharply until k=10 and then plateaus. The red dashed lines mark the Full Context baseline performance for comparison.
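The contrast drawn in Figure 2 between fixed-size chunking and episodic segmentation can be sketched in a few lines. The example below is only a toy under stated assumptions: the boundary detector is a crude lexical-overlap heuristic (Jaccard similarity over words longer than three characters) standing in for the semantic boundary detection the caption mentions, and the function names, the `threshold` value, and the sample dialogue are all invented for illustration.

```python
def fixed_size_chunks(messages, size):
    """Baseline: split every `size` messages, ignoring content."""
    return [messages[i:i + size] for i in range(0, len(messages), size)]


def keywords(text):
    # Crude content-word filter (assumption): lowercase words longer than 3 characters.
    cleaned = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 3}


def episodic_segments(messages, threshold=0.1):
    """Start a new episode when a message shares too little vocabulary
    with the episode built so far (hypothetical stand-in for a semantic detector)."""
    episodes = []
    current, vocab = [], set()
    for msg in messages:
        words = keywords(msg)
        union = vocab | words
        overlap = len(vocab & words) / len(union) if union else 0.0
        if current and overlap < threshold:
            # Topic shift detected: close the current episode.
            episodes.append(current)
            current, vocab = [], set()
        current.append(msg)
        vocab |= words
    if current:
        episodes.append(current)
    return episodes


dialogue = [
    "I bought an apple today.",
    "Was the apple sweet?",
    "Yes, the apple was very sweet.",
    "Anyway, my flight to Tokyo leaves Friday.",
    "Which airline is the Tokyo flight with?",
]

chunks = fixed_size_chunks(dialogue, 2)
episodes = episodic_segments(dialogue)
```

With this sample dialogue, the fixed-size split places the last apple message and the first flight message in the same chunk, while the overlap-based split keeps the three apple messages together and starts a new episode at the flight topic. A lexical heuristic like this would fail on paraphrases and pronouns, which is one reason a semantic detector is preferable in practice.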