CLAG: Adaptive Memory Organization via Agent-Driven Clustering for Small Language Model Agents

Taeyun Roh, Wonjune Jang, Junha Jung, Jaewoo Kang

Abstract

Large language model agents rely heavily on external memory to support knowledge reuse and complex reasoning tasks. Yet most memory systems store experiences in a single global retrieval pool, which can gradually dilute or corrupt stored knowledge. This problem is especially pronounced for small language models (SLMs), which are highly vulnerable to irrelevant context. We introduce CLAG, a CLustering-based AGentic memory framework in which an SLM agent actively organizes memory by clustering. CLAG employs an SLM-driven router to assign incoming memories to semantically coherent clusters and autonomously generates cluster-specific profiles, including topic summaries and descriptive tags, to establish each cluster as a self-contained functional unit. By performing localized evolution within these structured neighborhoods, CLAG reduces cross-topic interference and enhances internal memory density. During retrieval, the framework uses a two-stage process that first filters relevant clusters via their profiles, thereby excluding distractors and reducing the search space, and then retrieves fine-grained memories only within the selected clusters. Experiments on multiple QA datasets with three SLM backbones show that CLAG consistently improves answer quality and robustness over prior memory systems for agents while remaining lightweight and efficient.

Paper Structure (71 sections, 19 equations, 2 figures, 16 tables, 3 algorithms)


Figures (2)

  • Figure 1: Conceptual comparison between existing global memory systems and CLAG. (Left) Traditional approaches manage memories in a single global pool, where topic-mixed updates and retrieval lead to high interference and noise accumulation. (Right) CLAG employs agent-driven clustering to organize memories into semantically coherent neighborhoods. By confining evolution and retrieval to these local clusters, our framework significantly reduces cross-topic interference and enhances memory stability.
  • Figure 2: Overview of the proposed CLAG framework. Left: Agentic Routing. An SLM router assigns each incoming memory note $m_{\text{new}}$ to the most relevant cluster using semantic metadata, and updates the corresponding cluster profile $\mathcal{P}$. Middle: Localized Evolution. An evolution agent performs consolidation (e.g., linking, rewriting, strengthening) within the routed cluster to maintain topic-consistent neighborhoods and reduce cross-topic interference. Right: Two-Stage Retrieval. Given a query, CLAG first filters clusters using profile-based selection (Stage 1), then retrieves fine-grained memories only inside the selected clusters (Stage 2), reducing the effective search space and suppressing retrieval noise.
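
The routing and two-stage retrieval steps described in Figure 2 can be illustrated with a small Python sketch. This is not the paper's implementation. It assumes a toy bag-of-words similarity in place of the SLM router and encoder, represents each cluster profile as accumulated word counts rather than a generated summary and tags, and omits localized evolution entirely. The names `ClusteredMemory`, `route`, `retrieve`, and the threshold value `0.2` are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Cluster:
    notes: list = field(default_factory=list)
    # Assumption: word counts stand in for the topic summary and tags.
    profile: Counter = field(default_factory=Counter)

    def add(self, note: str) -> None:
        self.notes.append(note)
        self.profile.update(embed(note))  # update the profile on every insert


class ClusteredMemory:
    def __init__(self, new_cluster_threshold: float = 0.2):
        # Hypothetical threshold: below it, a note opens a new cluster.
        self.threshold = new_cluster_threshold
        self.clusters = []

    def route(self, note: str) -> int:
        """Assign a note to its best-matching cluster, or start a new one."""
        vec = embed(note)
        scores = [cosine(vec, c.profile) for c in self.clusters]
        if scores and max(scores) >= self.threshold:
            idx = scores.index(max(scores))
        else:
            self.clusters.append(Cluster())
            idx = len(self.clusters) - 1
        self.clusters[idx].add(note)
        return idx

    def retrieve(self, query: str, top_clusters: int = 1, top_notes: int = 2) -> list:
        q = embed(query)
        # Stage 1: keep only the clusters whose profiles best match the query.
        ranked = sorted(
            self.clusters, key=lambda c: cosine(q, c.profile), reverse=True
        )
        candidates = [n for c in ranked[:top_clusters] for n in c.notes]
        # Stage 2: rank individual notes inside the selected clusters only.
        return sorted(
            candidates, key=lambda n: cosine(q, embed(n)), reverse=True
        )[:top_notes]


memory = ClusteredMemory()
routes = [
    memory.route(note)
    for note in [
        "alice adopted a cat named miso",
        "alice feeds the cat miso tuna",
        "bob trains for a marathon in boston",
        "bob runs the marathon route weekly",
    ]
]
hits = memory.retrieve("what does the cat miso eat")
print(routes)
print(hits)
```

In this toy run the two notes about Alice's cat land in one cluster and the two marathon notes in another, so the query about the cat never scores the marathon notes at all. That is the search-space reduction the caption attributes to Stage 1.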