Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks

Yicong Zheng, Kevin L. McKee, Thomas Miconi, Zacharie Bugaud, Mick van Gelderen, Jed McCaleb

TL;DR

This work argues that long-context memory tasks benefit more from goal-directed search over uncompressed memory than from hand-engineered compression. It introduces SUMER, an RLVR-enabled agent trained with GRPO to search raw conversational logs via semantic/keyword tools and to submit answers when ready, avoiding lossy memory compression. On LoCoMo, SUMER with GRPO outperforms strong baselines and achieves state-of-the-art judge accuracy, demonstrating that dynamic search strategies can scale better than fixed memory schemas. The results highlight the potential of learnable search policies for lifelong agents and motivate new benchmarks that emphasize dynamic information retrieval over static compression.

Abstract

How to enable human-like long-term memory in large language models (LLMs) has been a central question for unlocking more general capabilities such as few-shot generalization. Existing memory frameworks and benchmarks focus on finding the optimal memory compression algorithm for higher performance in tasks that require recollection and sometimes further reasoning. However, such efforts have ended up building more human bias into the compression algorithm, through the search for the best prompts and memory architectures that suit specific benchmarks, rather than finding a general solution that would work on other data distributions. On the other hand, goal-directed search on uncompressed information could potentially exhibit superior performance because compression is lossy, and a predefined compression algorithm will not fit all raw data distributions. Here we present SUMER (Search in Uncompressed Memory via Experience Replay), an end-to-end reinforcement learning agent with verifiable reward (RLVR) that learns to use search tools to gather information and answer a target question. On the LoCoMo dataset for long-context conversation understanding, SUMER with Qwen2.5-7B-Instruct learned to use search tools and outperformed all other biased memory compression approaches and also the full-context baseline, reaching SOTA performance (43% gain over the prior best). We demonstrate that a simple search method applied to raw data outperforms goal-agnostic and biased compression algorithms in current long-context memory tasks, arguing for new paradigms and benchmarks that are more dynamic and autonomously scalable. Code for SUMER and all implemented baselines is publicly available at https://github.com/zycyc/SUMER.
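The core idea, searching raw conversation turns at query time instead of compressing them in advance, can be illustrated with a small sketch. This is not the SUMER implementation: the memory layout, the function names `keyword_search` and `similarity_search`, and the bag-of-words cosine score (standing in for a real embedding-based semantic search) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(memory, query, top_k=3):
    """Return up to top_k raw turns containing every query term, in stored order."""
    terms = set(tokenize(query))
    hits = [turn for turn in memory if terms <= set(tokenize(turn["text"]))]
    return hits[:top_k]


def similarity_search(memory, query, top_k=3):
    """Rank raw turns by bag-of-words cosine similarity to the query.

    Hypothetical stand-in for semantic search; a real system would
    compare learned embeddings instead of token counts.
    """
    q = Counter(tokenize(query))
    q_norm = math.sqrt(sum(v * v for v in q.values()))

    def cosine(turn):
        t = Counter(tokenize(turn["text"]))
        t_norm = math.sqrt(sum(v * v for v in t.values()))
        if q_norm == 0 or t_norm == 0:
            return 0.0
        return sum(q[w] * t[w] for w in q) / (q_norm * t_norm)

    ranked = sorted(memory, key=cosine, reverse=True)
    return [turn for turn in ranked[:top_k] if cosine(turn) > 0]


# Raw turns are stored as-is, with session metadata kept alongside the text.
memory = [
    {"session": 1, "speaker": "Ana", "text": "I adopted a puppy named Biscuit last week."},
    {"session": 2, "speaker": "Ben", "text": "The weather was terrible during my trip."},
    {"session": 3, "speaker": "Ana", "text": "Biscuit chewed my running shoes again."},
]

keyword_hits = keyword_search(memory, "Biscuit")
semantic_hits = similarity_search(memory, "puppy named biscuit", top_k=1)
```

Because nothing is discarded when a turn is written, a detail such as the session-3 mention of Biscuit stays retrievable even if no one anticipated that it would matter when it was stored.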

Paper Structure

This paper contains 28 sections, 8 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Conversational memory vs. memory compression. (left) Long-horizon chats span many sessions with distractors. (right) Goal-agnostic memory compression applies Create, Read, Update, and Delete (CRUD) operations that can discard details later needed at query time, while our approach takes raw data as it is and directly adds it to the memory database for later search.
  • Figure 2: SUMER training loop with tool use and RLVR. The agent calls keyword/semantic search across multiple turns, then submits an answer for a verifiable reward. Retrieved tool tokens are visible as context but masked from the policy loss, so learning focuses on the agent’s outputs.
  • Figure 3: SUMER training curves. Left: Mean rewards during training. Right: Validation performance.
  • Figure 4: Comparison between ablations. SUMER outperformed all ablation conditions in F1, B1, and J scores, and in the number of turns needed to finish the task. Without the temporal context of the retrieved memory, the no-context variant requires more agentic search turns to gather enough information. In general, semantic search is more efficient than keyword search.
  • Figure B1: SUMER agent system prompt used during training to guide memory search and answer submission behavior.
  • ...and 3 more figures
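
The Figure 2 caption describes two mechanics that a short sketch may make concrete: a reward signal compared across a group of rollouts (as in GRPO), and a loss mask that excludes retrieved tool tokens. The code below is a simplified illustration under stated assumptions, not the paper's objective. It omits the clipping and KL terms that GRPO normally includes, and the function names, the per-token log-probability lists, and the 0/1 `agent_mask` are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def masked_policy_loss(logprobs, agent_mask, advantage):
    """Policy-gradient loss averaged over agent-generated tokens only.

    logprobs:   per-token log-probabilities for one rollout
    agent_mask: 1 for tokens the agent produced, 0 for retrieved tool tokens
    advantage:  scalar advantage assigned to the whole rollout
    """
    kept = [lp for lp, m in zip(logprobs, agent_mask) if m]
    if not kept:
        return 0.0
    return -advantage * sum(kept) / len(kept)


# Four rollouts for the same question; reward is 1.0 when the answer verifies.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# One rollout: agent token, retrieved tool token, agent token.
logprobs = [-0.5, -9.0, -1.5]
agent_mask = [1, 0, 1]
loss = masked_policy_loss(logprobs, agent_mask, advantage=1.0)
```

The tool token remains in the context the model conditions on, but its log-probability never enters the loss, so changing it leaves `loss` unchanged.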