Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks

Yicong Zheng; Kevin L. McKee; Thomas Miconi; Zacharie Bugaud; Mick van Gelderen; Jed McCaleb

TL;DR

This work argues that long-context memory tasks benefit more from goal-directed search over uncompressed memory than from hand-engineered compression. It introduces SUMER, an agent trained with reinforcement learning with verifiable rewards (RLVR) using GRPO. The agent searches raw conversational logs with semantic and keyword search tools and submits an answer when ready, which avoids lossy memory compression. On LoCoMo, SUMER trained with GRPO outperforms strong baselines and achieves state-of-the-art judge accuracy, suggesting that learned search strategies can scale better than fixed memory schemas. The results highlight the potential of learnable search policies for lifelong agents and motivate new benchmarks that emphasize dynamic information retrieval over static compression.
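The search-then-submit loop described above can be illustrated with a minimal sketch. It is not the SUMER implementation: the `Turn` record, the word-overlap scoring in `keyword_search`, the `run_episode` loop, and the `("search", ...)` / `("submit", ...)` action format are all hypothetical names chosen for illustration, and a real agent would use an LLM policy and a semantic search tool alongside the keyword one.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One raw, uncompressed conversation turn (hypothetical record type)."""
    speaker: str
    text: str


def keyword_search(memory, query, top_k=3):
    """Rank raw turns by how many query words they contain (case-insensitive)."""
    words = set(query.lower().split())
    scored = []
    for idx, turn in enumerate(memory):
        overlap = len(words & set(turn.text.lower().split()))
        if overlap > 0:
            scored.append((overlap, idx))
    # Highest overlap first; ties keep the earlier turn first.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [memory[idx] for _, idx in scored[:top_k]]


def run_episode(memory, policy, max_steps=5):
    """Alternate between search calls and a final submit action.

    `policy` is any callable that maps the list of past search results
    to an (action, argument) pair; this interface is an assumption.
    """
    observations = []
    for _ in range(max_steps):
        action, argument = policy(observations)
        if action == "submit":
            return argument
        observations.append(keyword_search(memory, argument))
    return None  # step budget exhausted without an answer
```

Because nothing is summarized or discarded before the question arrives, the policy decides at query time which turns matter, which is the contrast with a fixed compression schema that the text draws.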

Abstract

How to enable human-like long-term memory in large language models (LLMs) has been a central question for unlocking more general capabilities such as few-shot generalization. Existing memory frameworks and benchmarks focus on finding the optimal memory compression algorithm to improve performance on tasks that require recollection and, in some cases, further reasoning. However, such efforts have ended up building more human bias into the compression algorithm through the search for the prompts and memory architectures that best suit specific benchmarks, rather than finding a general solution that would work on other data distributions. Goal-directed search over uncompressed information, by contrast, could exhibit superior performance because compression is lossy and a predefined compression algorithm will not fit all raw data distributions. Here we present SUMER (Search in Uncompressed Memory via Experience Replay), an agent trained end-to-end with reinforcement learning with verifiable rewards (RLVR) that learns to use search tools to gather information and answer a target question. On the LoCoMo dataset for long-context conversation understanding, SUMER with Qwen2.5-7B-Instruct learned to use search tools and outperformed all biased memory compression approaches as well as the full-context baseline, reaching SOTA performance (43% gain over the prior best). We demonstrate that a simple search method applied to raw data outperforms goal-agnostic and biased compression algorithms in current long-context memory tasks, arguing for new paradigms and benchmarks that are more dynamic and autonomously scalable. Code for SUMER and all implemented baselines is publicly available at https://github.com/zycyc/SUMER.
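The TL;DR names GRPO as the training algorithm. As a rough illustration of how a verifiable reward can drive such training, the sketch below computes group-relative advantages in the usual GRPO style: several rollouts are sampled for the same question, and each rollout's reward is normalized by the group's mean and standard deviation. The binary reward (1 for a correct answer, 0 otherwise), the use of the population standard deviation, and the function name `group_relative_advantages` are assumptions for illustration, not details taken from the paper.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    `rewards` holds one scalar per rollout sampled for the same question.
    `eps` avoids division by zero when all rollouts get the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption here
    return [(r - mean) / (std + eps) for r in rewards]
```

Under this scheme, rollouts that answered correctly get a positive advantage and the rest a negative one. When every rollout in a group receives the same reward, all advantages are zero and that group contributes no learning signal.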
