Text2Mem: A Unified Memory Operation Language for Memory Operating System

Yi Wang, Lihai Yang, Boyu Chen, Gongyi Zou, Kerun Xu, Bo Tang, Feiyu Xiong, Siheng Chen, Zhiyu Li

TL;DR

Text2Mem tackles fragmentation in memory control for LLM agents by introducing a unified, schema-based memory operation language. It defines twelve verbs spanning encoding, storage, and retrieval, governed by a compact five-field operation schema and executed through a validator–parser–adapter pipeline, with backends including a SQL prototype and real memory frameworks. The approach emphasizes explicit semantics, safety invariants, and cross-backend portability, demonstrated through illustrative workflows such as semantic promotion and incident postmortems. Additionally, Text2Mem Bench provides an end-to-end, reproducible benchmark that separates planning from execution to evaluate both schema generation and actual memory effects. Together, the work establishes a formal foundation for reliable, auditable memory control in long-horizon agents and paves the way for reproducible research across backends.

Abstract

Large language model agents increasingly depend on memory to sustain long-horizon interaction, but existing frameworks remain limited. Most expose only a few basic primitives such as encode, retrieve, and delete, while higher-order operations like merge, promote, demote, split, lock, and expire are missing or inconsistently supported. Moreover, there is no formal and executable specification for memory commands, leaving scope and lifecycle rules implicit and causing unpredictable behavior across systems. We introduce Text2Mem, a unified memory operation language that provides a standardized pathway from natural language to reliable execution. Text2Mem defines a compact yet expressive operation set aligned with encoding, storage, and retrieval. Each instruction is represented as a JSON-based schema instance with required fields and semantic invariants, which a parser transforms into typed operation objects with normalized parameters. A validator ensures correctness before execution, while adapters map typed objects either to a SQL prototype backend or to real memory frameworks. Model-based services such as embeddings or summarization are integrated when required. All results are returned through a unified execution contract. This design ensures safety, determinism, and portability across heterogeneous backends. We also outline Text2Mem Bench, a planned benchmark that separates schema generation from backend execution to enable systematic evaluation. Together, these components establish the first standardized foundation for memory control in agents.
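
The pathway described in the abstract (schema instance, validation, parsing into a typed operation, execution through an adapter, uniform result) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the paper's specification: the five field names (`stage`, `op`, `target`, `args`, `meta`), the `weight_delta` default, the table layout, and the shape of the result dictionary are all hypothetical, and only two of the verbs are implemented against an in-memory SQLite database.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical field names: the source says only that the schema has five fields.
REQUIRED_FIELDS = ("stage", "op", "target", "args", "meta")
# Verbs named in the abstract; this is a subset of the twelve the paper defines.
KNOWN_OPS = {"Encode", "Retrieve", "Delete", "Merge", "Promote",
             "Demote", "Split", "Lock", "Expire"}


@dataclass(frozen=True)
class Operation:
    """Typed operation object produced by the parser."""
    op: str
    target: dict
    args: dict


def validate(instance: dict) -> None:
    """Reject instances that miss required fields or break an invariant."""
    missing = [f for f in REQUIRED_FIELDS if f not in instance]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if instance["op"] not in KNOWN_OPS:
        raise ValueError(f"unknown op: {instance['op']}")
    # Assumed invariant: Demote must name an explicit scope.
    if instance["op"] == "Demote" and not instance["target"]:
        raise ValueError("Demote requires a non-empty target")


def parse(instance: dict) -> Operation:
    """Normalize parameters and build a typed operation."""
    args = dict(instance["args"])
    if instance["op"] == "Demote":
        args.setdefault("weight_delta", -0.5)  # assumed default
    return Operation(instance["op"], dict(instance["target"]), args)


class SqlAdapter:
    """Toy SQL backend; the table layout is an assumption of this sketch."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE memory (id INTEGER PRIMARY KEY, text TEXT, tag TEXT, weight REAL)"
        )

    def execute(self, op: Operation) -> dict:
        if op.op == "Encode":
            cur = self.db.execute(
                "INSERT INTO memory (text, tag, weight) VALUES (?, ?, ?)",
                (op.args["text"], op.args.get("tag"), 1.0),
            )
        elif op.op == "Demote":
            cur = self.db.execute(
                "UPDATE memory SET weight = weight + ? WHERE tag = ?",
                (op.args["weight_delta"], op.target["tag"]),
            )
        else:
            return {"ok": False, "op": op.op, "affected": 0,
                    "error": "unsupported in this sketch"}
        self.db.commit()
        # Every verb returns the same result shape (the "execution contract").
        return {"ok": True, "op": op.op, "affected": cur.rowcount, "error": None}


def run(adapter: SqlAdapter, instance: dict) -> dict:
    """Validate, parse, then execute one schema instance."""
    validate(instance)
    return adapter.execute(parse(instance))
```

Under these assumptions, a request such as "I'd rather not bring up anything about lunch for now" would be planned as a `Demote` instance with `target={"tag": "lunch"}`, and `run` would lower the weight of matching rows rather than delete them. Keeping validation ahead of the adapter means a malformed instance never reaches the backend.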

Paper Structure

This paper contains 49 sections, 3 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Ambiguity and inconsistency in current systems versus Text2Mem’s formalized handling. Left: The natural language instruction "I'd rather not bring up anything about lunch for now" is underspecified. Scope, action, and duration are unclear, leading to inconsistent behaviors across agents (temporary suppression, hard deletion, or ignoring). Right: Text2Mem resolves the ambiguity by instantiating a schema-based Demote operation with explicit arguments. The validator, parser, and adapter guarantee consistent execution across heterogeneous backends.
  • Figure 2: Illustration of the Text2Mem execution pathway. Natural language instructions are normalized into memory operation schema instances, which are validated, parsed into typed operation objects, and finally executed through adapters to real memory frameworks or, alternatively, through a SQL-based prototype backend for controlled verification.
  • Figure 3: Overview of the Text2Mem Benchmark pipeline. The process consists of three generation stages and two evaluation layers. Step 1: Scenarios are converted into realistic natural-language requests. Step 2: The requests are translated into executable memory operation schemas (schema_list) with corresponding prerequisites. Step 3: Expected outcomes are defined for automatic verification after execution. Two evaluation layers assess system performance: Plan-level evaluation measures string-match accuracy and execution success rate of generated schemas; Execution-level evaluation measures expectation match rate and retrieval-based metrics to quantify behavioral correctness after running the schema_list in the Text2Mem system.
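
The two plan-level metrics named in the Figure 3 caption can be sketched as follows. The caption does not define them precisely, so the definitions here are assumptions: string match is taken to mean equality of canonically serialized JSON (keys sorted), compared position by position, and execution success is taken to be the fraction of schemas for which a caller-supplied `execute` callback returns a truthy value without raising. The function names are hypothetical.

```python
import json
from typing import Callable


def canonical(schema: dict) -> str:
    """Serialize with sorted keys so key order does not affect the comparison."""
    return json.dumps(schema, sort_keys=True, separators=(",", ":"))


def string_match_accuracy(predicted: list[dict], reference: list[dict]) -> float:
    """Fraction of reference schemas matched exactly at the same position.

    Missing predictions count as misses because the denominator is the
    number of reference schemas.
    """
    if not reference:
        return 0.0
    hits = sum(
        1 for p, r in zip(predicted, reference) if canonical(p) == canonical(r)
    )
    return hits / len(reference)


def execution_success_rate(
    schemas: list[dict], execute: Callable[[dict], bool]
) -> float:
    """Fraction of schemas that execute without error and report success."""
    if not schemas:
        return 0.0
    ok = 0
    for schema in schemas:
        try:
            if execute(schema):
                ok += 1
        except Exception:
            # A raised error counts as a failed execution in this sketch.
            pass
    return ok / len(schemas)
```

Separating the two functions mirrors the benchmark's split between planning and execution: the first needs only the generated `schema_list` and a reference, while the second needs a running backend behind `execute`.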