MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

Zhiyu Li, Shichao Song, Hanyu Wang, Simin Niu, Ding Chen, Jiawei Yang, Chenyang Xi, Huayi Lai, Jihao Zhao, Yezhaohui Wang, Junpeng Ren, Zehao Lin, Jiahao Huo, Tianyi Chen, Kai Chen, Kehang Li, Zhiqiang Yin, Qingchen Yu, Bo Tang, Hongkang Yang, Zhi-Qin John Xu, Feiyu Xiong

TL;DR

MemOS proposes a memory-centric operating system for LLMs that elevates memory to a first-class resource, unifying parametric, activation, and plaintext memory through the MemCube abstraction. It outlines a three-layer architecture and a closed-loop memory I/O path to enable scheduling, governance, and lifecycle management across heterogeneous memory types. The work describes concrete components (MemScheduler, MemLifecycle, MemGovernance, MemVault, MemStore) and transformation pathways between memory types to support continual adaptation and cross-platform collaboration. This approach aims to overcome current memory silos, enable personalized and persistent knowledge, and catalyze future multi-agent, multi-model intelligence with memory-driven evolution.

Abstract

Large Language Models (LLMs) have emerged as foundational infrastructure in the pursuit of Artificial General Intelligence (AGI). Despite their remarkable capabilities in language perception and generation, current LLMs fundamentally lack a unified and structured architecture for handling memory. They primarily rely on parametric memory (knowledge encoded in model weights) and ephemeral activation memory (context-limited runtime states). While emerging methods like Retrieval-Augmented Generation (RAG) incorporate plaintext memory, they lack lifecycle management and multi-modal integration, limiting their capacity for long-term knowledge evolution. To address this, we introduce MemOS, a memory operating system designed for LLMs that, for the first time, elevates memory to a first-class operational resource. It builds unified mechanisms for representation, organization, and governance across three core memory types: parametric, activation, and plaintext. At its core is the MemCube, a standardized memory abstraction that enables tracking, fusion, and migration of heterogeneous memory, while offering structured, traceable access across tasks and contexts. MemOS establishes a memory-centric execution framework with strong controllability, adaptability, and evolvability. It fills a critical gap in current LLM infrastructure and lays the groundwork for continual adaptation, personalized intelligence, and cross-platform coordination in next-generation intelligent systems.

Paper Structure

This paper contains 15 sections and 6 figures.

Figures (6)

  • Figure 1: Memory (Mem) in LLMs.
  • Figure 2: The next leap in model capability evolution hinges on the introduction of memory systems, marking a paradigm shift toward "memory training".
  • Figure 3: Transformation paths among three types of memory, forming a unified, controllable, and evolvable memory space.
  • Figure 4: MemCube: a unified abstraction for heterogeneous memory, comprising a metadata header and semantic payload—serving as the smallest execution unit of memory in MemOS.
  • Figure 5: Overview of the MemOS architecture: showing the end-to-end memory lifecycle from user input to API parsing, scheduling, activation, governance, and evolution—unified via MemCube.
  • ...and 1 more figure
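
The Figure 4 caption describes a MemCube as a metadata header plus a semantic payload, shared across the three memory types. As a rough illustration only, that shape might be sketched as below. The class names, the field names, and the `by_type` helper are all hypothetical and are not taken from the paper or from any MemOS API; the metadata keys shown are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class MemoryType(Enum):
    """The three memory types named in the abstract."""
    PARAMETRIC = "parametric"
    ACTIVATION = "activation"
    PLAINTEXT = "plaintext"


@dataclass
class MemCube:
    """Hypothetical container: a metadata header plus a semantic payload."""
    memory_type: MemoryType
    payload: Any                       # e.g. text, cached states, or a weight delta
    metadata: Dict[str, Any] = field(default_factory=dict)  # placeholder header fields


def by_type(cubes: List[MemCube], memory_type: MemoryType) -> List[MemCube]:
    """Return the cubes of one memory type (illustrative helper, not a MemOS call)."""
    return [cube for cube in cubes if cube.memory_type is memory_type]


cubes = [
    MemCube(MemoryType.PLAINTEXT, "User prefers concise answers.", {"owner": "user-1"}),
    MemCube(MemoryType.ACTIVATION, [0.1, 0.2, 0.3]),
    MemCube(MemoryType.PLAINTEXT, "Project deadline is Friday."),
]
plaintext_cubes = by_type(cubes, MemoryType.PLAINTEXT)
```

Keeping the memory type in the header rather than in the payload is one way a scheduler or governance layer could route cubes without inspecting their contents; whether MemOS does this is not stated in the text above.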