AI-native Memory: A Pathway from LLMs Towards AGI

Jingbo Shang, Zai Zheng, Jiale Wei, Xiang Ying, Felix Tao, Mindverse Team

TL;DR

The paper argues that simply expanding LLM context is unlikely to yield AGI due to limited effective context and the difficulty of simultaneous retrieval and reasoning. It proposes AI-native memory as a core infrastructural shift, introducing Natural-language Memory (L1) and AI-Native Memory (L2) as memory layers embodied by Lifelong Personal Models (LPMs) that compress and organize experiences. An initial L2 LPM prototype is developed via LoRA fine-tuning and a me-focused prompting mechanism, with pilot studies showing superior performance over RAG baselines and long-context LLMs on a real-user benchmark. The work outlines a practical vision for memory-centric AGI, highlighting potential applications and addressing privacy and security considerations on the pathway toward proactive, personalized AI systems.
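To make the L2 prototype concrete, the sketch below shows how a Lifelong Personal Model could be set up for LoRA fine-tuning together with a me-focused prompt. This is a minimal illustration assuming the Hugging Face `transformers` and `peft` libraries; the base model name, LoRA hyperparameters, and prompt wording are assumptions for illustration, not the paper's actual configuration.

```python
# Minimal sketch of an L2 LPM set up for LoRA fine-tuning (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # assumed base model, not the paper's choice
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA keeps the base weights frozen and trains low-rank adapters, so each
# user's memory can be compressed into a small per-user parameter delta.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# "Me-focused" prompting (assumed form): training and inference examples are
# framed from the user's first-person perspective before being fed to the model.
def me_focused_prompt(memory_text: str, question: str) -> str:
    return (
        "You are my lifelong personal model. Answer from my perspective.\n"
        f"My memory: {memory_text}\n"
        f"Question: {question}\nAnswer:"
    )
```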

Abstract

Large language models (LLMs) have demonstrated sparks of artificial general intelligence (AGI) to the world. One opinion, especially from some startups working on LLMs, argues that an LLM with nearly unlimited context length can realize AGI. However, this view may be too optimistic about the long-context capability of (existing) LLMs: (1) recent literature has shown that their effective context length is significantly smaller than their claimed context length; and (2) our reasoning-in-a-haystack experiments further demonstrate that simultaneously finding the relevant information in a long context and conducting (simple) reasoning over it is nearly impossible. In this paper, we envision a pathway from LLMs to AGI through the integration of \emph{memory}. We believe that AGI should be a system in which LLMs serve as core processors. In addition to raw data, the memory in this system would store a large number of important conclusions derived from reasoning processes. Compared with retrieval-augmented generation (RAG), which merely processes raw data, this approach not only connects semantically related information more closely, but also simplifies complex inference at query time. As an intermediate stage, the memory will likely take the form of natural language descriptions, which users can also consume directly. Ultimately, every agent/person should have its own large personal model, a deep neural network model (thus \emph{AI-native}) that parameterizes and compresses all types of memory, even those that cannot be described in natural language. Finally, we discuss the significant potential of AI-native memory as the transformative infrastructure for (proactive) engagement, personalization, distribution, and social interaction in the AGI era, as well as the privacy and security challenges it raises, together with preliminary solutions.
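The contrast with RAG can be illustrated with a small sketch: a memory that, besides raw events, stores conclusions the LLM has already derived offline, so query-time answering becomes closer to lookup than to fresh multi-hop reasoning. The `llm` and `retrieve` callables below are hypothetical placeholders, and the overall design is an assumption based on the abstract, not the paper's implementation.

```python
# Sketch (not the paper's code) of a memory that stores derived conclusions
# alongside raw data, in contrast to RAG over raw data alone.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class AINativeMemory:
    raw_events: List[str] = field(default_factory=list)   # raw data, as in RAG
    conclusions: List[str] = field(default_factory=list)  # conclusions derived by the LLM

    def record(self, event: str, llm: Callable[[str], str]) -> None:
        """Store the raw event and distill a durable conclusion from it offline."""
        self.raw_events.append(event)
        # The LLM acts as the core processor: it reasons over recent events and
        # writes the result back into memory as a natural-language conclusion.
        conclusion = llm(
            f"Given my recent notes {self.raw_events[-5:]}, "
            f"state one durable conclusion implied by: {event}"
        )
        self.conclusions.append(conclusion)

    def answer(self, question: str, llm: Callable[[str], str],
               retrieve: Callable[[List[str], str], str]) -> str:
        """At query time the relevant conclusion has often been derived already,
        so retrieval plus a single generation step may suffice."""
        context = retrieve(self.conclusions + self.raw_events, question)
        return llm(f"Memory: {context}\nQuestion: {question}")
```

A plain RAG baseline would skip the distillation step in `record` and retrieve only from `raw_events`, pushing all multi-hop reasoning to query time.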

Paper Structure

This paper contains 26 sections, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: An Overview of Reasoning-in-a-Haystack. In this paper, the haystack, needles, and queries are all designed based on real data and scenarios from Mebot of Mindverse AI, with user permission. The haystack is typically a series of User-Mebot interactions chained chronologically. The needle-query pairs are constructed for certain recommendation scenarios.
  • Figure 2: Reasoning-in-a-Haystack Comparison based on Mebot's Real Data w.r.t. Different Context Lengths and Hop Counts. The multi-needle setting distributes the needles uniformly across the haystack, while the single-needle setting merges all needles together and injects them at either 40% or 60% depth. Scores are averaged across runs. GPT-3.5-turbo cannot be applied to the longest context lengths. Detailed results are illustrated in Figure \ref{fig:experiment_result_with_number}. (A sketch of this needle-placement setup follows the figure list.)
  • Figure 3: Example Query-Needle Pair and its True Answer.
  • Figure 4: System Prompt used in the Provider LLM.
  • Figure 5: Criteria used in the Evaluator LLM during Scoring.
  • ...and 1 more figure
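As referenced in the Figure 2 caption above, the following sketch illustrates one plausible way to place needles in the haystack for the single-needle and multi-needle settings. The function names, splitting logic, and the example loader are assumptions for illustration, not the authors' released code.

```python
# Hedged sketch of needle placement for reasoning-in-a-haystack (illustrative only).
from typing import List

def insert_single_needle(haystack_docs: List[str], needles: List[str],
                         depth: float) -> List[str]:
    """Single-needle setting: merge all needles and inject the merged block at a
    single depth (e.g. 0.4 or 0.6) of the chronologically chained haystack."""
    merged = " ".join(needles)
    pos = int(len(haystack_docs) * depth)
    return haystack_docs[:pos] + [merged] + haystack_docs[pos:]

def insert_multi_needle(haystack_docs: List[str], needles: List[str]) -> List[str]:
    """Multi-needle setting: distribute the individual needles roughly uniformly
    across the haystack."""
    result = list(haystack_docs)
    step = max(1, len(result) // (len(needles) + 1))
    for i, needle in enumerate(needles, start=1):
        # Offset by (i - 1) to account for needles already inserted.
        result.insert(i * step + (i - 1), needle)
    return result

# Example usage: a multi-hop query whose needles sit at 40% depth.
# docs = load_user_mebot_interactions()            # hypothetical loader
# context = insert_single_needle(docs, needles, depth=0.4)
```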