UniMem: Towards a Unified View of Long-Context Large Language Models

Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yankai Lin, Yukun Yan, Xiaodong Shi, Sen Song, Zhiyuan Liu, Maosong Sun

TL;DR

The paper tackles the bottleneck of long-context processing in LLMs by proposing UniMem, a memory-augmentation framework that formalizes four dimensions of memory manipulation. It reinterprets 16 existing long-context methods within this unified view and introduces UniMix, a synthesis that combines strengths from multiple dimensions. Empirical results across text and code datasets show UniMix achieves superior perplexity and robust performance, with insights on memory-layer placement and efficiency. The work provides a principled foundation for fair comparisons and scalable long-context modeling in LLMs.

Abstract

Long-context processing is a critical ability that constrains the applicability of large language models (LLMs). Although various methods have been devoted to enhancing the long-context processing ability of LLMs, they have been developed in isolation and lack systematic analysis and integration of their strengths, which hinders further development. In this paper, we introduce UniMem, a Unified framework that reformulates existing long-context methods from the view of Memory augmentation of LLMs. Distinguished by its four core dimensions (Memory Management, Memory Writing, Memory Reading, and Memory Injection), UniMem empowers researchers to conduct systematic exploration of long-context methods. We reformulate 16 existing methods based on UniMem and recast four representative methods (Transformer-XL, Memorizing Transformer, RMT, and Longformer) into equivalent UniMem forms to reveal their design principles and strengths. Based on these analyses, we propose UniMix, an approach that integrates the strengths of these algorithms. Experimental results show that UniMix achieves superior performance in handling long contexts, with significantly lower perplexity than baselines.

Paper Structure (29 sections, 17 equations, 13 figures, 4 tables)

Figures (13)

  • Figure 1: Illustration of long-context methods (segment length $L = 3$). Yellow circles show past segments; blue circles mark the current segment. (a) Transformer-XL caches earlier hidden states. (b) Memorizing Transformer retrieves past segments by kNN similarity. (c) RMT uses memory tokens to summarize prior segments. (d) Longformer extends segments with global and sliding-window attention.
  • Figure 2: Attention patterns for long-context methods ($L = 3$). (a) Transformer-XL. (b) Memorizing Transformer. (c) RMT. (d) Longformer.
  • Figure 3: Effects of different UniMem dimensions on perplexity across datasets. (a) Topk's role for MemTrans and UniMix; (b) Combined effects with Window Length; (c) Memory Layer Distribution's impact; (d) Memory Layer Position's influence (Single Layer Injection).
  • Figure 4: Impact of Overflow Handling on perplexity for Longformer, MemTrans and UniMix.
  • Figure 5: Impact of Compressed Tokens on perplexity for UniMix.
  • ...and 8 more figures
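
To make the four dimensions named in the abstract more concrete, the following is a minimal toy sketch in plain Python, not the paper's implementation. The class `ToyMemory`, the function `inject`, the FIFO overflow policy, and dot-product scoring are all illustrative assumptions; the source only names the dimensions and mentions top-k retrieval, overflow handling, and memory-layer position in its figure captions.

```python
from collections import deque
from typing import Deque, List, Sequence, Set, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity (an assumed scoring function)."""
    return sum(x * y for x, y in zip(a, b))


class ToyMemory:
    """Hypothetical toy memory illustrating three of the four dimensions."""

    def __init__(self, capacity: int) -> None:
        # Memory Management: fixed capacity; on overflow the oldest
        # entries are dropped first (FIFO is an assumption here).
        self.entries: Deque[Tuple[Vector, Vector]] = deque(maxlen=capacity)

    def write(self, keys: List[Vector], values: List[Vector]) -> None:
        # Memory Writing: store the (key, value) pairs of a finished segment.
        for key, value in zip(keys, values):
            self.entries.append((tuple(key), tuple(value)))

    def read(self, query: Vector, top_k: int) -> List[Tuple[Vector, Vector]]:
        # Memory Reading: return the top_k stored entries most similar to the query.
        ranked = sorted(self.entries, key=lambda kv: dot(query, kv[0]), reverse=True)
        return ranked[:top_k]


def inject(layer_index: int, memory_layers: Set[int]) -> bool:
    # Memory Injection: only the chosen layers attend to retrieved memory.
    return layer_index in memory_layers


# Usage: two segments are written into a memory that holds four entries.
memory = ToyMemory(capacity=4)
memory.write(keys=[(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
             values=[(1.0,), (2.0,), (3.0,)])
memory.write(keys=[(2.0, 0.0), (0.0, 3.0)],
             values=[(4.0,), (5.0,)])  # the oldest entry overflows and is dropped

retrieved = memory.read(query=(1.0, 0.0), top_k=2)
retrieved_values = [value for _, value in retrieved]
print(retrieved_values)
```

In this sketch, changing `capacity`, `top_k`, or the set passed to `inject` corresponds loosely to varying the overflow handling, Topk, and memory-layer settings that the figure captions refer to; the actual methods operate on hidden states inside a Transformer rather than on hand-written vectors.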