TokMem: Tokenized Procedural Memory for Large Language Models

Zijun Wu, Yongchang Hao, Lili Mou

TL;DR

TokMem tackles the inefficiency and poor scaling of prompt-based task specification by storing recurring procedures as compact, trainable memory tokens that are fed into generation while the backbone is kept frozen. Because each token carries both an addressing signal and a steering signal, TokMem supports multi-step, compositional recall with constant overhead and can be expanded continually without catastrophic forgetting. On atomic recall over 1,000 SNI tasks and on compositional tool use with APIGen, TokMem outperforms retrieval-based baselines and parametric fine-tuning while using far fewer trainable parameters, and it demonstrates strong memory routing and generalization. The approach offers a scalable, modular alternative to prompt engineering and traditional fine-tuning, providing an explicit procedural memory mechanism for large language models with practical implications for continual learning and tool-assisted reasoning.

Abstract

Large language models rely heavily on prompts to specify tasks, recall knowledge, and guide reasoning. However, this reliance is inefficient: prompts must be re-read at each step, scale poorly across tasks, and lack mechanisms for modular reuse. We introduce TokMem, a tokenized procedural memory that stores recurring procedures as compact, trainable embeddings. Each memory token encodes both an address to a procedure and a control signal that steers generation, enabling targeted behavior with constant-size overhead. To support continual adaptation, TokMem keeps the backbone model frozen, allowing new procedures to be added without interfering with existing ones. We evaluate TokMem on 1,000 tasks for atomic recall and on function-calling tasks for compositional recall. It consistently outperforms retrieval-augmented generation while avoiding repeated context overhead, and it outperforms fine-tuning with far fewer parameters. These results establish TokMem as a scalable and modular alternative to prompt engineering and fine-tuning, offering an explicit procedural memory for LLMs.
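
The idea of training only per-procedure embeddings against a frozen backbone can be illustrated with a deliberately small toy. This is a sketch under strong assumptions, not the paper's architecture: a fixed 4x4 linear map stands in for the LLM, each memory token is a single vector added to the input rather than a token interleaved with text, and the loss is squared error rather than next-token prediction. The names `BACKBONE`, `forward`, `train_memory_token`, and `memory_bank` are hypothetical.

```python
# Toy illustration only: a fixed linear map stands in for the frozen LLM,
# and each "memory token" is one trainable vector added to the input.
BACKBONE = (
    (1.0, 0.5, 0.0, 0.0),
    (0.0, 1.0, 0.5, 0.0),
    (0.0, 0.0, 1.0, 0.5),
    (0.5, 0.0, 0.0, 1.0),
)  # nested tuples, so the "backbone" cannot be modified by training


def forward(memory_vec, x):
    """Apply the frozen backbone to the input shifted by a memory embedding."""
    h = [xi + mi for xi, mi in zip(x, memory_vec)]
    return [sum(w * hj for w, hj in zip(row, h)) for row in BACKBONE]


def loss(memory_vec, x, target):
    """Half squared error between the toy output and the target."""
    out = forward(memory_vec, x)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))


def train_memory_token(x, target, steps=500, lr=0.1):
    """Gradient descent on the memory vector only; BACKBONE is never updated."""
    dim = len(x)
    m = [0.0] * dim
    for _ in range(steps):
        err = [o - t for o, t in zip(forward(m, x), target)]
        # Gradient of the loss w.r.t. m is BACKBONE^T @ err.
        grad = [sum(BACKBONE[i][j] * err[i] for i in range(dim)) for j in range(dim)]
        m = [mj - lr * g for mj, g in zip(m, grad)]
    return m


# Each procedure gets its own embedding; adding one never rewrites another.
memory_bank = {}
x = [1.0, 0.0, -1.0, 0.5]
parse_target = [1.0, 1.0, 0.0, 0.0]
search_target = [0.0, -1.0, 2.0, 1.0]

memory_bank["parse"] = train_memory_token(x, parse_target)
parse_snapshot = list(memory_bank["parse"])
memory_bank["search"] = train_memory_token(x, search_target)
```

In this toy, learning `search` leaves the `parse` embedding untouched because the two vectors share no trainable parameters. That is the modularity the abstract attributes to keeping the backbone frozen.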

Paper Structure

This paper contains 36 sections, 9 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of TokMem. (a) New memory tokens (colored) are interleaved with text sequences and trained with next-token prediction while the LLM backbone remains frozen. (b) An inference example: a query recalls and chains memory tokens (parse, search, format), enabling multi-step procedural behavior without long prompts.
  • Figure 2: Sample efficiency on a 10-task mixture from SNI. TokMem consistently outperforms fine-tuning in the low-data regime. TokMem can surpass RAG with only $10$ training samples, demonstrating strong few-shot learning capability.
  • Figure 3: Forgetting analysis in continual adaptation. As new tools are introduced, fine-tuning with replay memory suffers sharp drops on earlier tasks, while TokMem maintains stable performance. Larger models show stronger retention due to greater capacity.
  • Figure 4: Effect of renormalization on TokMem. Without renormalization, new tokens dominate and older ones are forgotten, particularly in smaller models with limited embedding capacity. Renormalization effectively mitigates this by balancing norms across tokens.
  • Figure 5: Overview of Decoupled TokMem embeddings, which learn separate memory matrices for memory addressing and for generation steering.
  • ...and 2 more figures
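
The renormalization described in the Figure 4 caption balances norms across tokens so that newly added tokens do not dominate older ones. The exact rule is not given in this summary, so the sketch below assumes one plausible variant: each new embedding is rescaled to the mean L2 norm of the existing embeddings while its direction is kept. The function names `l2_norm` and `renormalize_new_tokens` are hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def renormalize_new_tokens(old_tokens, new_tokens, eps=1e-12):
    """Rescale each new embedding to the mean L2 norm of the old embeddings.

    Assumed rule for illustration; the paper's actual procedure may differ.
    """
    if not old_tokens:
        # Nothing to balance against, so return copies unchanged.
        return [list(v) for v in new_tokens]
    target = sum(l2_norm(v) for v in old_tokens) / len(old_tokens)
    rescaled = []
    for vec in new_tokens:
        scale = target / max(l2_norm(vec), eps)
        rescaled.append([v * scale for v in vec])
    return rescaled


old = [[3.0, 4.0], [0.0, 5.0]]   # both have norm 5
new = [[30.0, 40.0]]             # norm 50, ten times larger than the old ones
balanced = renormalize_new_tokens(old, new)
```

After the call, the new embedding keeps its direction but has the same norm as the older ones, so it no longer outweighs them on magnitude alone.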