Pancake: Hierarchical Memory System for Multi-Agent LLM Serving

Zhengding Hu; Zaifeng Pan; Prabhleen Kaur; Vibha Murthy; Zhongkai Yu; Yue Guan; Zhen Wang; Steven Swanson; Yufei Ding

TL;DR

Pancake is a multi-tier agentic memory system that unifies three key techniques: multi-level index caching for single agents, coordinated index management across multiple agents, and collaborative GPU-CPU acceleration.

Abstract

In this work, we identify and address the core challenges of agentic memory management in LLM serving, where large-scale storage, frequent updates, and multiple coexisting agents jointly make approximate nearest neighbor (ANN) search complex and costly. We present Pancake, a multi-tier agentic memory system that unifies three key techniques: (i) multi-level index caching for single agents, (ii) coordinated index management across multiple agents, and (iii) collaborative GPU-CPU acceleration. Pancake exposes an easy-to-use interface that can be integrated into memory-based agents like Mem-GPT, and is compatible with agentic frameworks such as LangChain and LlamaIndex. Experiments on realistic agent workloads show that Pancake substantially outperforms existing frameworks, achieving more than 4.29x end-to-end throughput improvement.
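To make the interaction between frequent updates and ANN search concrete, the following is a minimal sketch of a two-tier vector memory. It is not Pancake's design or interface: the class `TwoTierMemory`, its parameters (`buffer_capacity`, `nprobe`), and the fixed centroids are all hypothetical, chosen only to illustrate the general idea of absorbing recent writes in a small cache that is searched exactly while older vectors live in a clustered index that is searched approximately.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TwoTierMemory:
    """Hypothetical two-tier store: a small write buffer over a clustered index.

    Illustrative assumption only; this is not the Pancake implementation.
    """

    def __init__(self, centroids, buffer_capacity=4, nprobe=1):
        self.centroids = list(centroids)
        # Cold tier: one list of (id, vector) pairs per centroid.
        self.clusters = [[] for _ in self.centroids]
        # Hot tier: recently inserted (id, vector) pairs, searched exhaustively.
        self.buffer = []
        self.buffer_capacity = buffer_capacity
        self.nprobe = nprobe

    def _nearest_centroids(self, vec, n):
        """Indices of the n centroids closest to vec."""
        return heapq.nsmallest(
            n, range(len(self.centroids)), key=lambda i: l2(vec, self.centroids[i])
        )

    def insert(self, vec_id, vec):
        """Append to the hot tier; flush once it exceeds its capacity."""
        self.buffer.append((vec_id, vec))
        if len(self.buffer) > self.buffer_capacity:
            self.flush()

    def flush(self):
        """Move every buffered vector into its nearest cluster."""
        for vec_id, vec in self.buffer:
            target = self._nearest_centroids(vec, 1)[0]
            self.clusters[target].append((vec_id, vec))
        self.buffer = []

    def search(self, query, k=1):
        """Return ids of the k closest vectors among the buffer and probed clusters."""
        candidates = list(self.buffer)
        for i in self._nearest_centroids(query, self.nprobe):
            candidates.extend(self.clusters[i])
        best = heapq.nsmallest(k, candidates, key=lambda item: l2(query, item[1]))
        return [vec_id for vec_id, _ in best]


memory = TwoTierMemory(centroids=[(0.0, 0.0), (10.0, 10.0)], buffer_capacity=2)
memory.insert("a", (0.1, 0.2))
memory.insert("b", (9.8, 10.1))
memory.insert("c", (0.3, 0.1))  # third insert exceeds the capacity of 2 and triggers a flush
nearest_to_origin = memory.search((0.0, 0.0), k=2)
```

In this sketch, writes never touch the clustered tier until a flush, so updates are batched rather than scattered one at a time. How Pancake actually structures its cache levels, shares indexes across agents, and splits work between GPU and CPU is described in the paper's method sections, not here.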

Paper Structure (20 sections, 3 equations, 19 figures)

Introduction
Background and Related Work
Memory-based Agent and ANN
Dynamic Vector Database
Motivation
Inefficient Update Strategy for Single-Agent Memory Access
Challenges for Multi-Agent Memory Index Management
Difficulties for GPU-CPU Collaboration
Pancake: Methods and System Design
Overview
Pattern-Driven Multi-Level Index Cache
Multi-Agent Indexing with Hybrid Graph
Dynamic GPU-CPU Index Coordination
Implementation
Evaluation
...and 5 more sections

Figures (19)

Figure 1: Memory-based workflow of agentic LLMs.
Figure 2: An example of multi-agent memory in Pancake.
Figure 3: Memory-based agents and their workflows.
Figure 4: Direct in-place updates scatter the new vectors into a large number of existing clusters, leading to degradation in efficiency and recall. A naive solution is to leverage intra-agent locality and maintain dedicated clusters for an agent.
Figure 5: For more complex workloads, locality across multiple reasoning steps of different requests can be observed. This makes naive dedicated clusters for the agent inefficient, as they fail to capture step-wise clustering.
...and 14 more figures
