
Making Array-Based Translation Practical for Modern, High-Performance Buffer Management

Xinjing Zhou, Jinming Hu, Andrew Pavlo, Michael Stonebraker

Abstract

Modern buffer pools must now support a broader workload mix than classic OLTP alone. In addition to B-tree lookups, database systems increasingly serve scan-heavy analytics and vector-search indexes with irregular, high-fan-out graph-traversal access patterns. These workloads require a translation mechanism -- mapping logical page IDs to resident frames -- that is simultaneously fast across these diverse access patterns, deployable in user space, compatible with huge pages, easy to integrate, and still under DBMS control for eviction and I/O. Existing designs satisfy only subsets of these goals. This paper presents \textbf{\calico}, a practical DBMS-controlled buffer pool built around array-based translation, a decades-old idea that was dismissed but is now viable on modern hardware. \calico decouples logical translation from OS page tables so that the DBMS can combine low-overhead translation with huge-page-backed frames and fine-grained page management. To make array translation practical and performant for DBMSs with large, sparse, hierarchical page identifiers, \calico introduces three techniques: multi-level translation with path caching, hole punching for reclaiming cold translation memory, and group prefetching to exploit memory-level parallelism. Our evaluation across scans, OLTP-style B-tree accesses, and vector search shows that \calico matches or outperforms the existing state of the art in both in-memory and out-of-memory performance. We also implement \calico as a drop-in replacement for PostgreSQL's buffer manager and integrate it with \texttt{pgvector}. Across vector-search and scan-heavy workloads, \calico delivers up to 3.9$\times$ in-memory and 6.5$\times$ larger-than-memory speedups for PostgreSQL vector search and speeds up scan-heavy queries by up to 3$\times$.

Paper Structure

This paper contains 31 sections, 12 figures, 6 tables, and 4 algorithms.

Figures (12)

  • Figure 1: Translation Overhead Across Workloads -- (a) Sequential scan: Hash tables destroy spatial locality, causing 6.9$\times$ slowdown vs array/vmcache; predicache provides no benefit for sequential access. (b) Random range scan: Gap persists at 2.0$\times$ despite random access; predicache underperforms plain hash tables. (c) B-tree lookup: Hash tables cause 1.56$\times$ slowdown; predicache matches array by overlapping hash table access with buffer frame access through CPU speculative execution. (d) Graph BFS: Translation serialization limits memory-level parallelism, causing 3.4$\times$ slowdown; predicache helps partially (1.5$\times$) but irregular traversal limits speculation accuracy. Array-based translation matches or outperforms vmcache with 4KB pages while retaining fine-grained I/O.
  • Figure 2: Calico Buffer Manager Architecture -- Calico separates DBMS-managed logical translation control from OS-managed physical backing. The upper-level mapping table resolves page-ID prefixes to last-level translation arrays. Each 64-bit translation entry encodes frame ID, version, and latch state. Frame memory is huge-page-backed for TLB efficiency. The hole-punching reference-count array tracks groups of translation entries so cold regions of the translation array can be reclaimed without affecting frame-memory mappings. For readability, the figure shows 4-entry groups (32 bytes); in practice, groups are typically one OS page of translation entries (4KB).
  • Figure 3: Step-by-Step Hierarchical Translation with Path Caching -- Calico decomposes each page identifier into a prefix and a suffix. The figure walks through four steps: (1) check the translation path cache; (2) on a miss, resolve the prefix through an upper-level index (e.g., radix tree, hash table, B$^+$-tree, or trie) to obtain a last-level translation array; (3) use the suffix to directly index that array on the hot path; and (4) update the path cache with the resolved prefix-to-array mapping.
  • Figure 4: Vector Search (In-Memory) -- Measured throughput of the HNSW index on DEEP10M and SIFT10M when the entire data set fits in memory using 1--64 threads.
  • Figure 5: Vector Search (Larger-than-Memory) -- Throughput comparison across buffer managers with varying memory budgets at 64 threads. Calico maintains superior performance, outperforming vmcache and USearch under memory pressure by 2.11$\times$ and 5.99$\times$, respectively.
  • ...and 7 more figures
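Figure 2 describes each translation entry as a single 64-bit word encoding a frame ID, a version, and latch state. The paper excerpt does not specify the field widths, so the sketch below uses an assumed layout (40-bit frame ID, 16-bit version, 8-bit latch state) purely for illustration; it is not Calico's actual encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit translation entry layout (widths are assumptions,
// not taken from the paper):
//   [63:56] latch state   [55:40] version   [39:0] frame ID
struct TranslationEntry {
    uint64_t raw;

    static TranslationEntry pack(uint64_t frame, uint16_t version, uint8_t latch) {
        return { (uint64_t(latch) << 56)
               | (uint64_t(version) << 40)
               | (frame & ((1ull << 40) - 1)) };
    }
    uint64_t frame()   const { return raw & ((1ull << 40) - 1); }
    uint16_t version() const { return (raw >> 40) & 0xFFFF; }
    uint8_t  latch()   const { return uint8_t(raw >> 56); }
};
```

Packing all three fields into one word lets the buffer manager read or compare-and-swap an entry with a single atomic 64-bit operation, which is what makes a flat translation array attractive for concurrent access.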
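The four steps in Figure 3 (check the path cache; on a miss, resolve the prefix through an upper-level index; directly index the last-level array with the suffix; refill the cache) can be sketched roughly as follows. This is a minimal single-threaded illustration under assumed parameters: a 16-bit suffix, a `std::unordered_map` standing in for the upper-level index (the paper allows a radix tree, hash table, B$^+$-tree, or trie), and a one-entry path cache. All names here are hypothetical, not Calico's API.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using FrameId = uint64_t;
constexpr int      SUFFIX_BITS = 16;                       // assumed split point
constexpr uint64_t SUFFIX_MASK = (1ull << SUFFIX_BITS) - 1;

struct Translator {
    // Upper-level index: page-ID prefix -> last-level translation array.
    std::unordered_map<uint64_t, std::vector<FrameId>> upper;

    // One-entry translation path cache (prefix -> resolved array).
    uint64_t              cachedPrefix = ~0ull;
    std::vector<FrameId>* cachedArray  = nullptr;

    FrameId translate(uint64_t pageId) {
        uint64_t prefix = pageId >> SUFFIX_BITS;
        uint64_t suffix = pageId & SUFFIX_MASK;
        if (prefix != cachedPrefix) {              // (1) path-cache check
            auto& arr = upper[prefix];             // (2) upper-level lookup on miss
            if (arr.empty())
                arr.resize(1ull << SUFFIX_BITS, 0);
            cachedPrefix = prefix;                 // (4) refill the path cache
            cachedArray  = &arr;
        }
        return (*cachedArray)[suffix];             // (3) direct array index
    }
};
```

The point of the path cache is that consecutive accesses usually share a prefix, so the hot path reduces to one mask, one compare, and one array load, with the slower upper-level structure touched only on a prefix change.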