GrASP: A Generalizable Address-based Semantic Prefetcher for Scalable Transactional and Analytical Workloads
Farzaneh Zirak, Farhana Choudhury, Renata Borovica-Gajic
TL;DR
GrASP addresses the challenge of scalable, accurate prefetching for both analytical and transactional workloads by combining logical block address (LBA) delta modeling with semantic context. It frames prefetching as a context-aware multi-label classification task. Its LSTM-based predictor is trained on embedded semantic blocks together with a table-based LBA abstraction, which lets it generalize to datasets up to 250× larger than its training data. Key contributions include a table-based LBA scheme, order-agnostic delta computation, plan-agnostic query representations, IPCA-based block encodings, and a tunable delta repertoire for fast adaptation. On average, GrASP achieves a 91.4% hit ratio, a 90.8% reduction in I/O time, and a 57.1% reduction in query execution latency. Against existing baselines in analytical and transactional settings, it reaches up to 45% higher hit ratios and up to 60% lower I/O time, and it generalizes strongly without extensive retraining.
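The TL;DR names several mechanisms without defining them. The Python sketch below shows one plausible way the delta-based, multi-label framing could fit together. Every definition in it is an assumption made for illustration, not GrASP's published method. These include the (table, offset) encoding used as a "table-based LBA", anchoring deltas at the smallest offset each table touched in the current query, choosing the repertoire as the k most frequent training deltas, and all function names (`to_table_lba`, `order_agnostic_deltas`, `build_repertoire`, `encode_labels`, `decode_prefetch`). The LSTM predictor and the semantic context embeddings are left out. `decode_prefetch` simply consumes whatever per-label scores such a model would produce.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Illustrative sketch only: these definitions are assumptions, not GrASP's published ones.
TableLBA = Tuple[str, int]  # hypothetical table-based LBA: (table name, offset in table)
Delta = Tuple[str, int]     # (table name, offset delta)


def to_table_lba(global_lba: int, extents: Dict[str, Tuple[int, int]]) -> TableLBA:
    """Map a global LBA to (table, offset) using assumed [start, end) extents per table."""
    for table, (start, end) in extents.items():
        if start <= global_lba < end:
            return table, global_lba - start
    raise ValueError(f"LBA {global_lba} lies outside every known table extent")


def to_global_lba(block: TableLBA, extents: Dict[str, Tuple[int, int]]) -> int:
    """Inverse of to_table_lba: table start plus offset."""
    table, offset = block
    return extents[table][0] + offset


def table_anchors(blocks: Iterable[TableLBA]) -> Dict[str, int]:
    """Smallest offset touched per table, used here as the reference point for deltas."""
    anchors: Dict[str, int] = {}
    for table, offset in blocks:
        anchors[table] = min(offset, anchors.get(table, offset))
    return anchors


def order_agnostic_deltas(current: Iterable[TableLBA], nxt: Iterable[TableLBA]) -> Set[Delta]:
    """Deltas of the next query's blocks relative to the current query's per-table anchors.

    A minimum and a set do not depend on access order, so neither does the result.
    Blocks in tables the current query never touched are skipped in this sketch.
    """
    anchors = table_anchors(current)
    return {(table, offset - anchors[table]) for table, offset in nxt if table in anchors}


def build_repertoire(delta_sets: Sequence[Set[Delta]], k: int) -> List[Delta]:
    """Keep the k most frequent deltas as the label space (k is the tunable size)."""
    counts = Counter(d for deltas in delta_sets for d in deltas)
    # Sort by descending count, then by delta, so ties are broken deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [delta for delta, _ in ranked[:k]]


def encode_labels(deltas: Set[Delta], repertoire: Sequence[Delta]) -> List[int]:
    """Multi-hot training target: one binary label per repertoire delta."""
    return [1 if d in deltas else 0 for d in repertoire]


def decode_prefetch(scores: Sequence[float], repertoire: Sequence[Delta],
                    anchors: Dict[str, int], threshold: float = 0.5) -> List[TableLBA]:
    """Turn per-label scores (e.g. sigmoid outputs of some model) into blocks to prefetch."""
    return [(table, anchors[table] + delta)
            for (table, delta), score in zip(repertoire, scores)
            if score >= threshold and table in anchors]
```

For example, with hypothetical extents `{"orders": (0, 100), "lineitem": (100, 500)}`, suppose the current query reads global LBAs 12, 10 and 105 and the next query reads 11, 107 and 13. The sketch then yields the deltas `{("orders", 1), ("lineitem", 2), ("orders", 3)}` whatever order the current blocks arrive in. Because labels are offsets relative to an anchor rather than absolute addresses, the label space does not grow with table size. That is the intuition behind the generalization claim, though GrASP's actual encoding may differ.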
Abstract
Data prefetching (loading data into the cache before it is requested) is essential for reducing I/O overhead and improving database performance. While traditional prefetchers focus on sequential patterns, recent learning-based approaches, especially those leveraging data semantics, achieve higher accuracy for complex access patterns. However, these methods often struggle with today's dynamic, ever-growing datasets and require frequent, timely fine-tuning. Privacy constraints may also restrict access to complete datasets, necessitating prefetchers that can learn effectively from samples. To address these challenges, we present GrASP, a learning-based prefetcher designed for both analytical and transactional workloads. GrASP enhances prefetching accuracy and scalability by leveraging logical block address (LBA) deltas and combining query representations with result encodings. It frames prefetching as a context-aware multi-label classification task, using multi-layer LSTMs to predict delta patterns from embedded context. This delta modeling approach enables GrASP to generalize predictions from small samples to larger, dynamic datasets without requiring extensive retraining. Experiments on real-world datasets and industrial benchmarks demonstrate that GrASP generalizes to datasets 250 times larger than the training data, achieving up to 45% higher hit ratios, 60% lower I/O time, and 55% lower end-to-end query execution latency than existing baselines. On average, GrASP attains a 91.4% hit ratio, a 90.8% I/O time reduction, and a 57.1% execution latency reduction.
