Latency Guarantees for Caching with Delayed Hits
Keerthana Gurushankar, Noah G. Singer, Bernardo Subercaseaux
TL;DR
This work analyzes caching with delayed hits, where misses incur latency governed by the fetch delay ratio $Z$ relative to inter-request time. It formulates $(Z,k)$-DelayedHitsCaching as a finite-state machine over $n$ pages with a $k$-page cache and analyzes online policies, proving a tight $\Theta(Zk)$ competitiveness bound for deterministic marking-based policies, including LRU. The main contribution is a phase and superphase decomposition that upper-bounds online latency while lower-bounding the offline optimum, yielding a provable worst-case guarantee and guiding practical caching strategies. Empirical evaluation on 18 datasets shows that actual competitive ratios are typically far below the theoretical bound, and that latency scales with $Z$ in practice.
Abstract
In the classical caching problem, when a requested page is not present in the cache (i.e., a "miss"), it is assumed to travel from the backing store into the cache before the next request arrives. However, in many real-life applications, such as content delivery networks, this assumption is unrealistic. The "delayed-hits" model for caching, introduced by Atre, Sherry, Wang, and Berger, accounts for the latency between a missed cache request and the corresponding arrival from the backing store. This theoretical model has two parameters: the "delay" $Z$, representing the ratio between the retrieval delay and the inter-request delay in an application, and the "cache size" $k$, as in classical caching. Classical caching corresponds to $Z=1$, whereas larger values of $Z$ model applications where retrieving missed requests is expensive. Despite the practical relevance of the delayed-hits model, its theoretical underpinnings are still poorly understood. We present the first tight theoretical guarantee for optimizing delayed-hits caching: the "Least Recently Used" algorithm, a natural, deterministic, online algorithm widely used in practice, is $O(Zk)$-competitive, meaning it incurs at most $O(Zk)$ times the latency of the (offline) optimal schedule. Our result extends to any so-called "marking" algorithm.
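To make the model concrete, the following is a minimal Python sketch of LRU under a simplified reading of the delayed-hits setting. It is an illustration under stated assumptions, not the paper's formal definition: one request arrives per time step; a miss at time $t$ starts a fetch that arrives at time $t+Z$ and costs latency $Z$; a request for a page that is already in flight (a delayed hit) costs only the remaining wait; and an arriving page is always admitted, evicting the least recently used page if the cache is full. The function name `lru_delayed_hits_latency` and its parameters are hypothetical.

```python
from collections import OrderedDict


def lru_delayed_hits_latency(requests, k, Z):
    """Total latency of LRU under a simplified delayed-hits model (assumed semantics)."""
    cache = OrderedDict()  # pages ordered from least to most recently used
    in_flight = {}         # page -> arrival time of its pending fetch
    total = 0
    for t, page in enumerate(requests):
        # Admit every fetch that has arrived by time t, evicting the LRU page if full.
        for p in [q for q, arrival in in_flight.items() if arrival <= t]:
            del in_flight[p]
            if len(cache) >= k:
                cache.popitem(last=False)
            cache[p] = None
        if page in cache:
            cache.move_to_end(page)       # true hit: zero latency, refresh recency
        elif page in in_flight:
            total += in_flight[page] - t  # delayed hit: wait out the remaining fetch time
        else:
            in_flight[page] = t + Z       # miss: start a fetch and pay the full delay
            total += Z
    return total
```

Under these assumptions, setting `Z = 1` reduces the total latency to the number of misses, matching the classical setting described above. For larger `Z`, repeated requests to a page that is still in flight each add latency, which is the effect the delayed-hits model is meant to capture.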
