Fresh Caching of Dynamic Contents using Restless Multi-armed Bandits
Ankita Koley, Chandramani Singh
TL;DR
This work addresses the caching of dynamic contents that are updated over time. It models the problem as a continuous-time restless multi-armed bandit (RMAB) with partial observability. By reformulating the per-content dynamics as a semi-Markov decision process and proving indexability, the authors derive closed-form Whittle indices. They then implement a Whittle index policy that caches the $M$ contents with the highest indices, where $M$ is the cache capacity. The approach mitigates the curse of dimensionality and performs near-optimally relative to the relaxed RMAB benchmark. In simulations, it also outperforms prior caching policies. The results offer a principled, scalable framework for dynamic caching in which content freshness and fetching costs must be balanced under hard cache-capacity constraints.
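Once the per-content Whittle indices are known, the caching decision reduces to a top-$M$ selection. The sketch below assumes the indices are already available as a dictionary. The function name `whittle_index_policy` and the tie-breaking rule are illustrative choices, not the paper's. The paper's closed-form index expressions are not reproduced here.

```python
from typing import Dict, List


def whittle_index_policy(indices: Dict[str, float], capacity: int) -> List[str]:
    """Hypothetical helper: pick the `capacity` contents with the largest Whittle indices.

    `indices` maps a content id to its current Whittle index (computed elsewhere).
    Ties are broken by content id in ascending order so the result is deterministic.
    """
    if capacity <= 0:
        return []
    # Sort by descending index, then ascending id, and keep the first `capacity` entries.
    ranked = sorted(indices, key=lambda content: (-indices[content], content))
    return ranked[:capacity]


# Usage: with capacity M = 2, the two contents with the highest indices are cached.
print(whittle_index_policy({"a": 0.2, "b": 1.5, "c": -0.3, "d": 0.9}, capacity=2))
```

Sorting costs $O(N \log N)$ for $N$ contents. A heap-based selection could replace it when $M \ll N$, but the policy itself is unchanged: contents are ranked by index and the top $M$ are kept.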
Abstract
We consider a dynamic content caching problem in which contents are updated at a central server, and local copies of a subset of the contents are cached at a local cache associated with a base station (BS). When a request for a content arrives, the BS can fetch the content from the central server or, if the content is in the local cache, serve the cached version. Fetching a content incurs a fixed fetching cost, whereas serving the cached version incurs an ageing cost proportional to the age-of-version (AoV) of the content. The BS has only partial information about the AoVs of the contents. We formulate an optimal content fetching and caching problem that minimizes the average cost subject to cache capacity constraints. This problem suffers from the curse of dimensionality and is provably hard to solve. We therefore cast it as a continuous-time restless multi-armed bandit (RMAB) problem, in which each single-content problem is a partially observable Markov decision process (POMDP). We reformulate the single-content problem as a semi-Markov decision process, prove indexability, and provide a Whittle-index-based solution. Finally, we compare the performance of our policy with recent work and show via simulations that it is optimal.
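The per-request cost trade-off described above can be illustrated with a small sketch. It relies on several assumptions that are not taken from the paper:

- Updates arrive at the server as a Poisson process with a hypothetical rate `update_rate`.
- AoV counts the number of updates missed since the cached copy was fetched.
- Because the BS does not observe the AoV directly, it works with the expected AoV given the time elapsed since the last fetch.

The class name `ContentCostModel` and its rule `myopic_action` are illustrative only. This is a one-step greedy baseline, not the paper's Whittle index policy.

```python
from dataclasses import dataclass


@dataclass
class ContentCostModel:
    """Hypothetical per-content cost parameters (names are illustrative)."""
    fetch_cost: float   # fixed cost of fetching the content from the central server
    ageing_rate: float  # ageing cost per unit of age-of-version (AoV)
    update_rate: float  # assumed Poisson rate of content updates at the server

    def expected_aov(self, elapsed: float) -> float:
        # Under the Poisson-update assumption, the mean number of updates missed
        # since the last fetch, `elapsed` time units ago, is update_rate * elapsed.
        return self.update_rate * elapsed

    def serve_cost(self, elapsed: float) -> float:
        # Expected ageing cost of serving the cached copy (proportional to AoV).
        return self.ageing_rate * self.expected_aov(elapsed)

    def myopic_action(self, elapsed: float, cached: bool) -> str:
        # A content that is not cached must be fetched; otherwise pick the
        # cheaper option for this single request, ignoring future costs.
        if not cached:
            return "fetch"
        return "fetch" if self.serve_cost(elapsed) > self.fetch_cost else "serve_cached"


model = ContentCostModel(fetch_cost=5.0, ageing_rate=1.0, update_rate=0.5)
print(model.myopic_action(elapsed=4.0, cached=True))   # expected AoV 2, ageing cost 2 < 5
print(model.myopic_action(elapsed=12.0, cached=True))  # expected AoV 6, ageing cost 6 > 5
```

This greedy rule ignores both the cache capacity constraint and the effect of today's decision on future costs. Handling those two aspects jointly is what motivates the RMAB formulation and the Whittle index policy.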
