TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks
Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang
TL;DR
TrimCaching addresses edge AI model caching by exploiting parameter blocks shared across models to improve storage efficiency and the cache hit ratio. It shows that the general parameter-sharing model placement problem is NP-hard and is a submodular maximization problem with submodular constraints. For a practical special case, it gives a $(1-\epsilon)/2$-approximation algorithm that combines a successive greedy procedure with dynamic-programming (DP) rounding; for the general case, it gives a scalable greedy method. In simulations over multi-edge networks, the approach significantly improves the cache hit ratio over traditional caching that treats models independently, and it remains robust under user mobility. The work offers a theory-grounded approach to parameter-sharing-aware model placement for 6G edge intelligence.
Abstract
Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time algorithm with a constant approximation guarantee exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with a $(1-\epsilon)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching schemes that do not exploit shared parameters in AI models.
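To make the storage-efficiency argument concrete, the following is a minimal single-server sketch, not the algorithm developed in the paper. It assumes each model is a set of parameter blocks, that a shared block is stored once per server, and that the greedy rule picks the model with the highest request probability per additional megabyte. All names, block sizes, and request probabilities (`BLOCK_SIZE`, `MODEL_BLOCKS`, `storage_used`, `greedy_place`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical toy data: block sizes in MB; each model is a set of block IDs.
BLOCK_SIZE: Dict[str, int] = {"backbone": 90, "head_a": 10, "head_b": 12, "other": 60}
MODEL_BLOCKS: Dict[str, Set[str]] = {
    "model_a": {"backbone", "head_a"},
    "model_b": {"backbone", "head_b"},  # shares "backbone" with model_a
    "model_c": {"other"},
}


def storage_used(models: Set[str]) -> int:
    """Storage when every distinct parameter block is stored only once."""
    blocks: Set[str] = set().union(*(MODEL_BLOCKS[m] for m in models))
    return sum(BLOCK_SIZE[b] for b in blocks)


def independent_storage(models: Set[str]) -> int:
    """Storage when every model is cached as an independent file."""
    return sum(sum(BLOCK_SIZE[b] for b in MODEL_BLOCKS[m]) for m in models)


def greedy_place(demand: Dict[str, float], capacity: int) -> Set[str]:
    """Assumed greedy rule: add the model with the best demand gain per
    additional MB, counting only blocks that are not yet cached."""
    cached: Set[str] = set()
    while True:
        used = storage_used(cached)
        best, best_ratio = None, 0.0
        for m in sorted(MODEL_BLOCKS):
            if m in cached:
                continue
            extra = storage_used(cached | {m}) - used
            if used + extra > capacity:
                continue  # model does not fit in the remaining storage
            ratio = demand[m] / extra if extra > 0 else float("inf")
            if ratio > best_ratio:
                best, best_ratio = m, ratio
        if best is None:
            return cached
        cached.add(best)


demand = {"model_a": 0.5, "model_b": 0.3, "model_c": 0.2}
placement = greedy_place(demand, capacity=115)
print(sorted(placement), storage_used(placement), independent_storage(placement))
```

In this toy instance, `model_a` and `model_b` together occupy 112 MB when the shared backbone is stored once, against 202 MB when cached independently, so both fit within the 115 MB budget and the server can serve 80% of the assumed requests instead of 50%. The multi-edge problem in the paper additionally accounts for service latency and for users reachable by several edge servers, which this sketch omits.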
