TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks

Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang

TL;DR

TrimCaching addresses edge AI model caching by exploiting parameter blocks shared across models to improve storage efficiency and the cache hit ratio. It shows that the general parameter-sharing placement problem is a submodular maximization problem with submodular constraints and is NP-hard. For a practical special case, it offers a $(1-\epsilon)/2$-approximation algorithm that combines successive greedy selection with dynamic programming (DP) based rounding; for the general case, it offers a scalable greedy method. In simulations over multi-edge networks, the approach significantly improves the cache hit ratio over traditional independent caching and remains robust to user mobility. The work provides a theory-grounded pathway toward efficient 6G edge intelligence through parameter-sharing-aware model placement.

Abstract

Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with $(1-\epsilon)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.
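The idea of parameter-sharing-aware placement can be illustrated with a minimal sketch. It is not the paper's algorithm: the block names, sizes, requests, and the `reachable` map (standing in for "servers that can deliver a model within the latency budget") are all hypothetical, and the greedy rule shown is a plain marginal-gain heuristic. The only point it makes is that storage is charged per distinct parameter block, so models that share blocks are cheaper to cache together.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical toy library: block sizes in MB and the blocks each model uses.
BLOCK_SIZE: Dict[str, int] = {"backbone": 80, "head_a": 10, "head_b": 12, "solo_c": 60}
MODEL_BLOCKS: Dict[str, FrozenSet[str]] = {
    "model_a": frozenset({"backbone", "head_a"}),
    "model_b": frozenset({"backbone", "head_b"}),
    "model_c": frozenset({"solo_c"}),
}


def storage(models: Set[str]) -> int:
    """Storage with deduplication: every distinct block is stored once."""
    blocks: Set[str] = set()
    for m in models:
        blocks |= MODEL_BLOCKS[m]
    return sum(BLOCK_SIZE[b] for b in blocks)


def served(placement: Dict[str, Set[str]],
           requests: List[Tuple[str, str]],
           reachable: Dict[str, List[str]]) -> int:
    """Count requests whose model is cached on at least one reachable server."""
    return sum(
        1 for user, model in requests
        if any(model in placement[s] for s in reachable[user])
    )


def greedy_placement(capacity: Dict[str, int],
                     requests: List[Tuple[str, str]],
                     reachable: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Repeatedly add the feasible (server, model) pair with the largest hit gain."""
    placement: Dict[str, Set[str]] = {s: set() for s in capacity}
    while True:
        base = served(placement, requests, reachable)
        best_gain = 0
        best: Optional[Tuple[str, str]] = None
        for s in sorted(capacity):
            for m in sorted(MODEL_BLOCKS):
                if m in placement[s]:
                    continue
                if storage(placement[s] | {m}) > capacity[s]:
                    continue  # would exceed the server's storage budget
                placement[s].add(m)
                gain = served(placement, requests, reachable) - base
                placement[s].remove(m)
                if gain > best_gain:
                    best_gain, best = gain, (s, m)
        if best is None:  # no feasible pair improves the hit count
            return placement
        placement[best[0]].add(best[1])


# Hypothetical scenario: one 105 MB edge server and four user requests.
demo_capacity = {"edge_1": 105}
demo_requests = [("u1", "model_a"), ("u2", "model_b"), ("u3", "model_c"), ("u4", "model_a")]
demo_reachable = {u: ["edge_1"] for u, _ in demo_requests}
demo_placement = greedy_placement(demo_capacity, demo_requests, demo_reachable)
print(sorted(demo_placement["edge_1"]))  # ['model_a', 'model_b']
print(served(demo_placement, demo_requests, demo_reachable) / len(demo_requests))  # 0.75
```

In this toy setting, `model_a` and `model_b` fit together in 102 MB because the 80 MB backbone is stored once. If each model were stored independently, the pair would need 182 MB and only one of them would fit on the server.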

Paper Structure

This paper contains 20 sections, 7 theorems, 26 equations, 7 figures, 1 table, and 3 algorithms.

Key Result

Proposition 1

${\mathcal{P}1.1}$ is a submodular maximization problem with $M$ submodular constraints.
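
For readers unfamiliar with the term, the standard definition of submodularity is recalled here. The notation is illustrative and is not taken from the paper. A set function $f$ over a ground set $V$ is submodular if it has diminishing marginal returns:

$$
f(A \cup \{e\}) - f(A) \;\ge\; f(B \cup \{e\}) - f(B) \quad \text{for all } A \subseteq B \subseteq V,\ e \in V \setminus B.
$$

A storage constraint under parameter sharing plausibly has this shape. Assume, hypothetically, that model $i$ consists of the block set $\mathcal{B}_i$, that block $b$ has size $s_b \ge 0$, and that a server has capacity $Q$. Caching the model set $S$ then costs

$$
g(S) = \sum_{b \,\in\, \bigcup_{i \in S} \mathcal{B}_i} s_b \;\le\; Q,
$$

which is a weighted coverage function and therefore monotone and submodular: adding a model to a larger cache can only require fewer new blocks. Under this reading, the $M$ constraints in Proposition 1 would correspond to one such capacity constraint per edge server, although the exact formulation is given in the paper itself.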

Figures (7)

  • Figure 1: Inference accuracy vs. the number of frozen bottom layers in fine-tuned models. Starting from a ResNet50 [he2016deep] pre-trained on CIFAR100 [krizhevsky2009learning], we fine-tune it for two downstream tasks, "transportation" and "animal". The CIFAR10 [krizhevsky2009learning] classes "airplane", "automobile", "ship", and "truck" are grouped into the superclass "transportation", while the classes "bird", "cat", "deer", "dog", "frog", and "horse" are grouped into the superclass "animal". The result implies that downstream or personalized models can share a significant proportion of their parameters, given how widely fine-tuning is adopted.
  • Figure 2: The TrimCaching framework in a multi-edge scenario.
  • Figure 3: An example of the special case with a small fixed number of shared parameter blocks. In the figure, regardless of the scale of the model library, the shared green parameter blocks come from two pre-trained models. Nodes in other colors represent specific parameter blocks in the library.
  • Figure 4: Cache hit ratio for the special case, where a small fixed number of shared parameter blocks is considered. Error bars denote the standard deviation, here and in the subsequent figures.
  • Figure 5: Cache hit ratio for the general case.
  • ...and 2 more figures
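
The storage argument behind Figures 1 and 3 can be made concrete with a small sketch. It assumes that fine-tuned variants of one pre-trained model keep their bottom `frozen` layers identical, so those layers need to be stored only once. The per-layer sizes are made up for illustration and are not ResNet50's real layout; both function names are hypothetical.

```python
from typing import List


def shared_fraction(layer_sizes: List[int], frozen: int) -> float:
    """Fraction of a fine-tuned model's parameters that lie in the frozen bottom layers."""
    if not 0 <= frozen <= len(layer_sizes):
        raise ValueError("frozen must be between 0 and the number of layers")
    return sum(layer_sizes[:frozen]) / sum(layer_sizes)


def library_storage(layer_sizes: List[int], frozen: int, num_models: int) -> int:
    """Storage for num_models fine-tuned variants when frozen layers are stored once."""
    if not 0 <= frozen <= len(layer_sizes):
        raise ValueError("frozen must be between 0 and the number of layers")
    shared = sum(layer_sizes[:frozen])      # stored a single time
    specific = sum(layer_sizes[frozen:])    # stored once per variant
    return shared + num_models * specific


# Hypothetical per-layer parameter counts (in millions), bottom layer first.
LAYERS = [2, 4, 8, 6]

print(shared_fraction(LAYERS, 3))      # 0.7
print(library_storage(LAYERS, 3, 2))   # 26 (with sharing)
print(library_storage(LAYERS, 0, 2))   # 40 (no sharing)
```

Freezing more bottom layers raises the shared fraction and lowers library storage, while Figure 1 reports how inference accuracy changes with the number of frozen layers. That is the storage-versus-quality tradeoff the placement problem has to work within.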

Theorems & Definitions (14)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Theorem 1
  • proof
  • ...and 4 more