Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services

Zhang Liu, Hongyang Du, Xiangwang Hou, Lianfen Huang, Seyyedali Hosseinalipour, Dusit Niyato, Khaled Ben Letaief

TL;DR

This paper addresses the challenges of edge-enabled AIGC service provisioning, which remain underexplored in the literature, and formulates joint model caching and resource allocation for AIGC services to balance the trade-off between AIGC quality and latency.

Abstract

Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services. In this paper, we address the challenges of edge-enabled AIGC service provisioning, which remain underexplored in the literature. These services require executing GenAI models with billions of parameters, which poses significant obstacles for the resource-limited wireless edge. We introduce a formulation of joint model caching and resource allocation for AIGC services that balances the trade-off between AIGC quality and latency metrics. We obtain mathematical relationships between these metrics and the computational resources required by GenAI models via experimentation. We then decompose the formulation into a model caching subproblem on a long timescale and a resource allocation subproblem on a short timescale. Since the decision variables of these subproblems are discrete and continuous, respectively, we leverage a double deep Q-network (DDQN) algorithm to solve the former and propose a diffusion-based deep deterministic policy gradient (D3PG) algorithm to solve the latter. The proposed D3PG algorithm makes innovative use of diffusion models as the actor network to determine optimal resource allocation decisions. Finally, we integrate these two learning methods within the overarching two-timescale deep reinforcement learning (T2DRL) algorithm, the performance of which is studied through comparative numerical simulations.
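
The two-timescale decomposition described in the abstract can be pictured as a nested control loop: a discrete caching decision is made once per long-timescale frame, and a continuous resource allocation decision is made in every short-timescale slot within that frame. The following is a minimal sketch of that loop structure only, not the authors' T2DRL implementation. All names (`run_two_timescale`, `random_cache_policy`, `uniform_alloc_policy`), the frame and slot counts, and the numbers of models and users are hypothetical; the two placeholder policies merely stand in where a DDQN agent and a D3PG actor would sit.

```python
import random


def run_two_timescale(num_frames, slots_per_frame, cache_policy, alloc_policy, seed=0):
    """Call cache_policy once per frame and alloc_policy once per slot."""
    rng = random.Random(seed)
    log = []
    for frame in range(num_frames):
        # Long timescale: one discrete caching decision per frame.
        cached = cache_policy(frame, rng)
        for slot in range(slots_per_frame):
            # Short timescale: one continuous allocation decision per slot.
            shares = alloc_policy(cached, slot, rng)
            log.append((frame, slot, cached, shares))
    return log


def random_cache_policy(frame, rng, num_models=4, capacity=2):
    # Hypothetical placeholder for a learned caching agent (e.g. DDQN):
    # choose which model indices to keep at the edge server.
    return tuple(sorted(rng.sample(range(num_models), capacity)))


def uniform_alloc_policy(cached, slot, rng, num_users=3):
    # Hypothetical placeholder for a learned allocation actor (e.g. D3PG):
    # split a normalized resource budget equally among users.
    return [1.0 / num_users] * num_users


log = run_two_timescale(2, 3, random_cache_policy, uniform_alloc_policy)
for frame, slot, cached, shares in log:
    print(frame, slot, cached, [round(s, 3) for s in shares])
```

The caching decision stays fixed across all slots of a frame, which is the property that lets the discrete and continuous subproblems be handled by separate learners.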

Paper Structure

This paper contains 44 sections, 22 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: An example of the GenAI model RePaint, trained on different datasets, used to repair the same corrupted image.
  • Figure 2: A schematic of the user-edge-cloud orchestrated architecture for provisioning AIGC services.
  • Figure 3: An example of an edge-enabled AIGC service for restoring corrupted images of human faces.
  • Figure 4: An illustration of the diffusion model tailored to generate optimal decisions for communication and computing resource allocation at time slot $k$.
  • Figure 5: Overall flowchart of the proposed T2DRL algorithm, consisting of the DDQN algorithm operating on the long-timescale and the D3PG algorithm operating on the short-timescale.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Remark 1
  • Remark 2