Accelerating AIGC Services with Latent Action Diffusion Scheduling in Edge Networks

Changfu Xu, Jianxiong Guo, Wanyu Lin, Haodong Zou, Wentao Fan, Tian Wang, Xiaowen Chu, Jiannong Cao

TL;DR

This work tackles the latency of AIGC services in distributed edge networks by formulating task offloading as an online integer nonlinear programming (INLP) problem and proving that its offline counterpart is NP-hard. It introduces LAD-TS, a diffusion-guided scheduling framework that combines latent action diffusion networks with soft actor-critic reinforcement learning to produce near-optimal online offloading decisions; its latent action diffusion strategy reuses historical action probabilities to speed up convergence. The resulting online distributed algorithm has linear per-slot complexity and is reported to reduce service delay and improve training efficiency, and the DEdgeAI prototype shows practical gains on real edge hardware, including substantial memory savings via reSD3-m. Together, the methods improve QoE for AIGC at the edge and offer a scalable blueprint for deploying diffusion-assisted DRL in resource-constrained environments.

Abstract

Artificial Intelligence Generated Content (AIGC) has gained significant popularity for creating diverse content. Current AIGC models primarily focus on content quality within a centralized framework, resulting in high service delays and a poor user experience. Moving these services to the edge is not straightforward: the workload of an AIGC task depends on the AIGC model's complexity rather than the amount of data, and the large model with its multi-layer encoder structure demands substantial computational and memory resources. These characteristics pose new challenges for modeling, deploying, and scheduling AIGC services in edge networks. We therefore model an offloading problem among edge servers for providing real AIGC services and propose LAD-TS, a novel Latent Action Diffusion-based Task Scheduling method that orchestrates multiple edge servers for expedited AIGC services. LAD-TS generates a near-optimal offloading decision by combining the diffusion model's conditional generation capability with reinforcement learning's ability to interact with the environment, thereby minimizing service delays under multiple resource constraints. In addition, a latent action diffusion strategy guides decision generation using historical action probabilities, so that near-optimal decisions are reached quickly. Furthermore, we develop DEdgeAI, a prototype edge system with a refined AIGC model deployment, to implement and evaluate our LAD-TS method. DEdgeAI provides a real AIGC service for users, demonstrating up to 29.18% shorter service delays than five representative existing AIGC platforms. We release our open-source code at https://github.com/ChangfuXu/DEdgeAI/.
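To make the offloading setting concrete, the following is a minimal sketch of a greedy delay-based baseline. It is not LAD-TS. It assumes, purely for illustration, that a task's workload is its number of denoising steps (reflecting the point that workload follows model complexity rather than data size) and that each edge server has a fixed processing rate and a queue backlog. The names `EdgeServer`, `capacity`, `backlog`, `estimated_delay`, and `greedy_offload` are hypothetical, and memory constraints are ignored.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    capacity: float  # hypothetical: denoising steps processed per second
    backlog: float   # hypothetical: queued denoising steps not yet processed


def estimated_delay(server: EdgeServer, steps: float) -> float:
    """Queueing wait plus processing time, in seconds, under the assumptions above."""
    return (server.backlog + steps) / server.capacity


def greedy_offload(servers: list, steps: float):
    """Send the task to the server with the smallest estimated delay."""
    best = min(servers, key=lambda s: estimated_delay(s, steps))
    delay = estimated_delay(best, steps)
    best.backlog += steps  # the chosen server now holds the new task
    return best.name, delay


servers = [EdgeServer("es0", capacity=10.0, backlog=40.0),
           EdgeServer("es1", capacity=5.0, backlog=0.0)]
choice, delay = greedy_offload(servers, steps=20)
print(choice, delay)
```

In this example the faster server `es0` is already heavily loaded, so the slower but idle `es1` gives the shorter estimated delay. A learned scheduler such as LAD-TS aims to improve on myopic rules of this kind by accounting for how decisions affect future time slots.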

Paper Structure

This paper contains 25 sections, 3 theorems, 17 equations, 11 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

The offline counterpart of the problem (Eq-problem) is an NP-hard problem.

Figures (11)

  • Figure 1: An illustration of existing AIGC applications. Users send their prompts over the backhaul network to a cloud server on which the corresponding AIGC models are deployed. The server then generates the requested content (e.g., an image) from the prompts and sends it back to the users.
  • Figure 2: An illustration of the efficient distributed AIGC service proposed in this paper. For each AIGC task in each time slot, the scheduler uses the proposed LAD-TS method to decide which edge server (ES) processes the task.
  • Figure 3: The overall architecture of our method. For each ES $b$, the actor offloads newly arrived AIGC tasks online to ESs for parallel processing. The actor uses the system state $\boldsymbol{s}_{b,n,t}$ and the historical action probability $\boldsymbol{x}_{b,n,t}$ with edge-edge collaboration. The $\boldsymbol{x}_{b,n,t}$ is stored in memory. The t-LADN model is periodically trained offline on historical samples.
  • Figure 4: The actor structure with the proposed LADN model. The actor inputs are the timestep $I$, the latent action probability $\boldsymbol{x}_{b,n,t,I}$, and the system state $\boldsymbol{s}_{b,n,t}$. The output is the action decision $\boldsymbol{a}_{b,n,t}$. The historical action probability $\boldsymbol{x}_{b,n,t,0}$ is stored (or updated) in the array $X_{b}[n]$.
  • Figure 5: The average service delays of our LAD-TS method and baselines with increasing training episodes.
  • ...and 6 more figures
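The data flow described in the Figure 4 caption can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for the trained LADN network, the update rule is a simplified iterative refinement rather than a full DDPM reverse step, and the final action is taken by argmax for determinism. The one idea it does take from the caption is that denoising starts from the stored historical action probability in `X_b[n]` and writes the result back there.

```python
import math
import random


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]


def toy_denoiser(x, state, i):
    """Hypothetical stand-in for the trained network: returns the 'noise' in x.
    It pretends the clean target favours lightly loaded servers; i is unused here."""
    target = softmax([-load for load in state])
    return [xi - ti for xi, ti in zip(x, target)]


def latent_action_diffusion(state, x_hist, num_steps=5, noise_scale=0.05, rng=None):
    """Refine a latent action vector over num_steps steps, conditioned on state."""
    rng = rng or random.Random(0)
    x = list(x_hist)  # warm start from the stored latent, not from pure noise
    for i in range(num_steps, 0, -1):
        eps = toy_denoiser(x, state, i)
        x = [xi - 0.5 * ei for xi, ei in zip(x, eps)]  # remove part of the predicted noise
        if i > 1:  # no fresh noise on the final step
            x = [xi + noise_scale * rng.gauss(0.0, 1.0) for xi in x]
    probs = softmax(x)
    action = max(range(len(probs)), key=probs.__getitem__)  # assumption: argmax
    return action, x


# X_b[n] holds the historical latent for task index n on one server b (uniform at start).
X_b = [[1.0 / 3.0] * 3 for _ in range(2)]
state = [3.0, 0.5, 2.0]  # hypothetical per-server load, standing in for s_{b,n,t}
action, x0 = latent_action_diffusion(state, X_b[0])
X_b[0] = x0  # store (or update) the latent for the next time slot
print(action)
```

Because the next slot starts from `X_b[0]` rather than from random noise, fewer refinement steps should be needed when consecutive states are similar, which is the intuition behind reusing historical action probabilities.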

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof