Joint Model Assignment and Resource Allocation for Cost-Effective Mobile Generative Services

Shuangwei Gao, Peng Yang, Yuxin Kong, Feng Lyu, Ning Zhang

TL;DR

This work tackles the challenge of delivering cost-effective mobile AIGC services by offloading generation tasks to edge servers. It introduces an edge-enabled provisioning framework with probabilistic model assignment that leverages category-based prompt score distributions and a simulated annealing–based mechanism for adaptive step selection and resource allocation, aiming to maximize the utility (quality minus latency) under a fixed edge compute budget. The key contributions are (i) a probabilistic assignment strategy that maps prompt categories to models to balance CLIPScore and latency, (ii) a simulated annealing algorithm that jointly tunes denoising steps and resource allocations, and (iii) comprehensive simulations showing up to 4.7% quality improvements and 39.1% latency reductions over baselines. These results demonstrate the practical potential of edge-hosted AIGC services to deliver high-quality content with low delay in resource-constrained scenarios, enabling scalable, personalized mobile generative services.
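
The probabilistic assignment idea can be illustrated with a minimal sketch. It is not the authors' implementation: the category names, model names, Gaussian parameters, latencies, score target, and the softmax rule that turns utilities into assignment probabilities are all hypothetical placeholders chosen for illustration. The sketch assumes each (category, model) pair has a fitted Gaussian CLIPScore distribution and that utility is the chance of meeting a score target minus a latency penalty.

```python
import math
import random

# Hypothetical Gaussian fits of CLIPScore per prompt category and model: (mean, std).
# Illustrative placeholders only, not values reported in the paper.
SCORE_FITS = {
    "people": {"model_a": (0.30, 0.03), "model_b": (0.32, 0.04)},
    "scenery": {"model_a": (0.33, 0.02), "model_b": (0.31, 0.03)},
}
# Hypothetical per-request latency of each model, in seconds.
LATENCY_S = {"model_a": 1.0, "model_b": 2.5}


def prob_score_at_least(mean, std, target):
    """P(score >= target) for a Gaussian with the given mean and std."""
    return 0.5 * math.erfc((target - mean) / (std * math.sqrt(2.0)))


def assignment_probs(category, target=0.30, latency_weight=0.1, temperature=0.1):
    """Softmax over (quality minus weighted latency); an assumed rule, not the paper's."""
    utilities = {
        model: prob_score_at_least(mean, std, target) - latency_weight * LATENCY_S[model]
        for model, (mean, std) in SCORE_FITS[category].items()
    }
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {m: math.exp((u - top) / temperature) for m, u in utilities.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def assign_model(category, rng=random):
    """Sample one model for a prompt of the given category."""
    probs = assignment_probs(category)
    models = list(probs)
    return rng.choices(models, weights=[probs[m] for m in models], k=1)[0]
```

Sampling from the probabilities, rather than always picking the top model, spreads requests across models so that no single model absorbs every prompt of a category.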

Abstract

Artificial Intelligence Generated Content (AIGC) services can efficiently satisfy user-specified content creation demands, but their high computational requirements make it challenging to support mobile users at scale. In this paper, we present our design of an edge-enabled AIGC service provisioning system that assigns the computing tasks of generative models to edge servers, thereby improving overall user experience and reducing content generation latency. Specifically, once the edge server receives user-requested task prompts, it dynamically assigns appropriate models and allocates computing resources based on the features of each category of prompts. The generated content is then delivered to users. The key to this system is a proposed probabilistic model assignment approach, which estimates the quality score of the generated content for each prompt based on category labels. Next, we introduce a heuristic algorithm that adaptively configures both generation steps and resource allocation according to the task requests received by each generative model on the edge. Simulation results demonstrate that the designed system can enhance the quality of generated content by up to 4.7% while reducing response delay by up to 39.1% compared to benchmarks.
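
The adaptive configuration of generation steps and resource allocation can be sketched with a generic simulated annealing loop (the TL;DR names simulated annealing as the heuristic). The sketch below is only an illustration under stated assumptions: the quality curve, the latency model, the step choices, the neighbor moves, and the cooling schedule are hypothetical and are not taken from the paper.

```python
import math
import random

STEP_CHOICES = [10, 20, 30, 40, 50]  # hypothetical denoising-step options
BUDGET = 1.0                         # normalized edge compute budget
MIN_SHARE = 0.05                     # assumed floor on any model's share


def utility(steps, shares, loads, latency_weight=0.02):
    """Sum over models of load * (quality - weighted latency); toy models only."""
    total = 0.0
    for s, r, n in zip(steps, shares, loads):
        quality = 1.0 - math.exp(-s / 15.0)  # assumed saturating quality curve
        latency = n * s * 0.05 / r           # assumed: work divided by compute share
        total += n * (quality - latency_weight * latency)
    return total


def neighbor(steps, shares, rng):
    """Either nudge one model's step count or move compute between two models."""
    steps, shares = list(steps), list(shares)
    i = rng.randrange(len(steps))
    if rng.random() < 0.5:
        idx = STEP_CHOICES.index(steps[i]) + rng.choice([-1, 1])
        steps[i] = STEP_CHOICES[min(len(STEP_CHOICES) - 1, max(0, idx))]
    else:
        j = rng.randrange(len(shares))
        delta = min(0.05, shares[i] - MIN_SHARE)
        if j != i and delta > 0:
            shares[i] -= delta  # the transfer keeps the total at BUDGET
            shares[j] += delta
    return steps, shares


def anneal(loads, iters=2000, t0=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    k = len(loads)
    steps, shares = [STEP_CHOICES[0]] * k, [BUDGET / k] * k
    cur = utility(steps, shares, loads)
    best = (cur, steps, shares)
    temp = t0
    for _ in range(iters):
        cand_steps, cand_shares = neighbor(steps, shares, rng)
        cand = utility(cand_steps, cand_shares, loads)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand >= cur or rng.random() < math.exp((cand - cur) / temp):
            steps, shares, cur = cand_steps, cand_shares, cand
            if cur > best[0]:
                best = (cur, steps, shares)
        temp *= cooling
    return best
```

Calling `anneal([5, 2, 1])` returns the best utility found together with the per-model step counts and compute shares for three models receiving 5, 2, and 1 requests. Because each move only transfers compute between models, the shares always sum to the budget.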

Paper Structure

This paper contains 17 sections, 7 equations, 10 figures, 1 algorithm.

Figures (10)

  • Figure 1: Fitted Gaussian distribution curves for four categories of prompts.
  • Figure 2: Model performance across different denoising steps.
  • Figure 3: An overview of the proposed system. The example images were generated by three different models.
  • Figure 4: CLIPScore of prompts at three different score levels with different models.
  • Figure 5: Performance comparison of different model assignment methods.
  • ...and 5 more figures