6G EdgeAI: Performance Evaluation and Analysis

Chien-Sheng Yang, Yu-Jen Ku, Yuan-Yao Lou, Nathan Tenny, Alex C. -C. Hsu

TL;DR

This paper tackles the latency challenges of GenAI workloads in 6G by proposing Integrated Communication and Computing (ICC), a framework that places computing directly in RAN nodes and jointly manages communication and computing resources. A queueing-theoretic analysis shows that ICC achieves $98\%$ higher service capacity than 5G MEC. System-level simulations with transformer-based LLM workloads show a $60\%$ service-capacity gain and $27\%$ lower compute cost, particularly under a priority-based joint latency management strategy. The analysis models the communication and computing stages as tandem $M/M/1$ queues under FCFS discipline, for which Lemma 1 establishes that the stage sojourn times are independent. The simulations use realistic GPU configurations and LLM inference models. The results suggest that ICC is a practical, scalable path to real-time GenAI services at the 6G edge, with potential applicability to other latency-sensitive applications and further gains from system-wide offline and online offloading optimization.

Abstract

Generative AI (GenAI) services powered by large language models (LLMs) increasingly deliver real-time interactions, yet existing 5G multi-access edge computing (MEC) architectures often treat communication and computing as separate domains, limiting their ability to meet stringent latency requirements. To address this challenge, we introduce an Integrated Communication and Computing (ICC) framework in which computing capabilities reside directly in radio access network (RAN) nodes that jointly manage bandwidth and computing resources. Our queueing-theoretic analysis shows that ICC outperforms 5G MEC, achieving 98% higher service capacity (defined as the maximum arrival rate at which a specified fraction of jobs still complete within a given delay budget). We corroborate these gains through system-level simulations that account for transformer-based LLM workloads, realistic GPU specifications, and a priority-based scheduling scheme. The simulations show that ICC improves service capacity by 60%, demonstrating its potential to enable efficient, cost-effective real-time GenAI services in 6G.

Paper Structure

This paper contains 12 sections, 1 theorem, 5 equations, 7 figures, 1 table.

Key Result

Lemma 1

In steady state, under First-Come First-Served (FCFS) discipline, the computing queue is an $M/M/1$ queue. Moreover, the sojourn times experienced by any tagged job in the communication and computing queues are independent.
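Lemma 1 is what makes the end-to-end delay tractable. A standard route to such a result is Burke's theorem: the departure process of a stable $M/M/1$ queue is Poisson, so the downstream computing stage again sees Poisson arrivals. As a sketch, assume Poisson job arrivals at rate $\lambda$ and exponential service at rates $\mu_c$ (communication) and $\mu_p$ (computing); these symbols are our own labels, not necessarily the paper's notation. Each stage's steady-state sojourn time is then exponential with rate $\mu - \lambda$, and independence turns the end-to-end delay distribution into a convolution:

```latex
\begin{aligned}
T &= T_{\mathrm{comm}} + T_{\mathrm{comp}}, \qquad
T_{\mathrm{comm}} \sim \mathrm{Exp}(\mu_c - \lambda), \quad
T_{\mathrm{comp}} \sim \mathrm{Exp}(\mu_p - \lambda), \qquad
\lambda < \min(\mu_c, \mu_p), \\
\Pr[T \le d] &= \int_0^{d} (\mu_c - \lambda)\, e^{-(\mu_c - \lambda)s}
  \bigl(1 - e^{-(\mu_p - \lambda)(d - s)}\bigr)\, ds \\
&= 1 - \frac{(\mu_p - \lambda)\, e^{-(\mu_c - \lambda)d} - (\mu_c - \lambda)\, e^{-(\mu_p - \lambda)d}}{\mu_p - \mu_c},
  \qquad \mu_c \neq \mu_p .
\end{aligned}
```

When $\mu_c = \mu_p = \mu$, the sum is Erlang-2 and the CDF reduces to $1 - e^{-(\mu-\lambda)d}\bigl(1 + (\mu-\lambda)d\bigr)$.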

Figures (7)

  • Figure 1: High-level overview of the proposed ICC design.
  • Figure 2: 6G ICC system model for theoretical analysis.
  • Figure 3: Queueing model of the 6G ICC system illustrated in Fig. 2.
  • Figure 4: Comparison of job satisfaction rate for ICC and 5G MEC across job arrival rates.
  • Figure 5: Overview of the simulation framework for performance evaluation of ICC compared to 5G MEC.
  • ...and 2 more figures
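Figure 4 plots job satisfaction rate against job arrival rate. For the idealized tandem model of Figure 3 (not the LLM-workload simulator of Figure 5), a comparable curve can be sketched by Monte Carlo, which also gives an empirical check of Lemma 1. This is a minimal sketch under stated assumptions: Poisson arrivals, exponential service at both stages, FCFS at each stage, and an empty system at time zero. `simulate_tandem` and `empirical_satisfaction` are hypothetical helpers, not the authors' code, and "satisfied" is assumed to mean end-to-end delay within the budget.

```python
import random


def simulate_tandem(lam, mu_comm, mu_comp, n_jobs, seed=0):
    """End-to-end delays for two FCFS exponential servers in series.

    Hypothetical model: Poisson arrivals at rate lam, exponential service at
    rates mu_comm (communication) and mu_comp (computing), empty at t = 0.
    """
    rng = random.Random(seed)
    t = 0.0         # arrival time of the current job
    dep_comm = 0.0  # previous job's departure from the communication stage
    dep_comp = 0.0  # previous job's departure from the computing stage
    delays = []
    for _ in range(n_jobs):
        t += rng.expovariate(lam)  # Poisson arrival process
        # FCFS stage 1: start when the job arrives or the server frees up.
        dep_comm = max(t, dep_comm) + rng.expovariate(mu_comm)
        # FCFS stage 2: the job enters the computing queue when it leaves stage 1.
        dep_comp = max(dep_comm, dep_comp) + rng.expovariate(mu_comp)
        delays.append(dep_comp - t)
    return delays


def empirical_satisfaction(delays, delay_budget, warmup=0):
    """Fraction of jobs (after a warm-up prefix) finishing within the budget."""
    kept = delays[warmup:]
    return sum(d <= delay_budget for d in kept) / len(kept)
```

With $\lambda = 5$, $\mu = 10$ (communication), $\mu = 12$ (computing), and a delay budget of 0.5, the empirical rate should land close to 0.79, the value obtained by treating the two stage sojourn times as independent exponentials with rates $\mu - \lambda$, as Lemma 1 allows. Discarding a warm-up prefix reduces the bias from starting with empty queues.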

Theorems & Definitions (3)

  • Definition 1: Job Satisfaction
  • Definition 2: Service Capacity
  • Lemma 1
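Definitions 1 and 2 can be made concrete for the tandem model of Lemma 1. The sketch below assumes that job satisfaction means finishing within a delay budget $d$, so the satisfaction rate is $\Pr[T \le d]$ for the end-to-end delay $T$. It also assumes exponential service at both stages, and computes service capacity as the largest arrival rate whose satisfaction rate still meets a target fraction. The function names and parameters are hypothetical illustrations, not the paper's notation.

```python
import math


def satisfaction_rate(lam: float, mu_comm: float, mu_comp: float, delay_budget: float) -> float:
    """Fraction of jobs whose end-to-end delay is within delay_budget.

    Hypothetical model: Poisson arrivals (rate lam) feed two FCFS exponential
    servers in series (rates mu_comm, mu_comp). Following Lemma 1, the two
    sojourn times are treated as independent, each Exp(mu - lam).
    """
    if lam >= min(mu_comm, mu_comp):
        return 0.0  # at least one queue is unstable; delays grow without bound
    a = mu_comm - lam  # rate of the communication sojourn time
    b = mu_comp - lam  # rate of the computing sojourn time
    d = delay_budget
    if math.isclose(a, b):
        # Equal rates: the sum of the two sojourn times is Erlang-2.
        return 1.0 - math.exp(-a * d) * (1.0 + a * d)
    # Distinct rates: hypoexponential CDF of the sum.
    return 1.0 - (b * math.exp(-a * d) - a * math.exp(-b * d)) / (b - a)


def service_capacity(mu_comm: float, mu_comp: float, delay_budget: float,
                     target: float, tol: float = 1e-9) -> float:
    """Largest arrival rate whose satisfaction rate is still >= target.

    Bisection is valid because the satisfaction rate decreases monotonically
    as the arrival rate lam increases.
    """
    lo, hi = 0.0, min(mu_comm, mu_comp)
    if satisfaction_rate(lo, mu_comm, mu_comp, delay_budget) < target:
        return 0.0  # the target is unreachable even with no load
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if satisfaction_rate(mid, mu_comm, mu_comp, delay_budget) >= target:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `service_capacity(10.0, 10.0, 1.0, 0.9)` returns about 6.11 jobs per unit time: at that load each stage's sojourn rate is roughly 3.89, which is where the Erlang-2 CDF at $d = 1$ crosses 0.9. Sweeping the service rates in such a sketch shows how faster or better-coordinated stages raise service capacity, which is the quantity the paper uses to compare ICC with 5G MEC.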