AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services

Xiaoqi Wang, Hongyang Du, Yuehong Gao, Dong In Kim

TL;DR

AOLO tackles the environmental impact of LLM inference by integrating emissions from both computation and wireless transmission into a unified carbon footprint model. It introduces a joint optimization framework and a Spiking Neural Network-based DRL (SDRL) with a PopSAN actor to minimize ${\cal C}_{\text{I}} + \bar{\cal C}_{\text{C}}$ under QoE and timing constraints, adjusting inference output length ${\kappa}$ and transmit power ${P_{\text{trans}}}$. Key contributions include the first end-to-end carbon model for wireless LLM services, a formal optimization problem, and the SDRL algorithm that demonstrates substantial carbon reductions (e.g., an 18.77% reduction over a Soft Actor-Critic baseline in simulations). The work enables more sustainable LLM inference services in wireless networks and opens avenues for low-carbon scheduling and resource allocation across providers and users.
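The objective described above can be written schematically as follows. Only ${\cal C}_{\text{I}}$, $\bar{\cal C}_{\text{C}}$, ${\kappa}$, and ${P_{\text{trans}}}$ come from the summary; the constraint symbols ($Q$, $Q_{\min}$, $T_{\text{I}}$, $\bar{T}_{\text{C}}$, $T_{\max}$, $\kappa_{\max}$, $P_{\max}$) and the additive form of the timing constraint are placeholder assumptions for illustration, and the paper's exact constraint set may differ.

$$
\begin{aligned}
\min_{\kappa,\; P_{\text{trans}}} \quad & {\cal C}_{\text{I}}(\kappa) + \bar{\cal C}_{\text{C}}(\kappa, P_{\text{trans}}) \\
\text{s.t.} \quad & Q(\kappa) \ge Q_{\min}, \\
& T_{\text{I}}(\kappa) + \bar{T}_{\text{C}}(\kappa, P_{\text{trans}}) \le T_{\max}, \\
& 0 < \kappa \le \kappa_{\max}, \quad 0 < P_{\text{trans}} \le P_{\max}.
\end{aligned}
$$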

Abstract

Recent advancements in large language models (LLMs) have led to their widespread adoption and large-scale deployment across various domains. However, their environmental impact, particularly during inference, has become a growing concern due to their substantial energy consumption and carbon footprint. Existing research has focused on inference computation alone, overlooking the analysis and optimization of the carbon footprint in network-aided LLM service systems. To address this gap, we propose AOLO, a framework for analysis and optimization for low-carbon oriented wireless LLM services. AOLO introduces a comprehensive carbon footprint model that quantifies greenhouse gas emissions across the entire LLM service chain, including computational inference and wireless communication. Furthermore, we formulate an optimization problem aimed at minimizing the overall carbon footprint, which is solved through joint optimization of inference outputs and transmit power under quality-of-experience and system performance constraints. To achieve this joint optimization, we leverage the energy efficiency of spiking neural networks (SNNs) by adopting an SNN as the actor network and propose a low-carbon-oriented optimization algorithm, i.e., SNN-based deep reinforcement learning (SDRL). Comprehensive simulations demonstrate that the SDRL algorithm significantly reduces the overall carbon footprint, achieving an 18.77% reduction compared to the soft actor-critic (SAC) benchmark, highlighting its potential for enabling more sustainable LLM inference services.
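To make the trade-off concrete, the following is a minimal toy sketch of a joint search over output length and transmit power. It is not the SDRL algorithm: it swaps in an exhaustive grid search, a fixed (non-fading) Shannon-rate channel, and a made-up saturating QoE curve. Every constant and function name is an assumption chosen for illustration, not a value or interface from the paper.

```python
import math

# Every constant here is an illustrative assumption, not a value from the paper.
CARBON_INTENSITY_G_PER_KWH = 400.0  # grid carbon intensity
SERVER_POWER_W = 300.0              # power drawn while generating output
SECONDS_PER_WORD = 0.05             # inference time per output word
BITS_PER_WORD = 48.0                # payload per output word
BANDWIDTH_HZ = 1e4                  # channel bandwidth
SNR_PER_WATT = 100.0                # channel gain divided by noise power
QOE_MIN = 7.0                       # minimum acceptable QoE score (0-10 scale)
DEADLINE_S = 10.5                   # end-to-end latency budget
JOULES_PER_KWH = 3.6e6


def carbon_g(power_w, seconds):
    """Carbon (gCO2e) = energy (kWh) x carbon intensity (g/kWh)."""
    return power_w * seconds / JOULES_PER_KWH * CARBON_INTENSITY_G_PER_KWH


def inference_time(kappa):
    return kappa * SECONDS_PER_WORD


def transmission_time(kappa, p_trans):
    # Shannon rate of a fixed channel (assumption); the paper instead
    # derives average quantities in closed form.
    rate_bps = BANDWIDTH_HZ * math.log2(1.0 + p_trans * SNR_PER_WATT)
    return kappa * BITS_PER_WORD / rate_bps


def qoe(kappa):
    # Toy saturating QoE curve (assumption), standing in for a measured one.
    return 10.0 * (1.0 - math.exp(-kappa / 150.0))


def total_carbon(kappa, p_trans):
    return (carbon_g(SERVER_POWER_W, inference_time(kappa))
            + carbon_g(p_trans, transmission_time(kappa, p_trans)))


def grid_search():
    """Return (carbon, kappa, p_trans) of the cheapest feasible grid point."""
    best = None
    for kappa in range(50, 1001, 50):          # output length in words
        for step in range(1, 201):             # transmit power 0.01 .. 2.00 W
            p_trans = step / 100.0
            latency = inference_time(kappa) + transmission_time(kappa, p_trans)
            if qoe(kappa) < QOE_MIN or latency > DEADLINE_S:
                continue                       # violates QoE or timing constraint
            cost = total_carbon(kappa, p_trans)
            if best is None or cost < best[0]:
                best = (cost, kappa, p_trans)
    return best


best_cost, best_kappa, best_p = grid_search()
print(f"kappa={best_kappa} words, P_trans={best_p:.2f} W, carbon={best_cost:.4f} g")
```

With these assumed constants, the search settles on the shortest output that still meets the QoE floor and the lowest transmit power that still meets the deadline, which illustrates why the two decision variables need to be chosen jointly.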

Paper Structure

This paper contains 34 sections, 1 theorem, 33 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

The closed-form average transmission time and average carbon footprint of the wireless communication phase in our considered system are derived as where and $G_{{p_1},{q_1}:{p_2},{q_2}:{p_3},{q_3}}^{{m_1},{n_1}:{m_2},{n_2}:{m_3},{n_3}}\left( \cdot \right)$ is the extended generalized bivariate Meijer G-function (EGBMGF) [webWolfram].

Figures (7)

  • Figure 1: Illustration of wireless network-aided LLM inference service system with a data center and a base station. The overall carbon footprint is minimized through joint optimization of inference outputs and transmit power.
  • Figure 2: Impact of inference output word count on the inference execution time and subjective QoE. In subfigure (b), 100 stories with varying word counts (0-1000 words) on a shared theme are generated using Chat GPT-4o with canvas. Then, Chat GPT-4 evaluates these stories from a five-year-old’s perspective, scoring them from 0 to 10 based on interestingness, clarity, and length suitability.
  • Figure 3: Illustration of SNN-based actor network employing PopSAN method, tailored to generate optimal decisions for minimizing the overall carbon footprint associated with serving an inference request. The encoder transforms each state dimension into spiking activity employing population coding, generating input spikes for the SNN. The SNN module processes these spikes through its spatio-temporal structure to produce output spikes. Finally, the decoder calculates the firing rates of output populations and converts them into continuous action values, completing the decision-making process.
  • Figure 4: Overall architecture of the SDRL algorithm with an SNN-based actor network utilizing PopSAN method and a double-critic network.
  • Figure 5: Comparison of SDRL, SAC, PPO and random policy. The reward curves are smoothed for clarity.
  • ...and 2 more figures
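The encoder, spiking layer, and decoder stages described in the Figure 3 caption can be sketched roughly as below. This is a simplified illustration under stated assumptions: Gaussian receptive fields over a state range of [-1, 1], deterministic soft-reset spike encoding, a single current-based leaky integrate-and-fire layer, and a linear rate readout. The weights are hand-picked placeholders rather than trained parameters, and all function names are hypothetical.

```python
import math


def encode(state, pop_size=5, timesteps=20, sigma=0.5, threshold=0.999):
    """Population-code each state dimension into deterministic spike trains.

    Returns spikes[t][i] for len(state) * pop_size input neurons.
    """
    # Receptive-field centres spread evenly over the assumed range [-1, 1].
    centers = [-1.0 + 2.0 * i / (pop_size - 1) for i in range(pop_size)]
    drive = [math.exp(-0.5 * ((s - c) / sigma) ** 2) for s in state for c in centers]
    volt = [0.0] * len(drive)
    spikes = []
    for _ in range(timesteps):
        step = []
        for i, a in enumerate(drive):
            volt[i] += a                  # integrate the stimulation
            fired = volt[i] > threshold
            if fired:
                volt[i] -= threshold      # soft reset keeps the remainder
            step.append(1 if fired else 0)
        spikes.append(step)
    return spikes


def lif_layer(spikes, weights, decay_c=0.5, decay_v=0.75, threshold=0.5):
    """One layer of current-based LIF neurons; weights[j][i] maps input i to output j."""
    n_out = len(weights)
    current = [0.0] * n_out
    volt = [0.0] * n_out
    out = []
    for step in spikes:
        row = []
        for j in range(n_out):
            current[j] = decay_c * current[j] + sum(w * s for w, s in zip(weights[j], step))
            volt[j] = decay_v * volt[j] + current[j]
            fired = volt[j] > threshold
            if fired:
                volt[j] = 0.0             # hard reset after a spike
            row.append(1 if fired else 0)
        out.append(row)
    return out


def decode(out_spikes, pop_size, readout):
    """Average firing rate per neuron, then a linear readout per action population."""
    timesteps = len(out_spikes)
    n_out = len(out_spikes[0])
    rates = [sum(step[i] for step in out_spikes) / timesteps for i in range(n_out)]
    return [sum(w * r for w, r in zip(readout, rates[k:k + pop_size]))
            for k in range(0, n_out, pop_size)]


POP = 5
state = [0.5, -0.5]                       # two hypothetical normalised state features
# Identity weights stand in for trained synapses (placeholder assumption).
identity = [[1.0 if i == j else 0.0 for i in range(2 * POP)] for j in range(2 * POP)]
readout = [-1.0, -0.5, 0.0, 0.5, 1.0]     # placeholder decoder weights
actions = decode(lif_layer(encode(state, POP), identity), POP, readout)
print(actions)
```

In a full actor, the continuous action values would then be rescaled to the valid ranges of the output length and the transmit power; that mapping is omitted here.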

Theorems & Definitions (1)

  • Theorem 1