AOLO: Analysis and Optimization For Low-Carbon Oriented Wireless Large Language Model Services
Xiaoqi Wang, Hongyang Du, Yuehong Gao, Dong In Kim
TL;DR
AOLO tackles the environmental impact of LLM inference by integrating emissions from both computation and wireless transmission into a unified carbon footprint model. It introduces a joint optimization framework and a spiking neural network-based deep reinforcement learning (SDRL) algorithm with a PopSAN actor to minimize ${\cal C}_{\text{I}} + \bar{\cal C}_{\text{C}}$ under QoE and timing constraints by adjusting the inference output length ${\kappa}$ and the transmit power ${P_{\text{trans}}}$. Key contributions include the first end-to-end carbon model for wireless LLM services, a formal optimization problem, and the SDRL algorithm, which achieves an 18.77% carbon reduction over a soft actor-critic baseline in simulations. The work enables more sustainable LLM inference services in wireless networks and opens avenues for low-carbon scheduling and resource allocation across providers and users.
Abstract
Recent advancements in large language models (LLMs) have led to their widespread adoption and large-scale deployment across various domains. However, their environmental impact, particularly during inference, has become a growing concern due to their substantial energy consumption and carbon footprint. Existing research has focused on inference computation alone, overlooking the analysis and optimization of the carbon footprint in network-aided LLM service systems. To address this gap, we propose AOLO, a framework for the analysis and optimization of low-carbon-oriented wireless LLM services. AOLO introduces a comprehensive carbon footprint model that quantifies greenhouse gas emissions across the entire LLM service chain, including computational inference and wireless communication. Furthermore, we formulate an optimization problem that minimizes the overall carbon footprint through joint optimization of inference outputs and transmit power under quality-of-experience and system performance constraints. To solve this problem, we leverage the energy efficiency of spiking neural networks (SNNs) by adopting an SNN as the actor network and propose a low-carbon-oriented optimization algorithm, i.e., SNN-based deep reinforcement learning (SDRL). Comprehensive simulations demonstrate that the SDRL algorithm significantly reduces the overall carbon footprint, achieving an 18.77% reduction compared to the soft actor-critic benchmark, highlighting its potential for enabling more sustainable LLM inference services.
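To make the joint decision over output length and transmit power concrete, the following is a minimal toy sketch and not the AOLO model or the SDRL algorithm. It assumes a linear per-token inference energy, a Shannon-rate wireless link, a single carbon intensity for both energy terms, a minimum output length as a stand-in for the QoE constraint, and exhaustive grid search in place of reinforcement learning. Every constant and function name here is hypothetical.

```python
import math

# All constants are illustrative assumptions, not values from the paper.
CARBON_INTENSITY = 0.4          # kgCO2e per kWh (hypothetical grid intensity)
ENERGY_PER_TOKEN_J = 2.0        # inference energy per output token, in joules
TIME_PER_TOKEN_S = 0.02         # decoding time per output token, in seconds
BITS_PER_TOKEN = 32.0           # payload bits sent per token
BANDWIDTH_HZ = 1e4              # link bandwidth
CHANNEL_GAIN_OVER_NOISE = 50.0  # channel gain divided by noise power, per watt
JOULES_PER_KWH = 3.6e6


def shannon_rate(p_trans_w):
    """Achievable rate in bit/s for a given transmit power (Shannon formula)."""
    return BANDWIDTH_HZ * math.log2(1.0 + p_trans_w * CHANNEL_GAIN_OVER_NOISE)


def carbon_and_latency(kappa, p_trans_w):
    """Return (carbon in kgCO2e, end-to-end latency in s) for one request."""
    infer_energy_j = kappa * ENERGY_PER_TOKEN_J
    tx_time_s = kappa * BITS_PER_TOKEN / shannon_rate(p_trans_w)
    tx_energy_j = p_trans_w * tx_time_s
    carbon = CARBON_INTENSITY * (infer_energy_j + tx_energy_j) / JOULES_PER_KWH
    latency = kappa * TIME_PER_TOKEN_S + tx_time_s
    return carbon, latency


def grid_search(kappas, powers, min_kappa, max_latency_s):
    """Pick the feasible (kappa, power) pair with the lowest carbon footprint.

    min_kappa is a crude QoE proxy (answers shorter than this are rejected);
    max_latency_s is the timing constraint. Returns (carbon, kappa, power),
    or None when no pair is feasible.
    """
    best = None
    for kappa in kappas:
        if kappa < min_kappa:
            continue
        for p in powers:
            carbon, latency = carbon_and_latency(kappa, p)
            if latency > max_latency_s:
                continue
            if best is None or carbon < best[0]:
                best = (carbon, kappa, p)
    return best


if __name__ == "__main__":
    result = grid_search(range(50, 301, 50), [0.1, 0.2, 0.5, 1.0],
                         min_kappa=100, max_latency_s=2.1)
    print(result)
```

With these toy numbers the search settles on the shortest admissible output and the lowest transmit power that still meets the deadline, which illustrates the tension the abstract describes: shorter outputs and lower power cut emissions, while the QoE and timing constraints push in the opposite direction. A learned policy such as SDRL would replace the exhaustive search when the channel and workload vary over time.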
