Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View
Yanran Wu, Inez Hua, Yi Ding
TL;DR
The paper introduces FUEL, a Functional Unit-based Evaluation framework for quantifying the environmental impact of LLM serving. By defining a functional unit (FU) as a token generated under explicit workload, performance, and quality constraints, FUEL enables fair cross-model comparisons across model size, quantization, and hardware. Through three case studies using Qwen2.5 and Llama2 served with vLLM, with NewsQA as the benchmark, the work reveals nuanced trade-offs: larger models can be greener in high-quality, low-QPS regimes; activation quantization (W8A8) often yields robust carbon reductions; and older hardware can outperform newer systems in carbon efficiency under certain conditions. These insights provide practical guidance for deploying greener LLM services and underscore the importance of jointly optimizing model configuration and infrastructure. The authors also release their code on GitHub for reproducible, FU-based environmental assessments.
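To make the functional-unit idea concrete, here is a minimal Python sketch. It assumes that carbon per FU equals operational carbon (measured energy times grid carbon intensity) plus amortized embodied carbon, divided by the number of generated tokens. It also assumes a configuration only yields valid FUs if it meets a latency SLO and a quality threshold, and that comparisons are made only between runs at the same QPS. The names `ServingRun`, `carbon_per_fu`, and `greenest`, the field names, and all numbers are hypothetical and are not taken from the FUEL code. The paper's exact accounting may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServingRun:
    """Measurements from one serving configuration (all fields hypothetical)."""
    name: str
    qps: float              # workload: offered request rate during the run
    energy_kwh: float       # operational energy measured over the run
    embodied_kgco2e: float  # embodied hardware carbon amortized to this run
    tokens_generated: int   # output tokens produced during the run
    latency_s: float        # per-token latency statistic (e.g. a percentile)
    quality: float          # task quality score on the benchmark


def carbon_per_fu(run: ServingRun, intensity_kg_per_kwh: float,
                  latency_slo_s: float, min_quality: float) -> Optional[float]:
    """Grams CO2e per generated token, or None if the run violates a constraint."""
    if run.latency_s > latency_slo_s or run.quality < min_quality:
        return None  # tokens that miss the SLO or quality bar are not valid FUs
    operational_kg = run.energy_kwh * intensity_kg_per_kwh
    total_g = (operational_kg + run.embodied_kgco2e) * 1000.0
    return total_g / run.tokens_generated


def greenest(runs, qps, **constraints) -> Optional[tuple]:
    """(carbon per FU, name) of the lowest-carbon valid run at the given QPS."""
    scored = []
    for run in runs:
        if run.qps != qps:
            continue  # only compare configurations under the same workload
        c = carbon_per_fu(run, **constraints)
        if c is not None:
            scored.append((c, run.name))
    return min(scored) if scored else None


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the paper.
    runs = [
        ServingRun("small-model", qps=1.0, energy_kwh=2.0, embodied_kgco2e=0.1,
                   tokens_generated=1_000_000, latency_s=0.03, quality=0.55),
        ServingRun("large-model", qps=1.0, energy_kwh=3.0, embodied_kgco2e=0.2,
                   tokens_generated=1_000_000, latency_s=0.06, quality=0.70),
    ]
    print(greenest(runs, qps=1.0, intensity_kg_per_kwh=0.4,
                   latency_slo_s=0.1, min_quality=0.6))
```

In this toy setup the smaller configuration draws less energy but is excluded because it misses the quality threshold, so only the larger one produces valid FUs and is selected. The numbers are made up to show the mechanics of the calculation, not to reproduce any reported result.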
Abstract
Large language models (LLMs) offer powerful capabilities but come with significant environmental impact, particularly in carbon emissions. Existing studies benchmark carbon emissions but lack a standardized basis for comparison across different model configurations. To address this, we introduce the concept of functional unit (FU) as a standardized basis and develop FUEL, the first FU-based framework for evaluating LLM serving's environmental impact. Through three case studies, we uncover key insights and trade-offs in reducing carbon emissions by optimizing model size, quantization strategy, and hardware choice, paving the way for more sustainable LLM serving. The code is available at https://github.com/jojacola/FUEL.
