Addressing the sustainable AI trilemma: a case study on LLM agents and RAG

Hui Wu, Xiaoyang Wang, Zhong Fan

TL;DR

The paper defines the Sustainable AI Trilemma, the tension among AI capability, digital equity, and environmental sustainability, and analyzes how it manifests in LLM-based agents that use RAG. It introduces a unified energy-cost framework focused on memory modules and proposes metrics (e.g., RERR, EAR, GEOR, GERR, FER, CER) to quantify trade-offs between energy and performance. Through a detailed case study across memory formation, reading, utilization, and management, the authors show substantial energy inefficiencies in memory-centric designs and reveal that resource-constrained environments incur disproportionately higher energy penalties for comparable performance. The work advocates moving beyond a purely LLM-centric autonomy paradigm and offers metrics and insights to guide the development of more energy-efficient, equitable, and scalable AI systems.

Abstract

Large language models (LLMs) have demonstrated significant capabilities, but their widespread deployment and more advanced applications raise critical sustainability challenges, particularly in inference energy consumption. We propose the concept of the Sustainable AI Trilemma, highlighting the tensions between AI capability, digital equity, and environmental sustainability. Through a systematic case study of LLM agents and retrieval-augmented generation (RAG), we analyze the energy costs embedded in memory module designs and introduce novel metrics to quantify the trade-offs between energy consumption and system performance. Our experimental results reveal significant energy inefficiencies in current memory-augmented frameworks and demonstrate that resource-constrained environments face disproportionate efficiency penalties. Our findings challenge the prevailing LLM-centric paradigm in agent design and provide practical insights for developing more sustainable AI systems.

Paper Structure (40 sections, 10 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: The Sustainable AI Trilemma.
  • Figure 2: Design of Memory Modules in LLM Agents or RAG. The figure contains the operations that require LLM inference: (a) Memory Formation, (b) Retrieval Necessity Detection, (c) Query Optimization, (d) Reranking, (e) Compression, (f) Generation. The red markers represent data that is necessary for a QA process to generate a correct response, while black indicates non-necessary data.
  • Figure 3: New efficiency metrics to balance energy cost and task performance.
  • Figure 4: The Relationship between Energy Cost Multiple and Memory Reading Workload (top-K value).
  • Figure 5: Comparison of Memory Formation in Resource-constrained vs. Resource-abundant Environments.