Promoting Sustainable Web Agents: Benchmarking and Estimating Energy Consumption through Empirical and Theoretical Analysis

Lars Krupp, Daniel Geißler, Vishal Banwari, Paul Lukowicz, Jakob Karolus

TL;DR

The paper investigates the energy and $CO_2$ emissions of web agents driven by LLMs, addressing sustainability amid rapidly growing deployment. It combines an empirical Mind2Web benchmark of five open-source agents across multiple GPUs with a theoretical framework to estimate energy use for proprietary models, revealing that design philosophy and preprocessing greatly influence energy efficiency and that higher energy does not guarantee better results. Key contributions include a standardized energy benchmark for open-source agents, a transparent methodology for estimating proprietary-agent energy, and $CO_2$ emission analyses under different energy mixes. The findings advocate for integrating energy-specific metrics into web-agent benchmarks to guide sustainable development and inform users and policymakers about environmental trade-offs. This work advances practical, comparable measures of energy efficiency in web agents, with significant implications for scalable and responsible AI deployment.

Abstract

Web agents, like OpenAI's Operator and Google's Project Mariner, are powerful agentic systems pushing the boundaries of Large Language Models (LLMs). They can autonomously interact with the internet at the user's behest, for example by navigating websites, filling in search forms, and comparing price lists. Though web agent research is thriving, the sustainability issues it induces remain largely unexplored. To highlight the urgency of this issue, we provide an initial exploration of the energy and $CO_2$ cost associated with web agents from both a theoretical perspective (via estimation) and an empirical perspective (via benchmarking). Our results show how different philosophies in web agent creation can severely impact the energy expended, and that consuming more energy does not necessarily yield better results. We highlight the lack of transparency about the model parameters and processes used by some web agents as a limiting factor when estimating energy consumption. Our work contributes towards a change in how we evaluate web agents, advocating for dedicated metrics that measure energy consumption in benchmarks.
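
As a rough illustration of how an energy figure relates to a $CO_2$ cost under different energy mixes, the sketch below multiplies energy use by a grid carbon intensity. This is a minimal example of the standard conversion and not the paper's own estimation method. The function name `co2_grams` and every numeric value are hypothetical placeholders, not results from the paper.

```python
def co2_grams(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Return emissions in grams of CO2 for a given energy use and grid carbon intensity."""
    if energy_kwh < 0 or intensity_g_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    return energy_kwh * intensity_g_per_kwh


# Placeholder values for illustration only; they are NOT taken from the paper.
HYPOTHETICAL_MIXES = {  # carbon intensity in g CO2 per kWh
    "low_carbon_grid": 50.0,
    "high_carbon_grid": 500.0,
}
HYPOTHETICAL_ENERGY_KWH = 2.0  # assumed energy for one benchmark run

# Same energy use, different energy mix, different emissions.
emissions = {
    name: co2_grams(HYPOTHETICAL_ENERGY_KWH, intensity)
    for name, intensity in HYPOTHETICAL_MIXES.items()
}

for name, grams in emissions.items():
    print(f"{name}: {grams:.1f} g CO2")
```

Because the conversion is linear, the same measured energy can map to very different emission figures depending on where the agent is run, which is why reporting energy alongside $CO_2$ keeps results comparable.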

Paper Structure

This paper contains 19 sections, 4 equations, 3 figures, 12 tables.

Figures (3)

  • Figure 1: Energy consumption per web agent and GPU.
  • Figure 2: Pipeline depicting how an action is chosen in MindAct.
  • Figure 3: Pipeline depicting how an action is chosen in LASER.