Towards Sustainable Web Agents: A Plea for Transparency and Dedicated Metrics for Energy Consumption

Lars Krupp, Daniel Geißler, Paul Lukowicz, Jakob Karolus

TL;DR

This paper addresses the sustainability of web agents that autonomously browse the web and evaluates energy cost and CO2 emissions by comparing MindAct and LASER on Mind2Web. It demonstrates that design philosophy strongly influences energy consumption, with MindAct using orders of magnitude less energy due to preprocessing with small models, while LASER relies on a heavy GPT-4 core. The authors advocate for dedicated energy metrics, token-level reporting, and modular, small-model pipelines to improve sustainability and comparability. They also highlight the need for transparency around model parameters to enable accurate environmental accounting at scale.

Abstract

Improvements in the area of large language models have shifted towards the construction of models capable of using external tools and interpreting their outputs. These so-called web agents have the ability to interact autonomously with the internet. This allows them to become powerful daily assistants handling time-consuming, repetitive tasks while supporting users in their daily activities. While web agent research is thriving, the sustainability aspect of this research direction remains largely unexplored. We provide an initial exploration of the energy and CO2 cost associated with web agents. Our results show how different philosophies in web agent creation can severely impact the associated expended energy. We highlight the lack of transparency in the disclosure of model parameters and processes for some web agents as a limiting factor when estimating energy consumption. As such, our work advocates a change in thinking when evaluating web agents, warranting dedicated metrics for energy consumption and sustainability.

Paper Structure

This paper contains 11 sections, 3 equations, 3 figures.

Figures (3)

  • Figure 1: A pipeline depicting the generic structure of web agents.
  • Figure 2: Pipeline depicting how an action is chosen in MindAct following the two-stage process shown in Figure 1. For each element $e_i$ in the DOM, the user query and textual descriptions of parent and child elements are added to form the input of DeBERTa, which calculates a matching score $MS_i$. The 50 highest-scoring elements are selected for the next stage and transformed into 10 multiple-choice question answering tasks, with the user query as the question, and given to Flan-T5$_{XL}$, which selects an action to take.
  • Figure 3: Pipeline depicting how an action is chosen in LASER. An input consists of the user query, the HTML, and a subset of actions the model can take depending on the state space. The LLM can also access the memory buffer to aid recovery from failed paths. The memory buffer and state space are updated with the generated action.
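
The candidate-selection step described in the Figure 2 caption (rank DOM elements by matching score, keep the 50 best, and split them into 10 multiple-choice tasks) can be sketched roughly as follows. This is a minimal illustration, not the MindAct implementation: the function names `select_top_k` and `chunk` are hypothetical, the scores are synthetic stand-ins for DeBERTa's $MS_i$, and the group size of 5 is only inferred from 50 elements being divided into 10 tasks.

```python
from typing import List, Sequence


def select_top_k(scores: Sequence[float], k: int = 50) -> List[int]:
    """Return the indices of the k highest scores, best first (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def chunk(items: Sequence[int], size: int = 5) -> List[List[int]]:
    """Split items into consecutive groups of at most `size` elements (hypothetical helper)."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


# Synthetic matching scores for 200 DOM elements; in the real pipeline these
# would come from the ranking model, which is not reproduced here.
scores = [((i * 37) % 101) / 100 for i in range(200)]

top_elements = select_top_k(scores, k=50)   # stage 1: keep the 50 best-scoring elements
tasks = chunk(top_elements, size=5)         # stage 2 input: 10 groups of 5 candidates

print(len(top_elements), len(tasks))
```

Each group in `tasks` would then be posed to the second-stage model as one multiple-choice question with the user query as the question. Because only the small ranking model sees the full DOM, the larger model processes far fewer tokens per step, which is the design choice the TL;DR credits for the lower energy use.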