Estimating the Empowerment of Language Model Agents

Jinyeop Song, Jeff Gore, Max Kleiman-Weiner

TL;DR

This work introduces EELMA, a scalable, goal-agnostic method for evaluating language-model (LM) agents via empowerment, defined as the mutual information between an agent's actions and future states: $\mathcal{E}(\pi_{LM}) = \mathbb{E}_{s_t,a_t,s_*}[ I(a_t; s_* \mid s_t) ]$. EELMA estimates effective empowerment from multi-turn text-based trajectories by embedding observations and actions and applying a contrastive InfoNCE objective through two encoders to approximate the mutual information. Empirically, the empowerment estimates align with task performance across language games (Gridworld, Tower of Hanoi) and a web-browsing sandbox (WebArena), reveal pivotal moments, and are robust to linguistic variability. The results support empowerment as a practical, information-theoretic proxy for agentic capability in open-ended settings, with implications for safety monitoring and model development. Limitations remain for tasks with non-dynamic objectives or indirect power dynamics, and future work could extend the method to multimodal and multi-agent scenarios.
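To make the contrastive step concrete, here is a minimal NumPy sketch of the generic InfoNCE lower bound on mutual information, computed from a batch of paired embeddings passed through two encoders. It is an illustration under stated assumptions, not the authors' implementation: the linear "encoders" (`W_ctx`, `W_fut`), the dot-product critic, and the function names are all hypothetical, and the sketch does not handle the conditioning on $s_t$ explicitly.

```python
import numpy as np

def critic_scores(ctx_emb, fut_emb, W_ctx, W_fut):
    """Score every (context, future) pair with a dot product of two encodings.

    ctx_emb: (N, d_c) embeddings of (state, action) text  -- assumed input
    fut_emb: (N, d_f) embeddings of future-state text     -- assumed input
    W_ctx, W_fut: hypothetical linear encoders mapping into a shared space
    Row i of ctx_emb is paired with row i of fut_emb (the positive pair).
    """
    return (ctx_emb @ W_ctx) @ (fut_emb @ W_fut).T  # shape (N, N)

def info_nce_lower_bound(scores):
    """InfoNCE estimate in nats: mean_i[f_ii - logsumexp_j f_ij] + log N."""
    n = scores.shape[0]
    # Numerically stable log-sum-exp over each row
    row_max = scores.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    return float(np.mean(np.diag(scores) - lse) + np.log(n))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ctx = rng.normal(size=(16, 8))   # stand-in for text embeddings
    fut = rng.normal(size=(16, 8))
    W_c = rng.normal(size=(8, 4))    # untrained, for illustration only
    W_f = rng.normal(size=(8, 4))
    print(info_nce_lower_bound(critic_scores(ctx, fut, W_c, W_f)))
```

In practice the two encoders would be trained to maximize this quantity. The estimate can never exceed $\log N$ for a batch of size $N$, which is a known limitation of InfoNCE-style estimators.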

Abstract

As language model (LM) agents become more capable and gain broader access to real-world tools, there is a growing need for scalable frameworks for evaluating agentic capability. However, conventional benchmark-centric evaluations are costly to design and require human designers to come up with valid tasks that translate into insights about general model capabilities. In this work, we propose information-theoretic evaluation based on empowerment, the mutual information between an agent's actions and future states, as an open-ended method for evaluating LM agents. We introduce EELMA (Estimating Empowerment of Language Model Agents), an algorithm for approximating effective empowerment from multi-turn text interactions. We validate EELMA on both language games and scaled-up realistic web-browsing scenarios. We find that empowerment strongly correlates with average task performance, characterize the impact of environmental complexity and agentic factors such as chain-of-thought, model scale, and memory length on estimated empowerment, and show that high-empowerment states and actions are often pivotal moments for general capabilities. Together, these results demonstrate empowerment as an appealing general-purpose metric for evaluating and monitoring LM agents in complex, open-ended settings.

Paper Structure

This paper contains 40 sections, 24 equations, 13 figures, 5 tables, 1 algorithm.

Figures (13)

  • Figure 1: Empowerment reflects an agent’s ability to reach diverse future states. (Top) A low-empowerment LM-agent becomes trapped in a loop and thus can access only a small fraction of states. (Bottom) A high-empowerment LM-agent effectively explores a wider range of trajectories and can successfully reach states that solve different random goals.
  • Figure 2: EELMA Overview. EELMA quantifies the empowerment of an LM-agent from text-based trajectories by mapping textual observations and actions to compact embeddings and estimating variational mutual information using InfoNCE (Le-Khac et al., 2020).
  • Figure 3: EELMA accurately estimates the effective empowerment. We validated the EELMA algorithm in three Gridworld scenarios and the Tower of Hanoi (ToH). (A) State-conditional empowerment estimated by EELMA closely aligns with direct estimation. Heatmaps represent empowerment averaged across agent positions in the Gridworld. The graphs display empowerment for each configuration (merged by permutation symmetry) in the ToH. (B) The correlation plot shows strong alignment between effective empowerment estimates from EELMA and direct estimation.
  • Figure 4: EELMA identifies influential actions. State–action conditional empowerment for valid (leading to novel states according to the game rules) and invalid actions in Gridworld (left) and ToH (right). Valid actions, which produce meaningful state transitions (e.g., moving to an empty grid cell in Gridworld, or placing a smaller disk onto a larger one in ToH), exhibit significantly higher empowerment than invalid actions (e.g., moving into a box in Gridworld, or placing a larger disk onto a smaller disk in ToH). The difference between valid and invalid actions is statistically significant (*** $p<0.001$, t-test).
  • Figure 5: Environmental complexity affects effective empowerment. We vary the number of boxes from 4 to 7 in a 4-by-4 Gridworld (left), and the number of disks from 3 to 5 in the ToH with 3 rods (right). As complexity increases, the effective empowerment of the LM-agent progressively decreases relative to the maximum empowerment (the theoretical bound on the influence an optimal policy can exert), correlating closely with reduced average rewards.
  • ...and 8 more figures
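
The contrast in Figures 1 and 4 between agents or actions that reach many distinct states and those that leave the state unchanged can be illustrated in a small tabular setting. The sketch below computes a plug-in (count-based) mutual information between actions and next states for a single state. It is an assumption made for illustration: the captions do not specify how "direct estimation" is carried out, and the function name and toy count tables are hypothetical.

```python
import numpy as np

def mutual_information_from_counts(counts):
    """Plug-in estimate of I(a; s') in nats from an (actions x next_states) count table."""
    counts = np.asarray(counts, dtype=float)
    joint = counts / counts.sum()             # empirical p(a, s')
    p_a = joint.sum(axis=1, keepdims=True)    # marginal over actions, shape (A, 1)
    p_s = joint.sum(axis=0, keepdims=True)    # marginal over next states, shape (1, S)
    mask = joint > 0                          # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (p_a @ p_s)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

if __name__ == "__main__":
    # Toy case 1: four actions, each leading to its own distinct next state
    distinct = np.eye(4) * 10
    # Toy case 2: two actions that both leave the agent in the same state
    trapped = np.array([[10, 0], [10, 0]])
    print(mutual_information_from_counts(distinct))  # equals log(4) nats
    print(mutual_information_from_counts(trapped))   # equals 0 nats
```

When every action leads to a distinct outcome, the estimate reaches $\log 4$ nats; when all actions collapse to the same next state, it is zero. A count-based estimate like this is only feasible when states and actions can be enumerated, which is not the case for free-form text.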