Controlling Large Language Model Agents with Entropic Activation Steering

Nate Rahn, Pierluca D'Oro, Marc G. Bellemare

TL;DR

This paper introduces Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. Unlike token-level temperature sampling, EAST manipulates an LLM agent's exploration by directly affecting the high-level actions parsed from the LLM's outputs.

Abstract

The rise of large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. At the core of agentic behavior is the capacity for exploration, or the ability to actively gather information about the environment. But how do LLM agents explore, and how can we control their exploratory behaviors? To answer these questions, we take a representation-level perspective and introduce Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. First, we demonstrate that EAST can effectively manipulate an LLM agent's exploration by directly affecting the high-level actions parsed from the outputs of the LLM, in contrast to token-level temperature sampling. Second, we reveal how applying this control modulates the uncertainty exhibited in the LLM's thoughts, guiding the agent towards more exploratory actions. Finally, we demonstrate that the steering vectors obtained by EAST generalize across task variants. In total, these results show that LLM agents explicitly encode uncertainty over their actions in their representation space. Our work paves the way for a new understanding of the functioning of LLM agents and for effective control of their decision-making behaviors.

Paper Structure (19 sections, 1 equation, 14 figures, 1 table)

Figures (14)

  • Figure 1: Overview of Entropic Activation Steering (EAST). In Phase 1, the method constructs a steering vector by averaging the activations produced by the LLM agent given a set of prompts, weighting them by the entropy of the resulting action distribution. In Phase 2, during new runs of interactions with the environment, it steers the agent by adding this vector to the LLM's activations at a target layer for each generated token position. The method increases the agent's subjective uncertainty about what to do and leads to more exploratory behavior.
  • Figure 2: Left: Evolution of choices over two actions (0 and 1) taken by the LLM agent over time in increasingly ambiguous bandit settings. A darker color corresponds to a more common behavior. The LLM agent tends to commit to a single arm even when choosing should be hard or impossible. Right: The evolution of the LLM agent's entropy over actions, over time. The rapid decrease in entropy corresponds to the agent committing to a single action.
  • Figure 3: Example of the interaction between token-level sampling and action-level sampling for a two-armed bandit, showing the evolution of the probability that the first action is ultimately selected as the tokens are generated by the LLM.
  • Figure 4: Distribution of choices over two actions (0 and 1) taken by the LLM agent over time when varying the sampling temperature. A darker color corresponds to a more common behavior, and incomplete lines are due to the episode terminating early because of invalid actions. Increasing the temperature up to the point at which no action can be parsed from the LLM's generations does not significantly change the entropy of the action distribution.
  • Figure 5: Effect of the application of EAST on the LLM agent's actions and thoughts. In contrast to varying the token-level sampling temperature, EAST significantly changes the action entropy for a wide range of multipliers before invalidating the model's completions (left), and affects the agent's subjective uncertainty, steering its thoughts towards more exploratory behavior given the same starting situation (right).
  • ...and 9 more figures
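
The two phases described in the Figure 1 caption can be illustrated with a small NumPy sketch. It is not the authors' implementation. The function names (`action_entropy`, `steering_vector`, `steer`), the normalization of the entropy weights to sum to one, and the use of plain arrays in place of real LLM activations are all assumptions made for illustration; the paper's own equation may weight or normalize differently.

```python
import numpy as np


def action_entropy(action_counts):
    """Shannon entropy (in nats) of the empirical distribution over parsed actions."""
    counts = np.asarray(action_counts, dtype=float)
    probs = counts / counts.sum()
    nonzero = probs[probs > 0]  # treat 0 * log(0) as 0
    return float(-(nonzero * np.log(nonzero)).sum())


def steering_vector(activations, entropies):
    """Phase 1 (assumed form): entropy-weighted average of per-prompt activations.

    activations: array of shape (n_prompts, d), one vector per prompt.
    entropies:   array of shape (n_prompts,), action entropy for each prompt.
    """
    acts = np.asarray(activations, dtype=float)
    weights = np.asarray(entropies, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("at least one prompt must have positive action entropy")
    return (weights[:, None] * acts).sum(axis=0) / total


def steer(hidden, vector, multiplier):
    """Phase 2: add the scaled vector at every generated token position.

    hidden: array of shape (n_tokens, d), activations at the target layer.
    """
    return np.asarray(hidden, dtype=float) + multiplier * np.asarray(vector, dtype=float)


# Toy example: two prompts, two-dimensional "activations".
entropies = [action_entropy([9, 1]), action_entropy([5, 5])]
vector = steering_vector([[1.0, 0.0], [0.0, 1.0]], entropies)
steered = steer(np.zeros((3, 2)), vector, multiplier=2.0)
```

In this toy run the second prompt, whose actions are split evenly, has the higher entropy and therefore contributes more to `vector`. The `multiplier` argument plays the role of the steering strength varied in Figure 5.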