When Empowerment Disempowers

Claire Yang, Maya Cakmak, Max Kleiman-Weiner

TL;DR

This work investigates whether empowerment, proposed as a universal, goal-agnostic objective for assistive AI, remains safe when multiple humans interact in the environment. The authors introduce Disempower-Grid, a multi-human gridworld test suite, and compare four objective functions (Empowerment, AvE Proxy, Discrete Choice, and Entropic Choice) within a single-principal multi-human assistance game (SP-MHAG). Across diverse layouts, they find that optimizing for one user’s empowerment systematically disempowers bystanders, revealing a fundamental alignment issue for goal-agnostic AI in multi-agent contexts. Optimizing joint empowerment mitigates disempowerment, but at the cost of reduced user reward, suggesting that safety approaches relying solely on empowerment or simple extensions of it may undermine practical assistance. The results underscore the need for more nuanced objective design that preserves user goals while safeguarding the agency of others in real-world, multi-human environments.

Abstract

Empowerment, a measure of an agent's ability to control its environment, has been proposed as a universal goal-agnostic objective for motivating assistive behavior in AI agents. While multi-human settings like homes and hospitals are promising for AI assistance, prior work on empowerment-based assistance assumes that the agent assists one human in isolation. We introduce Disempower-Grid, an open-source multi-human gridworld test suite. Using Disempower-Grid, we empirically show that assistive RL agents optimizing for one human's empowerment can significantly reduce another human's environmental influence and rewards, a phenomenon we formalize as disempowerment. We characterize when disempowerment occurs in these environments and show that joint empowerment mitigates disempowerment at the cost of the user's reward. Our work reveals a broader challenge for the AI alignment community: goal-agnostic objectives that seem aligned in single-agent settings can become misaligned in multi-agent contexts.
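
To make the notion of disempowerment concrete, the following is a minimal sketch and not the authors' implementation. It assumes deterministic dynamics, in which case n-step empowerment reduces to the base-2 logarithm of the number of distinct states reachable in n steps. The grid layout, the action set (four cardinal moves plus staying put), and all function names are hypothetical choices made for illustration. A single block is relocated from the user's corridor to the bystander's corridor, in the spirit of the non-embodied assistant that can move any block.

```python
import math
from itertools import product

# Assumed action set: four cardinal moves plus "stay" (illustrative only).
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]


def step(pos, move, blocked, size):
    """Deterministic transition: move if the target cell is free, else stay."""
    r, c = pos[0] + move[0], pos[1] + move[1]
    if 0 <= r < size and 0 <= c < size and (r, c) not in blocked:
        return (r, c)
    return pos


def empowerment(pos, blocked, size, horizon):
    """n-step empowerment in bits for deterministic dynamics.

    With deterministic transitions, the channel capacity between action
    sequences and final states equals log2(number of reachable final states).
    """
    finals = set()
    for seq in product(MOVES, repeat=horizon):
        p = pos
        for m in seq:
            p = step(p, m, blocked, size)
        finals.add(p)
    return math.log2(len(finals))


# Hypothetical 3x3 grid: a wall across the middle row creates two corridors.
SIZE = 3
WALLS = {(1, 0), (1, 1), (1, 2)}
USER, BYSTANDER = (0, 0), (2, 0)

# Before assistance: the block sits in the user's corridor.
before = WALLS | {(0, 1)}
# After assistance: the assistant has moved the block into the bystander's corridor.
after = WALLS | {(2, 1)}

user_before = empowerment(USER, before, SIZE, horizon=2)
user_after = empowerment(USER, after, SIZE, horizon=2)
bystander_before = empowerment(BYSTANDER, before, SIZE, horizon=2)
bystander_after = empowerment(BYSTANDER, after, SIZE, horizon=2)

print(f"user:      {user_before:.3f} -> {user_after:.3f} bits")
print(f"bystander: {bystander_before:.3f} -> {bystander_after:.3f} bits")
```

In this toy layout the user's empowerment rises from 0 to log2(3) bits while the bystander's falls from log2(3) to 0 bits, so an assistant that scores only the user's empowerment has no reason to prefer any other placement of the block. The sketch ignores collisions between humans and stochastic dynamics, both of which a fuller treatment would need to handle.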

Paper Structure

This paper contains 28 sections, 8 equations, 8 figures.

Figures (8)

  • Figure 1: Left: Examples from our test suite Disempower-Grid. The assistant aims to empower the user through a goal-agnostic objective. Differing assistance strategies may influence the optionality of a bystander (green). The left shows an example where the assistant enables both the user and the bystander to reach more states, including the goal. The right shows an example where the assistant inhibits the bystander while helping the user. Right: Sample trajectory showing that four goal-agnostic objectives used for training an assistive RL agent all increase the user's influence/choice while decreasing it for the bystander. See the empowerment preliminaries section for details on goal-agnostic objectives.
  • Figure 2: Our experiments demonstrate disempowerment on these four example grids from Disempower-Grid. The user (green) and the bystander (purple) are both rewarded for reaching the star after touching the key. The task is not competitive, and both agents can occupy the star square simultaneously. The user and bystander move in cardinal directions, cannot move through each other, and cannot move the blocks (orange) or the walls (black). Left two grids: when the assistant is embodied, the assistant can move in cardinal directions and can move adjacent blocks by pushing/pulling, or only pushing. The user and bystander cannot move through the assistant when embodied. Each human must go to the key and then the goal position (star) in order to receive their independent reward. Right two grids: when the assistant is non-embodied, it can move any of the blocks, or freeze the bystander in place for 4 timesteps. Each human only needs to go to the goal position (star) in order to receive their independent reward.
  • Figure 3: Assistant disempowers bystander in the Push/Pull Adjacent environment example grid. Left: An example grid where the assistant (robot) must push/pull the boxes (orange dotted) to empower the user (purple). Center/Right: The bystander (green) is disempowered by the assistant's actions. The plots show the average empowerment and average reward of the user and bystander over the course of the assistant's training in the Disempower-Grid grid shown on the left. Each trace is averaged over five runs. The error bands show the standard deviation. Empowerment and reward levels are compared against an assistant with a Random objective (green dotted line). Subsequent figures follow the same format: example environment (left), empowerment trajectories (center), and reward trajectories (right).
  • Figure 4: Assistant disempowers bystander in the Push Adjacent environment example grid, despite constrained capabilities.
  • Figure 5: Non-embodied assistant disempowers bystander in the Move Any environment example grid.
  • ...and 3 more figures