When Empowerment Disempowers
Claire Yang, Maya Cakmak, Max Kleiman-Weiner
TL;DR
This work investigates whether empowerment, proposed as a universal, goal-agnostic objective for assistive AI, remains safe when multiple humans share the environment. The authors introduce Disempower-Grid, a multi-human gridworld test suite, and compare four objective functions (Empowerment, AvE Proxy, Discrete Choice, and Entropic Choice) within a single-principal multi-human assistance game (SP-MHAG). Across diverse layouts, they find that optimizing for one user's empowerment systematically disempowers bystanders, revealing a fundamental alignment issue for goal-agnostic AI in multi-agent contexts. Optimizing joint empowerment mitigates disempowerment, but at the cost of reduced user reward, suggesting that safety approaches relying solely on empowerment or simple extensions of it may undermine practical assistance. The results underscore the need for objective designs that preserve user goals while safeguarding the agency of others in real-world, multi-human environments.
Abstract
Empowerment, a measure of an agent's ability to control its environment, has been proposed as a universal goal-agnostic objective for motivating assistive behavior in AI agents. While multi-human settings like homes and hospitals are promising for AI assistance, prior work on empowerment-based assistance assumes that the agent assists one human in isolation. We introduce Disempower-Grid, an open-source multi-human gridworld test suite. Using Disempower-Grid, we empirically show that assistive RL agents optimizing for one human's empowerment can significantly reduce another human's environmental influence and rewards, a phenomenon we formalize as disempowerment. We characterize when disempowerment occurs in these environments and show that joint empowerment mitigates disempowerment at the cost of the user's reward. Our work reveals a broader challenge for the AI alignment community: goal-agnostic objectives that seem aligned in single-agent settings can become misaligned in multi-agent contexts.
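To make the notion of empowerment concrete, the sketch below computes n-step empowerment for a single agent in a toy gridworld. It is an illustration only and is not taken from Disempower-Grid. It assumes deterministic dynamics, in which case empowerment (the channel capacity from action sequences to resulting states) reduces to the log of the number of distinct states the agent can reach. The names `ACTIONS`, `step`, and `empowerment`, the grid size, and the use of a `blocked` set to stand in for changes an assistant makes to the environment are all hypothetical.

```python
import math
from itertools import product

# Hypothetical action set: stay, plus the four cardinal moves.
ACTIONS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]

def step(pos, action, size, blocked):
    """Deterministic move; hitting a wall or a blocked cell leaves pos unchanged."""
    nxt = (pos[0] + action[0], pos[1] + action[1])
    in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
    return nxt if in_bounds and nxt not in blocked else pos

def empowerment(pos, size, blocked, horizon):
    """n-step empowerment in bits, assuming deterministic dynamics:
    log2 of the number of distinct end states over all action sequences."""
    reachable = set()
    for seq in product(ACTIONS, repeat=horizon):
        p = pos
        for a in seq:
            p = step(p, a, size, blocked)
        reachable.add(p)
    return math.log2(len(reachable))

# A bystander in the corner of a 3x3 grid, with a 2-step horizon.
free = empowerment((0, 0), size=3, blocked=set(), horizon=2)       # log2(6)
narrowed = empowerment((0, 0), size=3, blocked={(0, 1)}, horizon=2)  # 2.0
trapped = empowerment((0, 0), size=3, blocked={(0, 1), (1, 0)}, horizon=2)  # 0.0
```

In this toy setting, each cell blocked near the bystander lowers the bystander's empowerment, which loosely mirrors the disempowerment effect described above: changes to the environment that serve one human can shrink the set of outcomes another human can still bring about.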
