Learning to Assist Humans without Inferring Rewards

Vivek Myers, Evan Ellis, Sergey Levine, Benjamin Eysenbach, Anca Dragan

TL;DR

This work formally proves that contrastive successor representations estimate a similar notion of empowerment to that studied by prior work and provide a ready-made mechanism for optimizing it, and charts a path for representations to play a critical role in solving assistive problems.

Abstract

Assistive agents should make humans' lives easier. Classically, such assistance is studied through the lens of inverse reinforcement learning, where an assistive agent (e.g., a chatbot, a robot) infers a human's intention and then selects actions to help the human reach that goal. This approach requires inferring intentions, which can be difficult in high-dimensional settings. We build upon prior work that studies assistance through the lens of empowerment: an assistive agent aims to maximize the influence of the human's actions, so that the human exerts greater control over environmental outcomes and can solve tasks in fewer steps. We lift the major limitation of prior work in this area, namely poor scalability to high-dimensional settings, by using contrastive successor representations. We formally prove that these representations estimate a similar notion of empowerment to that studied by prior work and provide a ready-made mechanism for optimizing it. Empirically, our proposed method outperforms prior methods on synthetic benchmarks and scales to Overcooked, a cooperative game setting. Theoretically, our work connects ideas from information theory, neuroscience, and reinforcement learning, and charts a path for representations to play a critical role in solving assistive problems.
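The core mechanism in the abstract is estimating an information-theoretic quantity with contrastive representations. Contrastive RL methods typically train such representations with an InfoNCE-style objective, and the sketch below illustrates that generic bound. It is not the paper's exact objective. It assumes a batch of N positive pairs in which `phi[i]` embeds a (state, human action) pair and `psi[i]` embeds a future state reached from it, while every other row of `psi` serves as a negative. The critic $\phi^\top\psi$, the function name `infonce_lower_bound`, and the synthetic data are all hypothetical.

```python
import numpy as np

def infonce_lower_bound(phi: np.ndarray, psi: np.ndarray) -> float:
    """Generic InfoNCE estimate of I(X; Y) from N paired embeddings.

    phi[i]: hypothetical embedding of a (state, human action) pair.
    psi[i]: hypothetical embedding of the future state reached from it.
    Off-diagonal pairs (i != j) act as negatives.
    """
    n = phi.shape[0]
    logits = phi @ psi.T  # critic f(x_i, y_j) = phi_i . psi_j
    # Row-wise log-softmax, stabilised by subtracting each row's maximum.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # log N + mean log-probability of the positive pair; never exceeds log N.
    return float(np.log(n) + np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
n, d = 256, 32
phi = rng.normal(size=(n, d))
psi_dependent = phi + 0.1 * rng.normal(size=(n, d))  # future state reflects the action
psi_independent = rng.normal(size=(n, d))            # future state ignores the action
dep = infonce_lower_bound(phi, psi_dependent)
indep = infonce_lower_bound(phi, psi_independent)
print(dep > indep, dep <= np.log(n))  # True True
```

The ceiling of log N is the reason contrastive estimators use large batches: the bound can only report as much information as the number of negatives allows.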

Paper Structure

This paper contains 30 sections, 4 theorems, 38 equations, 8 figures, 1 table, 1 algorithm.

Key Result

theorem 1

Under Assumptions assm:uniform and assm:ergodic, for sufficiently large $\gamma$ and any $\beta>0$, …

Figures (8)

  • Figure 1: We propose an algorithm for training assistive agents to empower human users: the assistant should take actions that enable human users to visit a wide range of future states, and the human's actions should exert a high degree of influence over future outcomes. Our algorithm scales to high-dimensional settings, opening the door to building assistive agents that need not directly reason about human intentions.
  • Figure 2: The Information Geometry of Empowerment, illustrating the analysis section of the paper. (Left) For a given state $s_t$ and assistant policy $\pi_R$, we plot the distribution over future states for 6 choices of the human policy $\pi_H$. In a 3-state MDP, each such future-state distribution is a point on the 2-dimensional probability simplex. We refer to the set of all possible future-state distributions as the state marginal polytope. (Center) Mutual information corresponds to the distance between the center of the polytope and the vertices that are maximally far away from it. (Right) Empowerment corresponds to maximizing the size of this polytope. For example, when an assistive agent moves an obstacle out of a human user's way, the human user can spend more time at the desired state.
  • Figure 3: We apply our method to the benchmark proposed in prior work du2020ave, visualized in Figure 4(a). The four subplots show variants of the task with increasing complexity (more blocks) ($\pm 1$ standard error). We compare against AvE du2020ave, the Goal Inference baseline from du2020ave, which assumes access to a world model, and Reward Inference garg2021iq, where we recover the reward from a learned Q-value. These prior approaches fail on all but the easiest task, highlighting the importance of scalability.
  • Figure 4: (a) The modified environment from du2020ave scaled to $N=7$ blocks, and (b, c) the two layouts of the Overcooked environment carroll2019utility.
  • Figure 5: In Coordination Ring, our agent learns to wait for the human to add an onion to the pot, and then adds one itself. There is another pot at the top which is nearly full, but the empowerment agent takes actions to maximize the impact of the human's actions, and so follows the lead of the human by filling the empty pot.
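The geometric picture in Figure 2 can be made concrete with the classical channel-capacity identity. The maximum over weightings of $I(\pi_H; s^+)$ equals the KL "radius" of the smallest ball that encloses the future-state distributions, and the ball is centered at the capacity-achieving mixture. The sketch below is our own illustration, not the paper's analysis. It assumes a finite set of candidate human policies, each summarized by a hypothetical future-state distribution over 3 states, and it finds the optimal weighting with the Blahut–Arimoto algorithm. The `mutual_information` helper uses the decomposition $H(s^+) - H(s^+ \mid \pi_H)$. One loose reading of Figure 1 maps its two desiderata onto these two terms: many reachable future states, and human choices that decisively determine outcomes.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL divergence D(p || q) in nats; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def mutual_information(w: np.ndarray, dists: np.ndarray) -> float:
    """I(policy; s+) = H(s+) - H(s+ | policy) for policy weights w."""
    center = w @ dists  # marginal over future states (the "center")
    return entropy(center) - float(sum(wi * entropy(p) for wi, p in zip(w, dists)))

def blahut_arimoto(dists: np.ndarray, iters: int = 500):
    """Maximise I(policy; s+) over weights on the rows of `dists`."""
    w = np.full(dists.shape[0], 1.0 / dists.shape[0])
    for _ in range(iters):
        center = w @ dists
        d = np.array([kl(p, center) for p in dists])
        w = w * np.exp(d)  # upweight distributions far from the current center
        w = w / w.sum()
    center = w @ dists
    radius = max(kl(p, center) for p in dists)  # farthest vertex from the center
    return w, center, radius

# Six hypothetical future-state distributions in a 3-state MDP (rows sum to 1).
dists = np.array([
    [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8],  # near the simplex vertices
    [0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4],  # near the center
])
w, center, radius = blahut_arimoto(dists)
capacity = mutual_information(w, dists)
print(round(capacity, 4), round(radius, 4))  # 0.4596 0.4596

# Shrinking the polytope toward its center lowers the achievable information.
shrunk = 0.5 * dists + 0.5 / 3
w_s, _, _ = blahut_arimoto(shrunk)
print(mutual_information(w_s, shrunk) < capacity)  # True
```

In this example the optimal weighting puts essentially all mass on the three near-vertex policies. The resulting mutual information matches the distance from the center to the farthest vertices. Contracting the polytope, which in this toy setting corresponds to the human having less influence over outcomes, reduces that value.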

Theorems & Definitions (9)

  • remark 1
  • theorem 1
  • lemma 1
  • proof
  • lemma 2
  • proof
  • lemma 3
  • proof
  • proof: Proof of theorem 1 (thm:empowerment_reward)