Active Legibility in Multiagent Reinforcement Learning

Yanyu Liu, Yinghui Pan, Yifeng Zeng, Biyang Ma, Doshi Prashant

TL;DR

This work proposes a multiagent active legibility framework in which agents take legible actions to help others optimise their behaviors, and demonstrates that the framework is more efficient and requires less training time than several multiagent reinforcement learning algorithms.

Abstract

Multiagent sequential decision problems arise in many critical applications, including urban transportation, autonomous driving, and military operations. Their widely known solution, multiagent reinforcement learning, has evolved tremendously in recent years. Among its solution paradigms, modeling other agents attracts our interest; it differs from traditional value decomposition and communication mechanisms. It enables agents to understand and anticipate others' behaviors and facilitates their collaboration. Inspired by recent research on legibility, which allows agents to reveal their intentions through their behavior, we propose a multiagent active legibility framework to improve their performance. The legibility-oriented framework allows agents to conduct legible actions so as to help others optimise their behaviors. In addition, we design a series of problem domains that emulate a common scenario and best characterize legibility in multiagent reinforcement learning. The experimental results demonstrate that the new framework is more efficient and requires less training time than several multiagent reinforcement learning algorithms.

Paper Structure

This paper contains 21 sections, 1 theorem, 17 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Let any agent $A^i$ with a target sub-goal $g^i$ be given. The necessary condition for applying the shaping reward of the MAAL algorithm to agent $A^i$ without compromising the completeness of the policy, as defined in Definition def:complete, is: at any two times $t_1$ and $t_2$, when the agent $A^i$ is i

Figures (10)

  • Figure 1: Two trajectories are executed by the two agents (one leader and its follower) who aim to complete the grasping task
  • Figure 2: The new framework of Multiagent Active Legibility (MAAL) is presented from the single-agent perspective ($Agent^i$).
  • Figure 3: An example of Reward Shaping in MDP
  • Figure 4: The Lead-Follow Maze domain: In a $10\times 16$ maze, there are two independent agents, the leader and the follower. Both of them can observe each other's actions and trajectories. There are four exits distributed around the maze, as well as several walls. At the beginning of each game, the leader is assigned a target exit, which is unknown to the follower. The objective of the game is for both agents to reach their respective target exits simultaneously, thereby ending the game and achieving victory. Therefore, the follower needs to observe the leader's trajectory, infer the true target exit, and hurry to it. Meanwhile, the leader can also enhance the legibility of its actions and strategies to help the follower identify its target more quickly and accurately.
  • Figure 5: Episode Reward in LFM
  • ...and 5 more figures
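
The Figure 4 caption describes a follower that watches the leader's moves and infers the leader's target exit. The sketch below illustrates one common way such inference can be modelled: a Bayesian update over candidate exits with a Boltzmann-rational action likelihood. It is not the MAAL algorithm from the paper. The exit coordinates, the wall-free grid, the Manhattan distance, the `beta` parameter, and all function names are assumptions made for illustration only.

```python
import math

# Hypothetical setup: a wall-free 10x16 grid with four exits.
# The exit positions are illustrative, not taken from the paper.
ROWS, COLS = 10, 16
EXITS = {"north": (0, 8), "south": (9, 8), "west": (5, 0), "east": (5, 15)}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def manhattan(a, b):
    """Grid distance between two cells, ignoring walls (an assumption)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step(pos, move):
    """Apply a move and clip the result to the grid boundary."""
    dr, dc = MOVES[move]
    row = min(max(pos[0] + dr, 0), ROWS - 1)
    col = min(max(pos[1] + dc, 0), COLS - 1)
    return (row, col)


def action_likelihood(pos, move, goal, beta=2.0):
    """P(move | pos, goal), proportional to exp(-beta * distance to goal after the move)."""
    scores = {m: math.exp(-beta * manhattan(step(pos, m), goal)) for m in MOVES}
    return scores[move] / sum(scores.values())


def infer_goal(start, moves, beta=2.0):
    """Return the follower's posterior over exits after observing the leader's moves."""
    belief = {name: 1.0 / len(EXITS) for name in EXITS}  # uniform prior
    pos = start
    for move in moves:
        for name, goal in EXITS.items():
            belief[name] *= action_likelihood(pos, move, goal, beta)
        total = sum(belief.values())
        belief = {name: p / total for name, p in belief.items()}
        pos = step(pos, move)
    return belief


if __name__ == "__main__":
    # The leader starts at the centre of the grid and heads for the east exit.
    posterior = infer_goal((5, 8), ["right", "right", "right"])
    for name, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.4f}")
```

Under these assumptions, a move that shortens the distance to only one exit concentrates the follower's belief on that exit quickly, which is the intuition behind a leader choosing legible actions. A move that is equally good for several exits leaves the belief spread across them.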

Theorems & Definitions (4)

  • Definition 1
  • Definition 1
  • Proposition 1
  • Proof