Episodic Future Thinking Mechanism for Multi-agent Reinforcement Learning

Dongsu Lee, Minhae Kwon

TL;DR

We introduce an episodic future thinking (EFT) mechanism for reinforcement learning (RL) agents, inspired by cognitive processes observed in animals, and show that its reward improvement holds across societies with different levels of character diversity.

Abstract

Understanding cognitive processes in multi-agent interactions is a primary goal of cognitive science. It can also guide artificial intelligence (AI) research toward social decision-making in multi-agent systems, where heterogeneous agent characters introduce uncertainty. In this paper, we introduce an episodic future thinking (EFT) mechanism for a reinforcement learning (RL) agent, inspired by cognitive processes observed in animals. To enable future thinking, we first develop a multi-character policy that captures diverse characters with an ensemble of heterogeneous policies. Here, an agent's character is defined as a particular weight combination over reward components, representing distinct behavioral preferences. The future thinking agent collects observation-action trajectories of the target agents and uses the pre-trained multi-character policy to infer their characters. Once the characters are inferred, the agent predicts the upcoming actions of the target agents and simulates potential future scenarios. This capability allows the agent to adaptively select the optimal action given the predicted future in multi-agent interactions. To evaluate the proposed mechanism, we consider a multi-agent autonomous driving scenario with diverse driving traits and multiple particle environments. Simulation results demonstrate that the EFT mechanism with accurate character inference achieves a higher reward than existing multi-agent solutions. We also confirm that the reward improvement remains valid across societies with different levels of character diversity.
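The infer, predict, and select loop described in the abstract can be sketched in a few lines. This is a minimal toy under stated assumptions, not the authors' implementation: the reward components (`features`), the Boltzmann form of the character-conditioned policy, the inverse temperature `beta`, the candidate grid `CANDIDATES`, and the `collision_penalty` in `select_action` are all hypothetical. The character $\mathbf c$ weights the reward components, inference picks the candidate with the highest trajectory log-likelihood, and the ego agent then acts against the target's predicted action. The captions mention iterations of the inference module, so the paper presumably uses an iterative optimizer; the exhaustive search over a small grid here is a swapped-in simplification.

```python
import math

# Toy, hypothetical setup (not the paper's environment): three discrete actions
# and K = 2 reward components, (progress, safety).
ACTIONS = (0, 1, 2)

def features(obs: float, action: int) -> tuple[float, float]:
    """Hypothetical reward components phi(o, a) = (progress, safety)."""
    progress = float(action)        # larger action index = more aggressive
    safety = -abs(obs - action)     # penalty for deviating from a reference action encoded in obs
    return (progress, safety)

def reward(c, obs, action):
    """Character-weighted reward r = sum_k c_k * phi_k(o, a)."""
    return sum(ck * fk for ck, fk in zip(c, features(obs, action)))

def policy(c, obs, beta=2.0):
    """Character-conditioned Boltzmann policy pi(a | o; c) (assumed form)."""
    logits = [beta * reward(c, obs, a) for a in ACTIONS]
    m = max(logits)                 # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def log_likelihood(c, trajectory):
    """log p(trajectory | c) = sum_t log pi(a_t | o_t; c)."""
    return sum(math.log(policy(c, o)[a]) for o, a in trajectory)

def infer_character(trajectory, candidates):
    """Maximum likelihood character estimate over a finite candidate set."""
    return max(candidates, key=lambda c: log_likelihood(c, trajectory))

def select_action(ego_c, ego_obs, target_obs, c_hat, collision_penalty=5.0):
    """Predict the target's next action from c_hat, then pick the ego action
    that maximizes the ego reward in that simulated future."""
    probs = policy(c_hat, target_obs)
    predicted = max(ACTIONS, key=lambda a: probs[a])

    def simulated_return(a):
        clash = collision_penalty if a == predicted else 0.0  # hypothetical conflict cost
        return reward(ego_c, ego_obs, a) - clash

    return max(ACTIONS, key=simulated_return), predicted

# Candidate characters c = (w, 1 - w) on a coarse grid.
CANDIDATES = [(w, 1.0 - w) for w in (0.0, 0.25, 0.5, 0.75, 1.0)]

aggressive_traj = [(0.0, 2)] * 5    # target always takes the most aggressive action
c_hat = infer_character(aggressive_traj, CANDIDATES)
print(c_hat)                                        # (1.0, 0.0)
print(select_action((0.5, 0.5), 2.0, 0.0, c_hat))   # (1, 2)
```

In this toy, an ego agent that ignored the prediction would take action 2 (its highest own reward); anticipating that the inferred aggressive target will also take action 2, it yields with action 1 instead.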

Paper Structure

This paper contains 42 sections, 19 equations, 16 figures, 2 tables, 3 algorithms.

Figures (16)

  • Figure 1: Block diagram of an agent $i$ with a multi-character policy $\pi(o_{t,i};\mathcal{C})$, where $\mathcal{C}$ is the character space. The agent infers the character $\mathbf c$ of other agents using maximum likelihood estimation. Here, $K$ denotes the dimension of the character vector $\mathbf c$.
  • Figure 2: Diagram of a POMDP with the EFT mechanism. The future thinking and action selection modules map the observation to an action. Solid lines and circles represent actual events; dashed ones depict virtual events in the simulated world of agent $i$.
  • Figure 3: Performance of the character inference module. A. L1-norm between the estimated and true characters over iterations ($T=1000$). B. Number of iterations required for convergence versus the length $T$ of the observation-action trajectory.
  • Figure 4: Reward improvement of the two EFT approaches relative to the baseline without EFT (i.e., the reward of each approach minus the reward without EFT).
  • Figure 5: Character inference accuracy (ACC) versus the standard deviation of trajectory noise.
  • ...and 11 more figures
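Two of the quantities plotted above are simple enough to state as code: the L1-norm between estimated and true character vectors (Figure 3A) and the reward enhancement over the no-EFT baseline (Figure 4). The sketch below is illustrative only. The function names, the approach labels `"EFT-A"` and `"EFT-B"`, and all numbers are hypothetical placeholders, not results or identifiers from the paper.

```python
def l1_norm(c_hat, c_true):
    """L1 distance between estimated and true character vectors (Figure 3A metric)."""
    if len(c_hat) != len(c_true):
        raise ValueError("character vectors must have the same dimension K")
    return sum(abs(a - b) for a, b in zip(c_hat, c_true))

def reward_enhancement(rewards, baseline="without EFT"):
    """Reward of each approach minus the reward of the no-EFT baseline (Figure 4 quantity)."""
    base = rewards[baseline]
    return {name: r - base for name, r in rewards.items() if name != baseline}

# Placeholder numbers for illustration only; these are not results from the paper.
print(l1_norm((0.75, 0.25), (1.0, 0.0)))  # 0.5
print(reward_enhancement({"without EFT": 10.0, "EFT-A": 12.5, "EFT-B": 11.0}))
# {'EFT-A': 2.5, 'EFT-B': 1.0}
```

A positive enhancement means the approach outperforms the baseline without EFT; an L1-norm that shrinks toward zero over iterations indicates that character inference is converging to the true character.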