A Visual Analytics System to Understand Behaviors of Multi Agents in Reinforcement Learning

Changhee Lee, Jeongmin Rhee, DongHwa Shin

TL;DR

The paper presents MARLViz, a visual analytics system for understanding multi-agent reinforcement learning (MARL). It extracts agent action features with an autoencoder and presents them through a four-view interface that supports cross-agent comparison and interaction analysis. It addresses the limitations of playback visualizations in MARL and shows how environment settings correlate with agent behaviors. In a usage case covering 216 agents and 72 scenarios, the approach reveals distinct behavioral and interaction patterns, making complex MARL dynamics easier to interpret. The work highlights practical implications for analyzing MARL strategies and suggests extending the method to MARL environments beyond the snake game.

Abstract

Multi-Agent Reinforcement Learning (MARL) is a branch of machine learning in which multiple agents learn optimal policies through trial and error while interacting in the same environment at the same time. Analyzing and understanding these complex interactions is challenging, and existing analysis methods are limited in their ability to fully reflect and interpret this complexity. To address these challenges, we present MARLViz, a visual analytics system for visualizing and analyzing the policies and interactions of agents in MARL environments. The system is designed to show visually how agent behavior differs under different environment settings and to help users understand complex interaction patterns. In this study, we analyzed agents with similar behaviors and selected scenarios to examine the agents' interactions, which made it easier to understand the strategies of agents in MARL.

Paper Structure

This paper contains 5 sections and 2 figures.

Figures (2)

  • Figure 1: The visual interface of MARLViz. (A) The Overview shows a dimensionality-reduced 2D plot of all agents' features. (B) The Config View shows the distributions of environment settings (game modes, number of agents, and reward functions) for the brushed agents. (C) In the Scenario View, each list item shows the percentages of agents' actions and reward types in a scenario trained with a specific environment setting. (D) In the Interaction View, the heatmap shows the agents' movement paths for the scenario selected in the Scenario View. (E) The line chart shows the trend and events of the rewards obtained by the agents in the selected scenario.
  • Figure 2: Visualization of the agents' interactions in the Scenario View. On the left is a case with a high proportion of "straight" behaviors, high "fruit" rewards, and low "time" rewards; on the right is a case with a low proportion of "straight" behaviors, low "fruit" rewards, and high "time" rewards.
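
The per-scenario action percentages described for the Scenario View can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `action_percentages`, the log format, and the action labels "left" and "right" are assumptions made for illustration (only "straight" is named in the captions).

```python
from collections import Counter


def action_percentages(actions):
    """Return the percentage share of each action label in one scenario.

    `actions` is a hypothetical flat list of action labels logged for a
    scenario; the real MARLViz data format is not specified in the source.
    """
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        # An empty log has no meaningful shares.
        return {}
    return {action: 100.0 * n / total for action, n in counts.items()}


# Hypothetical action log; "left" and "right" are assumed labels.
scenario_actions = [
    "straight", "straight", "left", "straight",
    "right", "straight", "left", "straight",
]
shares = action_percentages(scenario_actions)
```

Under these assumptions, a scenario with a high "straight" share would correspond to the left-hand case in Figure 2, and one with a low share to the right-hand case.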