Predictive Auxiliary Learning for Belief-based Multi-Agent Systems

Qinwei Huang, Stefan Wang, Simon Khan, Garrett Katz, Qinru Qiu

TL;DR

This work tackles instability and inefficiency in belief-based MARL under partial observability by introducing BEPAL-MAS, which augments agents with a belief decoder and auxiliary predictive tasks that anticipate unobservable information such as teammates' rewards and motions. The method combines a Graph Attention Encoder, LSTM-based hidden-state updates, and a centralized loss $L = L_{RL} + \lambda L_{aux}$, where $L_{aux}$ consists of MSE terms for predicting $\overline{b^t}$ and $\overline{p^t}$. Experiments on Predator-Prey and Google Research Football show that BEPAL improves average performance by about 16% and yields more stable convergence; ablations confirm the positive contribution of each auxiliary task and reveal a correlation between auxiliary prediction accuracy and RL performance. The work also demonstrates transferability and manageable computational overhead, suggesting practical applicability to scalable, cooperative MARL in complex, partially observable domains. These findings indicate that belief-based auxiliary supervision can robustly enhance policy learning and coordination in multi-agent systems with imperfect information.
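As a minimal sketch of how the combined objective $L = L_{RL} + \lambda L_{aux}$ might be evaluated for one batch, assuming $L_{RL}$ is already available as a scalar and that the two auxiliary MSE terms (for $\overline{b^t}$ and $\overline{p^t}$) are summed with equal weight. The function names, the equal weighting, and any value of $\lambda$ used below are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bepal_total_loss(
    l_rl: float,
    b_pred: Sequence[float],
    b_target: Sequence[float],
    p_pred: Sequence[float],
    p_target: Sequence[float],
    *,
    lam: float,
) -> float:
    """Hypothetical helper: L = L_RL + lam * L_aux, with L_aux = MSE(b) + MSE(p).

    Assumption: both auxiliary terms are weighted equally inside L_aux;
    the paper may weight or normalize them differently.
    """
    l_aux = mse(b_pred, b_target) + mse(p_pred, p_target)
    return l_rl + lam * l_aux


if __name__ == "__main__":
    # MSE(b) = 0.5, MSE(p) = 0.0, so L = 1.0 + 0.5 * 0.5 = 1.25
    print(bepal_total_loss(1.0, [0.0, 1.0], [0.0, 0.0], [1.0], [1.0], lam=0.5))
```

Because $\lambda$ only scales the auxiliary term, setting it to zero recovers the plain RL loss; larger values push the shared hidden state harder toward encoding the predicted quantities.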

Abstract

The performance of multi-agent reinforcement learning (MARL) in partially observable environments depends on effectively aggregating information from observations, communications, and reward signals. While most existing multi-agent systems rely on rewards as the only feedback for policy training, our research shows that introducing auxiliary predictive tasks can significantly enhance learning efficiency and stability. We propose Belief-based Predictive Auxiliary Learning (BEPAL), a framework that incorporates auxiliary training objectives to support policy optimization. BEPAL follows the centralized training with decentralized execution paradigm. Each agent learns a belief model that predicts unobservable state information, such as other agents' rewards or motion directions, alongside its policy model. By enriching hidden state representations with information that does not directly contribute to immediate reward maximization, this auxiliary learning process stabilizes MARL training and improves overall performance. We evaluate BEPAL in the predator-prey environment and Google Research Football, where it achieves an average improvement of about 16 percent in performance metrics and demonstrates more stable convergence compared to baseline methods.
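The summary above does not specify how the Graph Attention Encoder aggregates information from teammates. Purely as an illustrative assumption, the sketch below uses single-head scaled dot-product attention over each agent's graph neighborhood (itself plus its neighbors) to turn several feature vectors into one aggregated vector that could then feed an LSTM update. The names `attend` and `encode_team` are hypothetical, and the actual encoder may use learned projections, multiple heads, or a GAT-style scoring function instead.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query: Sequence[float], group: Sequence[Sequence[float]]) -> Vector:
    """Weighted sum of the group's vectors, weighted by scaled dot-product scores."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, vec)) / math.sqrt(d) for vec in group]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, group)) for i in range(d)]


def encode_team(
    features: Dict[int, Vector], adjacency: Dict[int, List[int]]
) -> Dict[int, Vector]:
    """One attention round: each agent attends over itself and its graph neighbors."""
    out: Dict[int, Vector] = {}
    for agent, feat in features.items():
        group = [features[j] for j in [agent] + adjacency.get(agent, [])]
        out[agent] = attend(feat, group)
    return out


if __name__ == "__main__":
    feats = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    adj = {0: [1], 1: [0]}
    # Each agent's output is a convex mix of its own and its neighbor's features,
    # weighted toward the vector most similar to its own.
    print(encode_team(feats, adj))
```

Because the attention weights sum to one, each aggregated vector is a convex combination of the features an agent can see, which keeps the scale of the input to the recurrent update stable as the number of neighbors changes.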

Paper Structure

This paper contains 21 sections, 10 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The BEPAL-MAS architecture.
  • Figure 2: Performance comparison on Predator-Prey games with obstacles across maps of different sizes: small ($12\times 12$), medium ($16\times 16$), and large ($20\times 20$).
  • Figure 3: Game Performance vs. Auxiliary Prediction Accuracy.
  • Figure 4: Visualization of motion predictions generated by agents at different time steps in a $12\times 12$ map with 10 obstacles. (a-c) Predictions generated by agent 3 at time steps 11, 20, and 21. (d) Prediction generated by agent 1 at time step 20.
  • Figure 5: Visualized Reward Prediction