Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information

Jan Corazza, Hadi Partovi Aria, Hyohun Kim, Daniel Neider, Zhe Xu

TL;DR

This work addresses decentralized multi-agent reinforcement learning under privacy and communication constraints by extending reward-machine task specifications with temporal-causal knowledge. It introduces temporal logic-based causal diagrams (TL-CDs) and a causal DFA to integrate temporal causality into Decentralized Q-learning with Projected Reward Machines (DQPRM), relaxing the original decomposition criterion while preserving its guarantees. The proposed Causal DQPRM makes more tasks decomposable and speeds up learning through TL-CD-guided exploration, and formal results show that the relaxed criterion is compatible with the strict one. Empirical evaluations on the Generator and Laboratory tasks show improved sample efficiency and successful decentralized coordination.

Abstract

Reinforcement learning (RL) algorithms can find an optimal policy for a single agent to accomplish a particular task. However, many real-world problems require multiple agents to collaborate in order to achieve a common goal. For example, a robot executing a task in a warehouse may require the assistance of a drone to retrieve items from high shelves. In Decentralized Multi-Agent RL (DMARL), agents learn independently and then combine their policies at execution time, but often must satisfy constraints on compatibility of local policies to ensure that they can achieve the global task when combined. In this paper, we study how providing high-level symbolic knowledge to agents can help address unique challenges of this setting, such as privacy constraints, communication limitations, and performance concerns. In particular, we extend the formal tools used to check the compatibility of local policies with the team task, making decentralized training with theoretical guarantees usable in more scenarios. Furthermore, we empirically demonstrate that symbolic knowledge about the temporal evolution of events in the environment can significantly expedite the learning process in DMARL.

Paper Structure

This paper contains 19 sections, 5 theorems, 1 equation, 14 figures, 1 table, 2 algorithms.

Key Result

Theorem

If RM $\mathcal{R}$ and projections $\mathcal{R}_1, \ldots, \mathcal{R}_N$ satisfy the strict decomposition criterion, then for all $\xi \in \Sigma^*$, $\mathcal{R}(\xi) = 1$ if and only if $\mathcal{R}_i(P_i(\xi)) = 1$ for all $i = 1, \ldots, N$. Here, $P_i(\xi)$ is the projection of $\xi$ to $\Sigma_i$.
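The equivalence in this result can be checked by brute force on a toy instance. The sketch below is illustrative only and is not taken from the paper: the events `a`, `b`, `c`, the three hand-built machines, the `RewardMachine` class, and the convention that an undefined transition leaves the state unchanged are all assumptions made for this example. The local machines are written by hand rather than derived by the paper's projection construction, and the strict decomposition criterion itself is not checked. The sketch only tests whether $\mathcal{R}(\xi) = 1$ coincides with $\mathcal{R}_i(P_i(\xi)) = 1$ for all $i$ on every event string up to a fixed length.

```python
from itertools import product


class RewardMachine:
    """Toy event-based reward machine: reward 1 iff the run ends in an accepting state.

    Assumption for this sketch: an undefined transition leaves the state unchanged.
    """

    def __init__(self, events, transitions, initial, accepting):
        self.events = set(events)
        self.transitions = dict(transitions)
        self.initial = initial
        self.accepting = set(accepting)

    def run(self, xi):
        state = self.initial
        for event in xi:
            state = self.transitions.get((state, event), state)
        return 1 if state in self.accepting else 0


def project(xi, local_events):
    """P_i(xi): keep only the events that belong to the local event set."""
    return tuple(e for e in xi if e in local_events)


# Hypothetical team task: agent 1 does `a`, agent 2 does `b` (any order),
# then both take part in the shared event `c`.
team_rm = RewardMachine(
    events={"a", "b", "c"},
    transitions={(0, "a"): 1, (0, "b"): 2, (1, "b"): 3, (2, "a"): 3, (3, "c"): 4},
    initial=0,
    accepting={4},
)
# Hand-built local machines over the local event sets {a, c} and {b, c}.
rm_1 = RewardMachine({"a", "c"}, {(0, "a"): 1, (1, "c"): 2}, 0, {2})
rm_2 = RewardMachine({"b", "c"}, {(0, "b"): 1, (1, "c"): 2}, 0, {2})


def equivalence_holds(team, local_rms, max_len):
    """Check R(xi) = 1 <=> all R_i(P_i(xi)) = 1 for every xi with |xi| <= max_len."""
    for n in range(max_len + 1):
        for xi in product(sorted(team.events), repeat=n):
            team_ok = team.run(xi) == 1
            locals_ok = all(rm.run(project(xi, rm.events)) == 1 for rm in local_rms)
            if team_ok != locals_ok:
                return False
    return True
```

Calling `equivalence_holds(team_rm, [rm_1, rm_2], 6)` enumerates all strings of length at most 6 over the three events. A bounded check like this can expose a counterexample but does not prove the equivalence for all of $\Sigma^*$.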

Figures (14)

  • Figure 1: Generator Task
  • Figure 2: Projections of the team RM along local event sets of agents 1 (Left) and 2 (Right).
  • Figure 3: TL-CD for the Generator Task (Left) and respective Causal DFA (Right)
  • Figure 4: Generator Task study. Aggregated results from $50$ independent runs.
  • Figure 5: Laboratory Task study. Aggregated results from $50$ independent runs.
  • ...and 9 more figures

Theorems & Definitions (14)

  • Definition: Event-based Reward Machine
  • Definition: RM-MDP
  • Definition: Projected Reward Machine
  • Theorem: Strict Decomposition Criterion
  • Theorem: Decomposition Viability
  • Definition: Deterministic Finite Automaton
  • Theorem: Relaxed Decomposition Criterion
  • Theorem: Criterion Compatibility
  • Theorem: Relaxed Decomposition Viability
  • Definition: Parallel Composition of Reward Machines
  • ...and 4 more