Logic-based Task Representation and Reward Shaping in Multiagent Reinforcement Learning
Nishant Doshi
TL;DR
The paper tackles scalable policy synthesis for multi-agent systems under temporal task specifications expressed in Linear Temporal Logic (LTL). It introduces Logically Constrained guided Multiagent Q Learning (LCgMQL), a model-free approach that augments MDP states on-the-fly with automaton states derived from the LTL formula, uses a progress-based potential reward to guide learning, and leverages an options framework to reduce sample complexity. By translating LTL into a Büchi automaton and composing it with the joint transition system to form a product MDP (a product SMDP once options are used), the method enables correct-by-design controllers without explicitly learning the transition model. Empirical results in a deterministic gridworld show that reward shaping significantly accelerates convergence and that options become more useful as the joint state and action spaces grow, suggesting practical scalability for centralized multi-agent planning under complex temporal goals.
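To make the on-the-fly product construction and progress-based shaping concrete, here is a minimal sketch. It is not the paper's implementation. The automaton for the task F(a ∧ F b) ("visit a, then visit b") is hand-coded, the labels "a" and "b" are hypothetical disjoint grid regions, and the progress potential is assumed to be the negative shortest-path distance (in automaton transitions) to an accepting state. That potential is plugged into the standard potential-based shaping term γΦ(q') − Φ(q). The paper's exact progress function may differ.

```python
from collections import deque

# Hypothetical automaton for the LTL task F(a & F b): visit region "a", then region "b".
# States: 0 = start, 1 = "a" visited, 2 = accepting and absorbing (a Buchi accepting
# state with a self-loop, so it is visited infinitely often once reached).
STATES = [0, 1, 2]
ACCEPTING = {2}
# Assumption: "a" and "b" label disjoint grid cells, so they never hold together.
LABELS = [frozenset(), frozenset({"a"}), frozenset({"b"})]


def delta(q, label):
    """Automaton transition on the set of propositions true in the next MDP state."""
    if q == 0 and "a" in label:
        return 1
    if q == 1 and "b" in label:
        return 2
    return q


def distance_to_accepting():
    """Backward BFS over the automaton graph: fewest transitions from each state to acceptance."""
    reverse = {q: set() for q in STATES}
    for q in STATES:
        for label in LABELS:
            reverse[delta(q, label)].add(q)
    dist = {q: 0 for q in ACCEPTING}
    frontier = deque(ACCEPTING)
    while frontier:
        target = frontier.popleft()
        for q in reverse[target]:
            if q not in dist:
                dist[q] = dist[target] + 1
                frontier.append(q)
    return dist


DIST = distance_to_accepting()


def potential(q):
    """Assumed progress potential: negative distance to an accepting automaton state."""
    return -DIST[q]


def product_step(q, next_label, env_reward, gamma=0.9):
    """Advance the automaton alongside an MDP transition and shape the reward.

    The product state is (s, q); the MDP part s is tracked by the environment,
    so only the new automaton state and the shaped reward are returned here.
    """
    q_next = delta(q, next_label)
    # Potential-based shaping term gamma * Phi(q') - Phi(q) added to the task reward
    shaped = env_reward + gamma * potential(q_next) - potential(q)
    return q_next, shaped
```

For example, `product_step(0, frozenset({"a"}), 0.0)` moves the automaton to state 1 and returns a shaped reward of 0.9·(−1) − (−2) ≈ 1.1, so an agent is rewarded for task progress long before it reaches the accepting state. Because the shaping term is potential-based, it changes how fast learning proceeds, not which policies are optimal.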
Abstract
This paper presents an approach for accelerated learning of optimal plans in multi-agent systems for a given task represented using Linear Temporal Logic (LTL). Given a set of options (temporally abstract actions) available to each agent, we convert the task specification into the corresponding Büchi automaton and proceed with a model-free approach that collects transition samples and constructs a product Semi-Markov Decision Process (SMDP) on-the-fly. Value-based reinforcement learning algorithms can then be used to synthesize a correct-by-design controller without learning the underlying transition model of the multi-agent system. The exponential sample complexity caused by multiple agents is addressed with a novel reward shaping approach. We test the proposed algorithm in a deterministic gridworld simulation on different tasks and find that reward shaping significantly reduces convergence times. We also infer that options become increasingly relevant as the state and action spaces of multi-agent systems grow.
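Value-based learning over a product SMDP with options can be illustrated with the standard SMDP Q-learning backup, in which an option that ran for k steps is credited with its discounted return and bootstrapped with γ^k. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm. It assumes a centralized learner, a joint option that every agent starts and ends together, and product states encoded as (joint grid positions, automaton state). The option names and helper functions are hypothetical. It also shows why the joint option space grows as |O|^n in the number of agents n.

```python
from collections import defaultdict
from itertools import product


def joint_options(per_agent_options):
    """Centralized joint option space: one option per agent (Cartesian product)."""
    return list(product(*per_agent_options))


def smdp_q_update(Q, state, option, rewards, next_state, next_options,
                  alpha=0.1, gamma=0.95):
    """One SMDP Q-learning backup after `option` ran for k = len(rewards) steps.

    `state` and `next_state` are product states (joint MDP state, automaton state).
    Assumption: the joint option terminates for all agents at the same step.
    """
    k = len(rewards)
    # Discounted reward collected while the option was executing
    option_return = sum(gamma ** i * r for i, r in enumerate(rewards))
    # Bootstrap from the best joint option where this one terminated (unseen pairs count as 0)
    best_next = max((Q.get((next_state, o), 0.0) for o in next_options), default=0.0)
    target = option_return + gamma ** k * best_next
    Q[(state, option)] += alpha * (target - Q[(state, option)])
    return Q[(state, option)]


# Usage with two agents and three hypothetical options each
Q = defaultdict(float)
options = joint_options([["to_a", "to_b", "stay"]] * 2)  # 3 ** 2 joint options
s = (((0, 0), (4, 4)), 0)        # (joint grid positions, automaton state)
s_next = (((2, 0), (4, 2)), 1)
value = smdp_q_update(Q, s, options[0], [0.0, 0.0, 1.0], s_next, options,
                      alpha=0.5, gamma=0.9)
```

Here the option ran for three steps and earned a reward only on the last one, so the backup target is 0.9² = 0.81 and the updated value is 0.5 × 0.81 ≈ 0.405. With four options per agent and three agents the joint space already has 4³ = 64 entries per product state, which is the growth the reward shaping is meant to offset.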
