HypRL: Reinforcement Learning of Control Policies for Hyperproperties
Tzu-Han Hsu, Arshia Rafieioskouei, Borzoo Bonakdarpour
TL;DR
HypRL introduces a specification-guided reinforcement learning framework for multi-agent systems by leveraging HyperLTL hyperproperties to express relational objectives and constraints. It tackles quantifier alternations with Skolemization, and replaces Boolean satisfaction with quantitative robustness as the reward signal, enabling off-the-shelf RL to maximize the probability of satisfying the hyperproperty. A three-step pipeline—Skolemization, robustness-based semantics, and RL training—learns a tuple of policies that jointly satisfy the HyperLTL specification. Empirical results across SRL, DST, PCP, and wildfire scenarios demonstrate that HypRL outperforms baselines and can handle complex, relational objectives that are difficult for traditional reward designs.
Abstract
Reward shaping in multi-agent reinforcement learning (MARL) for complex tasks remains a significant challenge. Existing approaches often fail to find optimal solutions or cannot efficiently handle such tasks. We propose HYPRL, a specification-guided reinforcement learning framework that learns control policies w.r.t. hyperproperties expressed in HyperLTL. Hyperproperties constitute a powerful formalism for specifying objectives and constraints over sets of execution traces across agents. To learn policies that maximize the satisfaction of a HyperLTL formula $φ$, we apply Skolemization to manage quantifier alternations and define quantitative robustness functions to shape rewards over execution traces of a Markov decision process with unknown transitions. A suitable RL algorithm is then used to learn policies that collectively maximize the expected reward and, consequently, increase the probability of satisfying $φ$. We evaluate HYPRL on a diverse set of benchmarks, including safety-aware planning, Deep Sea Treasure, and the Post Correspondence Problem. We also compare with specification-driven baselines to demonstrate the effectiveness and efficiency of HYPRL.
