Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods
Riccardo De Santi, Manish Prajapat, Andreas Krause
TL;DR
This work addresses the limitations of additive state-based rewards in reinforcement learning by introducing Global Reinforcement Learning (GRL), where rewards are defined over entire trajectories via a global set function $F:2^{\mathcal{S}\times\mathcal{T}}\to\mathbb{R}$. It develops a meta-algorithm that linearizes $F$ with tight modular lower bounds and solves a sequence of standard MDPs (Global Trajectory Optimization, GTO, and Global Policy Optimization, GPO), yielding curvature-based approximation guarantees tied to submodular, supermodular, and BP structure. The authors also prove hardness results for GRL and demonstrate the method's effectiveness on tasks such as D-optimal design, diverse-synergy trajectory selection, and safe state coverage on grid worlds, highlighting improved exploration, design quality, and safety trade-offs. Overall, GRL provides a principled way to model and optimize non-additive, interaction-rich objectives in finite-horizon decision-making, with practical relevance to experiment design, exploration, imitation learning, and risk-aware planning.
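For readers unfamiliar with the terminology, the properties mentioned above can be written out as follows. This is a sketch in standard submodular-optimization notation; the symbols $V$, $m_X$, $w_X$, and $c_X$ are assumptions introduced here for illustration and may differ from the paper's own notation. With ground set $V=\mathcal{S}\times\mathcal{T}$, $F$ is submodular if marginal gains diminish (and supermodular if the inequality is reversed):

$$F(A\cup\{e\})-F(A)\;\ge\;F(B\cup\{e\})-F(B)\qquad\text{for all } A\subseteq B\subseteq V,\ e\in V\setminus B.$$

A modular lower bound that is tight at a set $X$ is an additive function

$$m_X(Y)=c_X+\sum_{e\in Y}w_X(e),\qquad m_X(Y)\le F(Y)\ \text{ for all } Y\subseteq V,\qquad m_X(X)=F(X).$$

Because $m_X$ is additive over state-time pairs, maximizing it over trajectories is a standard finite-horizon MDP. If $Y$ is a maximizer, then

$$F(Y)\;\ge\;m_X(Y)\;\ge\;m_X(X)\;=\;F(X),$$

so each such step cannot decrease the global objective.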
Abstract
In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many real-world applications, such as experiment design, exploration, imitation learning, and risk-averse RL, to name a few. This is because additive objectives disregard interactions between states that are crucial for certain tasks. To tackle this problem, we introduce Global RL (GRL), where rewards are defined globally over trajectories instead of locally over states. Global rewards can capture negative interactions among states, e.g., in exploration, via submodularity; positive interactions, e.g., synergetic effects, via supermodularity; and mixed interactions via combinations of the two. By exploiting ideas from submodular optimization, we propose a novel algorithmic scheme that converts any GRL problem to a sequence of classic RL problems and solves it efficiently with curvature-dependent approximation guarantees. We also provide hardness of approximation results and empirically demonstrate the effectiveness of our method on several GRL instances.
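The idea of converting a GRL problem into a sequence of classic RL problems can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical deterministic chain environment, a coverage objective (number of distinct states visited, which is submodular), and the permutation-based modular lower bound from the submodular semigradient literature. All names (`F`, `modular_lower_bound`, `solve_mdp`, `semigradient_ascent`) and constants are made up for illustration.

```python
import random

# Hypothetical toy problem: a chain of states with left/stay/right moves.
N_STATES, HORIZON, START = 9, 5, 4
ACTIONS = (-1, 0, 1)

def step(s, a):
    """Deterministic chain dynamics, clipped at the borders."""
    return min(max(s + a, 0), N_STATES - 1)

def F(pairs):
    """Global reward: number of distinct states among (state, time) pairs."""
    return len({s for s, _ in pairs})

def modular_lower_bound(traj, rng):
    """Weights w with sum(w[e] for e in Y) <= F(Y) for every Y, tight at traj.

    Elements of traj are placed first in an ordering of the ground set, and
    each element receives its marginal gain along that ordering.
    """
    ground = [(s, t) for t in range(HORIZON) for s in range(N_STATES)]
    in_traj = set(traj)
    rest = [e for e in ground if e not in in_traj]
    rng.shuffle(rest)  # the order of the remaining elements is a free choice
    w, prefix, prev = {}, [], 0
    for e in list(traj) + rest:
        prefix.append(e)
        cur = F(prefix)
        w[e] = cur - prev
        prev = cur
    return w

def solve_mdp(w):
    """Backward induction for the additive reward w[(s, t)], starting at START."""
    value = [[0] * N_STATES for _ in range(HORIZON + 1)]
    best = [[0] * N_STATES for _ in range(HORIZON)]
    for t in range(HORIZON - 1, -1, -1):
        for s in range(N_STATES):
            a = max(ACTIONS, key=lambda a: value[t + 1][step(s, a)])
            best[t][s] = a
            value[t][s] = w[(s, t)] + value[t + 1][step(s, a)]
    traj, s = [], START
    for t in range(HORIZON):
        traj.append((s, t))
        s = step(s, best[t][s])
    return traj

def semigradient_ascent(iters=20, seed=0):
    """Alternate between linearizing F and solving the resulting standard MDP."""
    rng = random.Random(seed)
    traj = [(START, t) for t in range(HORIZON)]  # initial trajectory: stay put
    history = [F(traj)]
    for _ in range(iters):
        w = modular_lower_bound(traj, rng)
        traj = solve_mdp(w)  # exact maximizer of the bound, so F cannot drop
        history.append(F(traj))
    return traj, history

if __name__ == "__main__":
    traj, history = semigradient_ascent()
    print("coverage per iteration:", history)
    print("final trajectory:", traj)
```

Each iteration solves an ordinary additive-reward MDP, and the coverage value in `history` never decreases because the bound is tight at the current trajectory. The sketch is a local search and makes no claim about reaching the optimum or matching the paper's curvature-dependent guarantees.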
