SPRIG: Stackelberg Perception-Reinforcement Learning with Internal Game Dynamics
Fernando Martinez-Lopez, Juntao Chen, Yingdong Lu
TL;DR
The paper addresses the challenge of coordinating perception and decision-making in RL with high-dimensional sensory inputs. It proposes SPRIG, a cooperative Stackelberg framework in which the perception module acts as leader and the policy module as follower, augmented by a perception-cost term and a two-stage PPO-based training procedure. The authors introduce a Stackelberg-MDP formulation and show that the associated Stackelberg-Bellman operator is a contraction, which guarantees a unique fixed point and convergence to it. Experiments on BeamRider show SPRIG improving final performance by about 30% over PPO (850 vs. 650), with faster early learning and robust exploration dynamics. This work demonstrates that explicit game-theoretic interaction between modules can enhance single-agent RL by improving modular coordination while retaining theoretical convergence guarantees.
Abstract
Deep reinforcement learning agents often struggle to coordinate their perception and decision-making components effectively, particularly in environments with high-dimensional sensory inputs where feature relevance varies. This work introduces SPRIG (Stackelberg Perception-Reinforcement learning with Internal Game dynamics), a framework that models the internal perception-policy interaction within a single agent as a cooperative Stackelberg game. In SPRIG, the perception module acts as the leader, strategically processing raw sensory states, while the policy module acts as the follower, making decisions based on the extracted features. SPRIG provides theoretical guarantees through a modified Bellman operator while preserving the benefits of modern policy optimization. Experimental results on the Atari BeamRider environment demonstrate SPRIG's effectiveness: it achieves around 30% higher returns than standard PPO through its game-theoretic balance of feature extraction and decision-making.
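
To make the leader-follower structure and the contraction property more concrete, here is a minimal tabular sketch. It is not the operator defined in the paper: the toy MDP, the finite set of leader "perception options" (each exposing a subset of actions to the follower), the cost vector, and the names `stackelberg_bellman`, `action_sets`, `costs`, and `lam` are all assumptions made for illustration. The leader picks an option while anticipating the follower's best response and paying a perception cost; repeated application of the operator should then shrink the gap between any two value estimates by at least a factor of `gamma`.

```python
import numpy as np

def stackelberg_bellman(V, R, P, action_sets, costs, gamma=0.9, lam=0.1):
    """Apply a toy (hypothetical) Stackelberg-style Bellman operator once.

    V: (S,) value estimate; R: (S, A) rewards; P: (S, A, S) transitions.
    action_sets[k]: actions the follower can use under leader option k.
    costs[k]: perception cost the leader pays for option k.
    """
    Q = R + gamma * (P @ V)  # (S, A) one-step lookahead values
    # Follower best response to each leader option: best allowed action.
    follower = np.stack([Q[:, acts].max(axis=1) for acts in action_sets], axis=1)
    # Leader anticipates that response and subtracts its perception cost.
    return (follower - lam * np.asarray(costs)).max(axis=1)

# Toy problem (all values are illustrative assumptions).
rng = np.random.default_rng(0)
S, A, gamma = 6, 4, 0.9
R = rng.uniform(0.0, 1.0, size=(S, A))
P = rng.dirichlet(np.ones(S), size=(S, A))   # each P[s, a] sums to 1
action_sets = [[0], [0, 1], [0, 1, 2, 3]]    # richer perception, more actions
costs = [0.0, 0.5, 2.0]                      # richer perception costs more

# Contraction check: the sup-norm gap shrinks by at least a factor of gamma.
V, W = rng.normal(size=S), rng.normal(size=S)
gap_before = np.max(np.abs(V - W))
gap_after = np.max(np.abs(
    stackelberg_bellman(V, R, P, action_sets, costs, gamma)
    - stackelberg_bellman(W, R, P, action_sets, costs, gamma)))
print("gap ratio:", gap_after / gap_before, "<= gamma =", gamma)

# Fixed-point iteration from zero converges to the unique fixed point.
V_star = np.zeros(S)
for _ in range(500):
    V_star = stackelberg_bellman(V_star, R, P, action_sets, costs, gamma)
residual = np.max(np.abs(
    stackelberg_bellman(V_star, R, P, action_sets, costs, gamma) - V_star))
print("fixed-point residual:", residual)
```

In this toy version the contraction follows because the cost term cancels when two value estimates are compared and the nested maxima are non-expansive in the sup norm; SPRIG itself applies the idea to learned perception and policy networks trained with PPO, which this sketch does not attempt to reproduce.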
