SPRIG: Stackelberg Perception-Reinforcement Learning with Internal Game Dynamics

Fernando Martinez-Lopez, Juntao Chen, Yingdong Lu

TL;DR

The paper addresses the challenge of coordinating perception and decision-making in RL with high-dimensional sensory inputs. It proposes SPRIG, a cooperative Stackelberg framework in which the perception module acts as leader and the policy module as follower, augmented by a perception-cost term and a two-stage PPO-based training procedure. The authors establish a Stackelberg-MDP formulation and a Stackelberg-Bellman operator with contraction guarantees, ensuring a unique fixed point and convergence. Experiments on BeamRider show SPRIG improves final performance by about 30% over PPO (850 vs 650), with faster early learning and robust exploration dynamics. This work demonstrates that explicit game-theoretic interaction between modules can enhance single-agent RL by improving modular coordination and providing theoretical convergence guarantees.
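The contraction claim can be illustrated on a toy problem. The sketch below is not the paper's Stackelberg-Bellman operator, whose definition is not reproduced in this summary. It uses the standard Bellman optimality operator on a small, randomly generated tabular MDP, which is a known $\gamma$-contraction in the sup norm. All names (`make_toy_mdp`, `bellman_optimality`, `iterate_to_fixed_point`) and the discount `GAMMA = 0.9` are assumptions made for illustration. It shows the behaviour a contraction guarantee implies: repeated application of the operator reaches the same fixed point from very different starting values.

```python
import random

GAMMA = 0.9  # assumed discount factor, chosen only for this illustration


def make_toy_mdp(n_states=4, n_actions=2, seed=0):
    """Build a small random tabular MDP (hypothetical, not from the paper)."""
    rng = random.Random(seed)
    rewards = [[rng.uniform(0.0, 1.0) for _ in range(n_actions)]
               for _ in range(n_states)]
    transitions = []
    for _ in range(n_states):
        per_action = []
        for _ in range(n_actions):
            weights = [rng.uniform(0.1, 1.0) for _ in range(n_states)]
            total = sum(weights)
            per_action.append([w / total for w in weights])  # rows sum to 1
        transitions.append(per_action)
    return rewards, transitions


def bellman_optimality(values, rewards, transitions, gamma=GAMMA):
    """Apply the standard Bellman optimality operator once."""
    updated = []
    for s in range(len(values)):
        q_values = [
            rewards[s][a]
            + gamma * sum(p * v for p, v in zip(transitions[s][a], values))
            for a in range(len(rewards[s]))
        ]
        updated.append(max(q_values))
    return updated


def sup_distance(u, v):
    """Sup-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def iterate_to_fixed_point(start, rewards, transitions, tol=1e-10,
                           max_iters=10_000):
    """Iterate the operator until successive iterates differ by less than tol."""
    values = list(start)
    for _ in range(max_iters):
        updated = bellman_optimality(values, rewards, transitions)
        if sup_distance(updated, values) < tol:
            return updated
        values = updated
    return values


rewards, transitions = make_toy_mdp()
v_from_zero = iterate_to_fixed_point([0.0] * 4, rewards, transitions)
v_from_far = iterate_to_fixed_point([100.0, -50.0, 7.0, 3.0],
                                    rewards, transitions)

# Both runs should land on (numerically) the same value vector.
print(sup_distance(v_from_zero, v_from_far))
```

The same experiment could be repeated with any operator for which a contraction bound has been proved. The unique fixed point is what makes the result independent of initialization.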

Abstract

Deep reinforcement learning agents often struggle to coordinate their perception and decision-making components effectively, particularly in environments with high-dimensional sensory inputs where feature relevance varies. This work introduces SPRIG (Stackelberg Perception-Reinforcement learning with Internal Game dynamics), a framework that models the internal perception-policy interaction within a single agent as a cooperative Stackelberg game. In SPRIG, the perception module acts as the leader, strategically processing raw sensory states, while the policy module acts as the follower, making decisions based on the extracted features. SPRIG provides theoretical guarantees through a modified Bellman operator while preserving the benefits of modern policy optimization. Experimental results on the Atari BeamRider environment demonstrate SPRIG's effectiveness: it achieves around 30% higher returns than standard PPO through its game-theoretic balance of feature extraction and decision-making.

Paper Structure

This paper contains 16 sections, 13 equations, 4 figures, 1 table, and 1 algorithm.

Figures (4)

  • Figure 1: SPRIG architecture overview
  • Figure 2: Return curves for SPRIG and baseline PPO on BeamRider. Results averaged across 5 seeds with shaded regions showing standard deviation.
  • Figure 3: Perception Module ($\theta$)
  • Figure 4: Spatio-Temporal Attention Block