On Dynamic Programming Theory for Leader-Follower Stochastic Games

Jilles Steeve Dibangoye, Thibaut Le Marre, Ocan Sankur, François Schwarzentruber

TL;DR

This work addresses planning in leader-follower general-sum stochastic games (LF-GSSGs), where the follower best-responds to a committed leader policy, yielding a strong Stackelberg equilibrium (SSE). It develops a dynamic programming (DP) framework that operates on credible sets, state abstractions that capture all rational follower responses, and gives a lossless reduction of LF-GSSGs to a Markov decision process (MDP) over these sets. It proves that computing an optimal memoryless deterministic leader policy is NP-hard and introduces $\varepsilon$-optimal DP algorithms with provable guarantees on leader exploitability. Empirical results on standard mixed-motive benchmarks (security games, resource allocation, adversarial planning) show improved leader value and better runtime scalability than existing methods.

Abstract

Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment, where a leader commits to a policy and a follower best-responds, yielding a strong Stackelberg equilibrium (SSE) with leader-favourable tie-breaking. This paper introduces a dynamic programming (DP) framework that applies Bellman recursion over credible sets (state abstractions formally representing all rational follower best responses under partial leader commitments) to compute SSEs. We first prove that any LF-GSSG admits a lossless reduction to a Markov decision process (MDP) over credible sets. We further establish that synthesising an optimal memoryless deterministic leader policy is NP-hard, motivating the development of ε-optimal DP algorithms with provable guarantees on leader exploitability. Experiments on standard mixed-motive benchmarks, including security games, resource allocation, and adversarial planning, demonstrate empirical gains in leader value and runtime scalability over state-of-the-art methods.
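
To make the notion of a strong Stackelberg equilibrium with leader-favourable tie-breaking concrete, the following is a minimal Python sketch for a one-shot matrix game in which the leader commits to a pure action. It is an illustration only, not the paper's credible-set DP algorithm: the function name `pure_commitment_sse` and the payoff matrices are hypothetical, and the sequential, stochastic setting studied in the paper is not modelled.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def pure_commitment_sse(leader_payoff: Matrix,
                        follower_payoff: Matrix) -> Tuple[int, int, float]:
    """Return (leader action, follower action, leader value) for a one-shot
    game where the leader commits to a pure action (hypothetical helper)."""
    best = None
    for a, (l_row, f_row) in enumerate(zip(leader_payoff, follower_payoff)):
        # Follower best responses to the committed leader action a.
        top = max(f_row)
        best_responses = [b for b, v in enumerate(f_row) if v == top]
        # Leader-favourable tie-breaking: among the follower's best
        # responses, pick the one that maximises the leader's payoff.
        b = max(best_responses, key=lambda j: l_row[j])
        if best is None or l_row[b] > best[2]:
            best = (a, b, l_row[b])
    return best


# Hypothetical payoffs: rows are leader actions, columns follower actions.
leader_payoff = [[2, 4], [1, 3]]
follower_payoff = [[1, 1], [0, 2]]
result = pure_commitment_sse(leader_payoff, follower_payoff)
```

In this toy instance the follower is indifferent between its two actions when the leader commits to action 0, so the tie is resolved in the leader's favour and the leader obtains 4. Under adversarial tie-breaking the same commitment would yield only 2, and committing to action 1 (value 3) would be preferable, which is why the tie-breaking rule is part of the equilibrium definition.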

Paper Structure

This paper contains 1 section, 1 figure.

Table of Contents

  1. Introduction