Stochastic Games with Minimally Bounded Action Costs
David Mguni
TL;DR
This work develops a discrete-time, two-player zero-sum stochastic game with minimally bounded per-action costs, introducing a two-sided impulse-control framework. It proves the existence and uniqueness of a minimax value in Markov strategies and establishes a dynamic programming principle, so that the equilibrium can be obtained as the limit of Bellman iterations. A novel Q-learning variant is shown to converge almost surely to the game value, providing a practical method to compute minimax equilibria in unknown environments, and the theory extends to linear function approximation and to budget-constrained settings. The framework also encompasses special cases such as impulse control with stopping and Dynkin games, situating the results within the broader literatures on impulse control and learning in strategic environments. These contributions support the design and analysis of strategic interactions under fixed per-action costs, with potential applications in economics and finance, where transaction costs and menu costs are prevalent.
Abstract
In many multi-player interactions, players incur strictly positive costs each time they execute an action, for example 'menu costs' or transaction costs in financial systems. Since acting at every available opportunity would accumulate prohibitively large costs, the resulting decision problem is one in which players must decide strategically when to act, in addition to which action to take. This paper analyses a discrete-time stochastic game (SG) in which players face minimally bounded positive costs for each action and influence the system using impulse controls. We prove that SGs of two-sided impulse control have a unique value and characterise the saddle point equilibrium, in which the players execute actions at strategically chosen times in accordance with Markovian strategies. We prove that the game satisfies a dynamic programming principle and that the Markov perfect equilibrium can be computed as a limit point of a sequence of Bellman operations. We then introduce a new Q-learning variant, which we show converges almost surely to the value of the game, enabling solutions to be extracted in unknown settings. Lastly, we extend our results to settings with budgetary constraints.
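
To make the idea of computing the value as a limit of Bellman operations concrete, the following is a minimal sketch on a toy finite game. It is not the operator defined in the paper. It assumes a discounted, sequential-move simplification in which the maximiser picks an action first and the minimiser responds, action 0 means "do not intervene" at zero cost, and every other action carries a strictly positive cost. The names `bellman_step`, `value_iteration`, `P`, `R`, `cost1`, and `cost2` are hypothetical and introduced only for illustration.

```python
import numpy as np


def bellman_step(V, P, R, cost1, cost2, gamma):
    """One application of a toy max-min Bellman operator (illustrative only).

    P: transition probabilities, shape (A1, A2, S, S)
    R: maximiser's running reward, shape (A1, A2, S)
    cost1, cost2: per-action costs for each player, with cost[0] == 0
                  for the "do not intervene" action (an assumption here).
    """
    # Q[a1, a2, s]: reward plus discounted expected continuation value.
    Q = R + gamma * (P @ V)
    # The maximiser pays its own cost; the minimiser's cost is a gain
    # to the maximiser because the game is zero-sum.
    Q = Q - cost1[:, None, None] + cost2[None, :, None]
    # Minimiser responds to each maximiser action; maximiser picks the best.
    return Q.min(axis=1).max(axis=0)


def value_iteration(P, R, cost1, cost2, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate the toy operator from V = 0 until successive iterates agree."""
    V = np.zeros(P.shape[-1])
    for _ in range(max_iter):
        V_new = bellman_step(V, P, R, cost1, cost2, gamma)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

With a discount factor below one, this toy operator is a contraction in the sup norm, so the iterates converge to its unique fixed point. Because not intervening is free while every other action is costly, both players in the resulting strategies intervene only in states where the gain outweighs the cost.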
