Reinforcement Strategies in General Lotto Games

Keith Paarporn; Rahul Chandan; Mahnoosh Alizadeh; Jason R. Marden

TL;DR

This work analyzes a two-stage General Lotto game in which one player pre-allocates reinforcement resources across battlefields before both players simultaneously allocate their real-time resources. Backward induction yields closed-form subgame perfect equilibrium (SPE) payoffs and shows that the equilibrium pre-allocation is $\boldsymbol{p}^* = \boldsymbol{w}\cdot P$. An effectiveness ratio formalizes the finding that real-time resources are at least twice as effective as pre-allocated resources. The analysis extends to cost-aware investment planning and to a Stackelberg variant in which the follower can respond with its own pre-allocations, where performance improves in a threshold-driven, sometimes discontinuous way. The results bear on multi-stage adversarial decision-making in cyber-physical and security settings.

Abstract

Strategic decisions are often made over multiple periods of time, wherein decisions made earlier impact a competitor's success in later stages. In this paper, we study these dynamics in General Lotto games, a class of models describing the competitive allocation of resources between two opposing players. We propose a two-stage formulation where one of the players has reserved resources that can be strategically pre-allocated across the battlefields in the first stage of the game as reinforcements. The players then simultaneously allocate their remaining real-time resources, which can be randomized, in a decisive final stage. Our main contributions provide complete characterizations of the optimal reinforcement strategies and resulting equilibrium payoffs in these multi-stage General Lotto games. Interestingly, we determine that real-time resources are at least twice as effective as reinforcement resources when considering equilibrium payoffs.

Paper Structure (16 sections, 9 theorems, 71 equations, 4 figures)

Introduction
Problem formulation
Equilibrium characterizations
Main results
Proof of Theorem 3.1
Interplay between resource types
The effectiveness ratio
Optimal investment in resources
Two-sided pre-allocations
The impact of responding
Follower's best-response
Proof of Proposition \ref{prop:stack_equil}
Conclusion
Proof of Part 2-b
Proof of Lemma \ref{lem:uB_pB}
...and 1 more sections

Key Result

Theorem 3.1

Consider the game $\text{GL-P}(P,R_A,R_B,\boldsymbol{w})$. Player $A$'s payoff $\pi^*_A(P,R_A,R_B)$ in an SPE is characterized in three cases that depend on the game's parameters (the three regions shown in Figure 1, right). Player $B$'s SPE payoff is given by $\pi^*_B(P,R_A,R_B) = 1 - \pi^*_A(P,R_A,R_B)$. In all instances, player $A$'s SPE pre-allocation is $\boldsymbol{p}^* = \boldsymbol{w}\cdot P$.
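
As a small illustration of the pre-allocation rule $\boldsymbol{p}^* = \boldsymbol{w}\cdot P$, the sketch below splits a pre-allocation budget across battlefields in proportion to their values. It assumes the values are normalized to sum to one (the figures fix $W = 1$); the function name `spe_preallocation` and the example numbers are hypothetical and not taken from the paper.

```python
def spe_preallocation(w, P):
    """Return p* = w * P, one entry per battlefield.

    Assumption: the battlefield values w are normalized (sum to 1),
    so the entries of p* add up to the pre-allocation budget P.
    """
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("this sketch assumes normalized values (sum(w) == 1)")
    return [w_b * P for w_b in w]


# Hypothetical example: three battlefields, pre-allocation budget P = 2.
w = [0.5, 0.3, 0.2]
P = 2.0
p_star = spe_preallocation(w, P)
print(p_star)
```

Under the normalization assumption, the highest-valued battlefield receives the largest share and the shares exhaust the budget $P$.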

Figures (4)

Figure 1: (Left) The two-stage General Lotto game with Pre-allocations (GL-P). Players $A$ and $B$ compete over $n$ battlefields, whose valuations are given by $\{w_b\}_{b=1}^n$. In Stage 1, player $A$ decides how to deploy $P$ pre-allocated resources to the battlefields. Player $B$ observes the deployment. In Stage 2, the players simultaneously decide how to deploy their real-time resources $R_A$ and $R_B$, and final payoffs are determined. (Center) This plot shows the SPE payoff to player $A$ under varying resource endowments (Theorem 3.1). Obtaining more pre-allocated resources improves the payoff with decreasing marginal returns. Here, we have fixed $R_B = 1$. (Right) The characterization of the SPE payoff is broken down into three separate cases in the game's parameters. These are shown as the three regions in this plot, here parameterized by $P$ and $R_A$, which correspond to the items in Theorem 3.1.
Figure 2: (Left) A plot of the effectiveness ratio $E(R_A,R_B)$ (Theorem 4.1), which quantifies the multiplicative factor of pre-allocated resources needed to achieve the same performance as an amount of real-time resources $R_A$. Notably, real-time resources are at least twice as effective as an equivalent amount of pre-allocated resources. (Center) This plot shows a collection of level curves for player $A$'s SPE payoff. A level curve corresponds to a fixed performance level $\Pi$, and any point $(P,R_A)$ on the level curve satisfies $\pi_A^*(P,R_A,R_B) = \Pi$ (Lemma \ref{lem:level_set}). (Right) This plot shows player $A$'s optimal investment in pre-allocated resources $P^*$ when it has a per-unit cost of $c_A$ and a fixed monetary budget of $M_A$ to invest in both types of resources (Theorem \ref{thm:investment}). Player $A$ invests the remaining $M_A - c_A P^*$ in real-time resources. In these plots, we set $R_B = 1$ and $W = 1$.
Figure 3: This plot illustrates how to determine the optimal investment $(P^*,R^*_A)\in\mathbb{R}^2_+$ subject to the cost constraint in \ref{eq:linear_cost_constraint}. The set of feasible investments $\mathcal{I}(M_A)$ is the line segment connecting $(0,M_A)$ and $(M_A/c_A,0)$. The optimal investment lies on the level curve tangent to this line segment. For example, when $c_A=0.423$, the optimal investment is $(2.309,0.357)$ (unfilled circle), which gives a performance level of $\Pi = 0.75$. For sufficiently high cost $c_A$, $\mathcal{I}(M_A)$ will not be tangent to any level curve, and the optimal investment is $(0,M_A)$. For example, when $c_A=1.333$, the highest level curve that intersects $\mathcal{I}(M_A)$ is $\Pi = 0.625$, and the optimal investment is $(0,4/3)$ (filled square).
Figure 4: This plot illustrates the Stackelberg equilibrium payoff (red line, Proposition \ref{prop:stack_equil}) to player $B$ contrasted with its payoff if it did not have the opportunity to respond with pre-allocated resources, i.e., setting $p_B = 0$ (green dashed line, Theorem \ref{thm:investment}). Notably, there is a dramatic improvement in performance when player $B$ is sufficiently budget-rich, $\frac{M_B}{c_B} = \frac{M_A}{c_A}$. In this example, we set $M_A = 0.5$, $c_A = 0.2$, and $c_B = 0.5$. We vary $M_B$ from $0$ to $3$.
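
The numbers quoted in the captions of Figures 3 and 4 can be sanity-checked with a few lines of arithmetic. The sketch below rests on several assumptions: the feasible set is the budget line $R_A + c_A P = M_A$ (the segment joining $(0,M_A)$ and $(M_A/c_A,0)$); $M_A = 4/3$ is inferred from the corner investment $(0,4/3)$; Figure 3 uses $R_B = 1$ and $W = 1$ as in Figure 2; and, for the case $P = 0$ only, the payoff is the standard General Lotto equilibrium value from the literature. It does not implement the three-case formula of Theorem 3.1, and all function names are hypothetical.

```python
def on_budget_line(P, R_A, c_A, M_A, tol=1e-3):
    """Check R_A + c_A * P == M_A up to the rounding used in the caption."""
    return abs(R_A + c_A * P - M_A) <= tol


def lotto_payoff_no_prealloc(R_A, R_B, W=1.0):
    """Standard General Lotto equilibrium payoff to player A when P = 0.

    Assumption: this classical formula is used only for the P = 0 corner;
    it is not the paper's general SPE characterization.
    """
    if R_A <= 0:
        return 0.0
    if R_A <= R_B:
        return W * R_A / (2.0 * R_B)
    return W * (1.0 - R_B / (2.0 * R_A))


# Figure 3: budget inferred from the corner investment (0, 4/3).
M_A = 4.0 / 3.0
interior_ok = on_budget_line(P=2.309, R_A=0.357, c_A=0.423, M_A=M_A)
corner_ok = on_budget_line(P=0.0, R_A=4.0 / 3.0, c_A=1.333, M_A=M_A)
corner_payoff = lotto_payoff_no_prealloc(R_A=4.0 / 3.0, R_B=1.0)

# Figure 4: the value of M_B at which M_B / c_B equals M_A / c_A.
M_A_fig4, c_A_fig4, c_B_fig4 = 0.5, 0.2, 0.5
M_B_threshold = c_B_fig4 * M_A_fig4 / c_A_fig4

print(interior_ok, corner_ok, round(corner_payoff, 3), round(M_B_threshold, 3))
```

Both quoted investments lie on the inferred budget line, the corner payoff agrees with the level $\Pi = 0.625$ in the Figure 3 caption, and the Figure 4 condition is met at $M_B = 1.25$, inside the plotted range from $0$ to $3$.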

Theorems & Definitions (18)

Definition 2.1
Theorem 3.1
Lemma 3.1: Adapted from Vu_EC2021
Lemma 3.2
proof
proof : Proof of Theorem 3.1
Definition 4.1
Theorem 4.1
Lemma 4.1
proof : Proof of Theorem 4.1
...and 8 more
