General Performance Evaluation for Competitive Resource Allocation Games via Unseen Payoff Estimation

N'yoma Diamond, Fabricio Murai

TL;DR

The paper introduces a generalized payoff $\mathcal{L}_p$ for competitive resource allocation games to unify performance evaluation across feedback regimes, including bandit and semi-bandit settings. It defines two core metrics, Max Payoff and Expected Payoff, and proposes three uncertainty-aware estimators (Observable Max Payoff, Supremum Payoff, and Observable Expected Payoff) that are computed via feasible opponent decision sets and under a Uniform Decision Assumption. Using Colonel Blotto as a case study, it develops a graph-based model (the decision graph) together with pruning and bounding techniques that efficiently identify feasible opponent decisions and tighten the estimates. The techniques are supported by proofs, and empirical validation shows near-ground-truth accuracy across diverse configurations. The findings enable problem-agnostic evaluation of resource allocation algorithms in mutually adaptive adversarial settings, with practical implications for cybersecurity, economics, and related domains.

Abstract

Many high-stakes decision-making problems, such as those found within cybersecurity and economics, can be modeled as competitive resource allocation games. In these games, multiple players must allocate limited resources to overcome their opponent(s), while minimizing any induced individual losses. However, existing means of assessing the performance of resource allocation algorithms are highly disparate and problem-dependent. As a result, evaluating such algorithms is unreliable or impossible in many contexts and applications, especially when considering differing levels of feedback. To resolve this problem, we propose a generalized definition of payoff which uses an arbitrary user-provided function. This unifies performance evaluation under all contexts and levels of feedback. Using this definition, we develop metrics for evaluating player performance, and estimators to approximate them under uncertainty (i.e., bandit or semi-bandit feedback). These metrics and their respective estimators provide a problem-agnostic means to contextualize and evaluate algorithm performance. To validate the accuracy of our estimators, we explore the Colonel Blotto ($\mathcal{CB}$) game as an example. To this end, we propose a graph-pruning approach to efficiently identify feasible opponent decisions, which are used in computing our estimation metrics. We simulate a suite of $\mathcal{CB}$ games with various resource allocation algorithms and game parameters, and use them to compute and evaluate the quality of our estimates. These simulations empirically show our approach to be highly accurate at estimating the metrics associated with the unseen outcomes of an opponent's latent behavior.

Paper Structure

This paper contains 22 sections, 28 theorems, 30 equations, 2 figures, 14 tables, 2 algorithms.

Key Result

Proposition 1

$N_p = \sum_{\lambda\in\Lambda}\pi_\lambda + \sum_{\omega\in\Omega}\pi_\omega$ for any player $p$.

Figures (2)

  • Figure 1: Decision graph $G_{3,4}$ for a game with $K=3$ battlefields given $N_p=4$ resources. Blue path represents the decision $\langle1,0,3\rangle$; red path represents the decision $\langle4,0,0\rangle$.
  • Figure 2: Pruning opponent decision graph $G^t_{3,4}$ (Figure 1) given $\pi^t = \langle 1, 3, 2 \rangle$, $\mathcal{L}_p^t = \langle 0, 1, 0 \rangle$, and $\delta_p = 0$. Opponent allocation bounds are $\underline{\phi}^t = \langle 1, 0, 2 \rangle$ and $\overline{\phi}^t = \langle 4, 2, 3 \rangle$. Bound pruning: red and blue edges exceed feasible allocation lower and upper bounds, respectively. Dead-end pruning: red vertices are dead-ends for $i=1$, while blue vertices are dead-ends for $i=2$ after pruning for $i=1$. Final decision graph: the pruned graph of feasible opponent decisions, of which there are only 3: $\langle 1, 0, 3 \rangle$, $\langle 1, 1, 2 \rangle$, and $\langle 2, 0, 2 \rangle$.
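
The counts in the two captions can be checked with a small enumeration. The sketch below is not the paper's graph-pruning algorithm. It is a plain recursive search, written under the assumption that a decision is any allocation of non-negative integers over the $K$ battlefields that sums to the opponent's budget and respects the per-battlefield bounds $\underline{\phi}^t$ and $\overline{\phi}^t$. The function name `feasible_decisions` and its parameters are hypothetical, chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def feasible_decisions(
    total: int, lower: Sequence[int], upper: Sequence[int]
) -> List[Tuple[int, ...]]:
    """List every allocation x with sum(x) == total and lower[i] <= x[i] <= upper[i].

    Hypothetical helper: a brute-force stand-in for the pruned decision graph.
    """
    k = len(lower)
    # Suffix sums of the bounds: the least and most that battlefields i..k-1 can absorb.
    min_rest = [0] * (k + 1)
    max_rest = [0] * (k + 1)
    for i in range(k - 1, -1, -1):
        min_rest[i] = min_rest[i + 1] + lower[i]
        max_rest[i] = max_rest[i + 1] + upper[i]

    results: List[Tuple[int, ...]] = []

    def walk(i: int, remaining: int, prefix: List[int]) -> None:
        if i == k:
            if remaining == 0:
                results.append(tuple(prefix))
            return
        # Only try allocations inside the bounds for battlefield i.
        for x in range(lower[i], min(upper[i], remaining) + 1):
            rest = remaining - x
            # Skip branches that cannot be completed by the remaining battlefields.
            if min_rest[i + 1] <= rest <= max_rest[i + 1]:
                walk(i + 1, rest, prefix + [x])

    walk(0, total, [])
    return results


# Figure 1 setting: K = 3 battlefields, 4 resources, no information about the opponent.
all_decisions = feasible_decisions(4, (0, 0, 0), (4, 4, 4))

# Figure 2 setting: bounds <1, 0, 2> and <4, 2, 3> taken from the caption.
pruned = feasible_decisions(4, (1, 0, 2), (4, 2, 3))
print(len(all_decisions), pruned)
```

With the bounds from Figure 2, the search returns exactly the three decisions named in the caption, out of the 15 allocations that are possible when nothing is known about the opponent. The two range checks play a role loosely analogous to the bound pruning and dead-end pruning shown in the figure, but they operate on a recursion rather than on an explicit graph.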

Theorems & Definitions (37)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 1
  • Lemma 4
  • ...and 27 more