An LP-Based Approach for Bilinear Saddle Point Problem with Instance-dependent Guarantee and Noisy Feedback

Jiashuo Jiang; Mengxiao Zhang

TL;DR

This work studies how to estimate a Nash equilibrium (NE) of a two-player zero-sum matrix game from noisy feedback. It formulates equilibrium computation as a pair of primal and dual linear programs (LPs) and solves them with a two-stage LP-resolving approach. The first stage identifies the support of an NE by solving empirical LPs built from samples; the second stage computes the NE restricted to that support through an adaptive resolving procedure inspired by online resource allocation. The authors establish both instance-dependent and instance-independent sample complexity guarantees, parameterized by the problem constants $\delta$, $\sigma$, and $\sigma_0$, and develop a doubling trick and estimation procedures that remove the need to know these constants in advance. The approach extends to the dual player and targets NE estimation under noisy bandit feedback, with applications in dueling bandits, market making, pricing, and blockchain security.
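
As a point of reference, the textbook LP pair for a matrix game can be written as below. This is a sketch under assumptions: the row player chooses $\bm{x}$ to maximize $\bm{x}^\top A \bm{y}$ and the column player chooses $\bm{y}$ to minimize it. The paper's own primal and dual LPs may use a different orientation or sign convention, and the symbols $A$, $\mu$, and $\nu$ are used here only for illustration.

```latex
\begin{align*}
\text{(primal)}\quad & \max_{\bm{x},\,\mu}\ \mu
  && \text{s.t.}\quad A^\top \bm{x} \ge \mu \bm{1},\quad \bm{1}^\top \bm{x} = 1,\quad \bm{x} \ge \bm{0}, \\
\text{(dual)}\quad & \min_{\bm{y},\,\nu}\ \nu
  && \text{s.t.}\quad A \bm{y} \le \nu \bm{1},\quad \bm{1}^\top \bm{y} = 1,\quad \bm{y} \ge \bm{0}.
\end{align*}
```

Under this convention, strong LP duality gives equal optimal values $\mu^* = \nu^*$, which is the value of the game.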

Abstract

In this work, we study the sample complexity of obtaining a Nash equilibrium (NE) estimate in two-player zero-sum matrix games with noisy feedback. Specifically, we propose a novel algorithm that repeatedly solves linear programs (LPs) to obtain an NE estimate with bias at most $\varepsilon$ with a sample complexity of $O\left(\frac{m_1 m_2}{\varepsilon\min\{\delta^2,\sigma_0^2,\sigma^3\}} \log\frac{m_1 m_2}{\varepsilon}\right)$ for general $m_1 \times m_2$ game matrices, where $\sigma$, $\sigma_0$, $\delta$ are some problem-dependent constants. To our knowledge, this is the first instance-dependent sample complexity bound for finding an NE estimate with $\varepsilon$ bias in general-dimension matrix games with noisy feedback and potentially non-unique equilibria. Our algorithm builds on recent advances in online resource allocation and operates in two stages: (1) identifying the support set of an NE, and (2) computing the unique NE restricted to this support. Both stages rely on a careful analysis of LP solutions derived from noisy samples.
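
To get a feel for how the stated bound scales, the expression inside the $O(\cdot)$ can be evaluated directly. This is a minimal sketch: the function name `sample_complexity_scale` is hypothetical, and because the constant hidden by the $O(\cdot)$ is not given here, the result is only a scale and not an actual sample count.

```python
import math


def sample_complexity_scale(m1, m2, eps, delta, sigma0, sigma):
    """Evaluate (m1*m2) / (eps * min{delta^2, sigma0^2, sigma^3}) * log(m1*m2/eps).

    The hidden O(.) constant is omitted, so only ratios between two
    evaluations are meaningful, not the absolute value.
    """
    instance_term = min(delta ** 2, sigma0 ** 2, sigma ** 3)
    return (m1 * m2) / (eps * instance_term) * math.log(m1 * m2 / eps)


# Illustrative values only; they are not taken from the paper.
coarse = sample_complexity_scale(2, 2, eps=0.1, delta=1.0, sigma0=1.0, sigma=1.0)
fine = sample_complexity_scale(2, 2, eps=0.01, delta=1.0, sigma0=1.0, sigma=1.0)
hard = sample_complexity_scale(2, 2, eps=0.1, delta=0.5, sigma0=1.0, sigma=1.0)
```

Shrinking $\varepsilon$ or any of the instance constants increases the scale; for example, halving $\delta$ while it is the binding term multiplies the expression by four.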

Paper Structure

This paper contains 41 sections, 15 theorems, 294 equations, 4 algorithms.

Introduction
Application 1: Dueling bandit and preference learning
Application 2: Market making in financial markets
Application 3: Pricing and bidding competition in revenue management
Application 4: Attacker-defender games in blockchain security
Main Results and Contributions
Other Related Works
Preliminary
Other notations
Reformulation as Linear Programming
Saddle Point Support Identification
Instance-Dependent Constant Sample Complexity Guarantee
$\delta$-Independent $\widetilde{\mathcal{O}}(m/\varepsilon^2)$ Sample Complexity Guarantee
LP-Resolving Based Algorithm
Analysis for the Resolving Procedure
...and 26 more sections

Key Result

Lemma 1

Let $(\bm{x}^*, \bm{\mu}^*)$ be any optimal solution to the primal LP (eqn:primal), and let $(\bm{y}^*, \bm{\nu}^*)$ be the corresponding optimal solution to the dual LP (eqn:dual). Then $(\bm{x}^*, \bm{y}^*)$ is an optimal solution to the saddle point problem (eqn:Formulation).
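
One way to check numerically that a candidate pair is a saddle point is to compute its duality gap (exploitability), which is zero exactly at an equilibrium. The sketch below assumes the row player maximizes $\bm{x}^\top A \bm{y}$; the function name `duality_gap` and the matching-pennies matrix are illustrative and not taken from the paper.

```python
def duality_gap(A, x, y):
    """Exploitability of (x, y) when the row player maximizes x^T A y.

    Returns max_i (A y)_i - min_j (x^T A)_j, which is nonnegative and
    equals zero exactly when (x, y) is a saddle point.
    """
    m1, m2 = len(A), len(A[0])
    # Payoff of each pure row against the column mixture y.
    row_payoffs = [sum(A[i][j] * y[j] for j in range(m2)) for i in range(m1)]
    # Payoff of each pure column against the row mixture x.
    col_payoffs = [sum(x[i] * A[i][j] for i in range(m1)) for j in range(m2)]
    return max(row_payoffs) - min(col_payoffs)


# Matching pennies: the unique equilibrium is uniform play for both players.
A = [[1.0, -1.0], [-1.0, 1.0]]
uniform = [0.5, 0.5]
gap_at_ne = duality_gap(A, uniform, uniform)       # equilibrium pair
gap_off_ne = duality_gap(A, [1.0, 0.0], uniform)   # row player deviates
```

Here `gap_at_ne` is zero, while the pure row strategy is exploitable by the column player and yields a strictly positive gap.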

Theorems & Definitions (24)

Definition 1
Definition 2
Lemma 1
Theorem 1
Definition 3
Definition 4
Theorem 2
Definition 5
Proposition 1
Theorem 3
...and 14 more
