Zeroth-Order Stackelberg Control in Combinatorial Congestion Games

Saeed Masiha; Sepehr Elahi; Negar Kiyavash; Patrick Thiran

TL;DR

ZO-Stackelberg couples a projection-free Frank--Wolfe equilibrium solver with a zeroth-order outer update, which avoids differentiating through equilibria. It achieves orders-of-magnitude speedups over a differentiation-based baseline while converging to follower equilibria.

Abstract

We study Stackelberg (leader--follower) tuning of network parameters (tolls, capacities, incentives) in combinatorial congestion games, where selfish users choose discrete routes (or other combinatorial strategies) and settle at a congestion equilibrium. The leader minimizes a system-level objective (e.g., total travel time) evaluated at equilibrium, but this objective is typically nonsmooth because the set of used strategies can change abruptly. We propose ZO-Stackelberg, which couples a projection-free Frank--Wolfe equilibrium solver with a zeroth-order outer update, avoiding differentiation through equilibria. We prove convergence to generalized Goldstein stationary points of the true equilibrium objective, with explicit dependence on the equilibrium approximation error, and analyze subsampled oracles: if an exact minimizer is sampled with probability $κ_m$, then the Frank--Wolfe error decays as $\mathcal{O}(1/(κ_m T))$. We also propose stratified sampling as a practical way to avoid a vanishing $κ_m$ when the strategies that matter most for the Wardrop equilibrium concentrate in a few dominant combinatorial classes (e.g., short paths). Experiments on real-world networks demonstrate that our method achieves orders-of-magnitude speedups over a differentiation-based baseline while converging to follower equilibria.
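To make the two-level structure concrete, here is a minimal sketch on a toy instance; it is not the authors' implementation. Everything specific is an assumption of the sketch: a Pigou-style network with two parallel paths and unit demand, affine edge costs with additive tolls `theta`, a leader objective equal to total travel time excluding tolls, a Frank--Wolfe step chosen by exact line search (available in closed form only because the toy costs are affine), and a two-point random-direction estimator for the outer update. The names `fw_equilibrium`, `social_cost`, and `zo_stackelberg` are hypothetical.

```python
import math
import random

# Toy instance (assumption): two parallel s-t paths, unit demand, and
# affine edge costs c_e(y_e) = A[e] + B[e] * y_e + theta[e].
A = [1.0, 0.0]  # free-flow costs
B = [0.0, 1.0]  # congestion slopes


def edge_costs(y, theta):
    return [a + b * ye + t for a, b, ye, t in zip(A, B, y, theta)]


def fw_equilibrium(theta, max_iters=100, tol=1e-10):
    """Frank-Wolfe on the potential; returns (loads y, last computed FW gap)."""
    n = len(A)
    y = [1.0 / n] * n
    gap = math.inf
    for _ in range(max_iters):
        cost = edge_costs(y, theta)  # gradient of the potential at y
        best = min(range(n), key=cost.__getitem__)  # exact LMO: cheapest path
        d = [(1.0 if e == best else 0.0) - y[e] for e in range(n)]
        gap = -sum(c * de for c, de in zip(cost, d))  # Frank-Wolfe gap
        if gap <= tol:
            break
        # Exact line search along d; closed form because costs are affine.
        curv = sum(b * de * de for b, de in zip(B, d))
        gamma = 1.0 if curv <= 0.0 else min(1.0, gap / curv)
        y = [ye + gamma * de for ye, de in zip(y, d)]
    return y, gap


def social_cost(theta):
    """Leader objective: total travel time at equilibrium, tolls excluded."""
    y, _ = fw_equilibrium(theta)
    return sum(ye * (a + b * ye) for ye, a, b in zip(y, A, B))


def zo_stackelberg(theta0, steps=300, delta=0.05, lr=0.05, seed=0):
    """Outer loop: two-point zeroth-order estimate along a random unit direction."""
    rng = random.Random(seed)
    theta = list(theta0)
    dim = len(theta)
    for _ in range(steps):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        u = [x / norm for x in g]
        plus = social_cost([t + delta * x for t, x in zip(theta, u)])
        minus = social_cost([t - delta * x for t, x in zip(theta, u)])
        scale = lr * dim * (plus - minus) / (2.0 * delta)
        theta = [t - scale * x for t, x in zip(theta, u)]
    return theta


theta_star = zo_stackelberg([0.0, 0.0])
print(social_cost([0.0, 0.0]), social_cost(theta_star))
```

On this instance the untolled equilibrium sends all traffic to the congestible path, so the starting point sits exactly on a kink of the leader objective: the objective is flat on one side and decreasing on the other. The outer loop only ever queries `social_cost`, so it never needs a derivative of the equilibrium map, and the social cost should move from 1.0 toward the toy optimum of 0.75.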

Paper Structure (107 sections, 5 theorems, 116 equations, 4 figures, 2 tables, 2 algorithms)

Introduction
Problem formulation.
Main challenge: the hyper-objective can be nonsmooth.
Prior work: differentiable equilibrium computation.
Our approach: oracle-based optimization of the true (nonsmooth) objective.
Contributions.
Problem Setting
Combinatorial Congestion Games (CCGs)
Cost functions.
Wardrop Equilibrium and Potential Minimization
Bilevel Model of CCG
Lipschitz stability of the equilibrium map.
When does quadratic growth hold?
Nonsmoothness via active-set changes.
Zeroth-Order Algorithm for Stackelberg Control
...and 92 more sections

Key Result

Proposition 1

Let $z \in \Delta^d$ and $y = y(z)$. Under the cost assumption (label ass:costs), $z$ is a Wardrop equilibrium if and only if $y = \mathop{\mathrm{arg\,min}}\limits_{y'\in\mathcal{C}} f(y')$.
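A small numerical check can illustrate the proposition on a toy instance. The sketch below is an illustration under stated assumptions, not the paper's construction: it uses two parallel paths with unit demand (so strategy shares and edge loads coincide), path costs $1$ and $y_2 + \text{toll}$, and the standard Beckmann potential (sum of integrated edge costs) as a stand-in for $f$, whose definition is not shown in this excerpt. The helper names are hypothetical.

```python
def costs(y2, toll=0.0):
    """Path costs: path 1 is constant, path 2 is congestible and tolled."""
    return 1.0, y2 + toll


def potential(y2, toll=0.0):
    """Beckmann potential: each edge cost integrated from 0 to its load."""
    y1 = 1.0 - y2
    return 1.0 * y1 + 0.5 * y2 * y2 + toll * y2


def potential_minimizer(toll=0.0, grid=10_000):
    """Brute-force minimizer of the potential over y2 in [0, 1]."""
    return min((k / grid for k in range(grid + 1)),
               key=lambda y2: potential(y2, toll))


def is_wardrop(y2, toll=0.0, tol=1e-3):
    """Wardrop condition: every used path is (nearly) a cheapest path."""
    c = costs(y2, toll)
    loads = (1.0 - y2, y2)
    cheapest = min(c)
    return all(load <= tol or ci <= cheapest + tol
               for load, ci in zip(loads, c))


for toll in (0.0, 0.25, 0.5):
    y2 = potential_minimizer(toll)
    print(toll, y2, is_wardrop(y2, toll))
```

For each toll the grid minimizer of the potential should satisfy the Wardrop condition, while a non-minimizing split such as `y2 = 0.5` with no toll should fail it because the loaded constant-cost path is strictly more expensive.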

Figures (4)

Figure 1: Leader objective vs outer iterations for Scenarios 1--3. For subsampled LMOs (US/UL/HL), lighter shades denote smaller sampling budgets $m$ (we use $m\in\{10,100,1000\}$ in Scenarios 2 and 3); bands are 99% CIs over 10 runs, while Diff is deterministic.
Figure 2: Final-iterate diagnostics: speedup vs Diff, peak RSS, FW gap, and social cost, for Scenarios 1--3. For subsampling-based variants, lighter shades denote smaller $m$ (same $m$ as in Figure 1); points are means and bars are 99% CIs over 10 runs.
Figure 3: TNTP-derived subgraphs used in Scenarios 1--3.
Figure 4: Left: a three-edge network with two $s$--$t$ paths. Right: a ZDD encoding the corresponding strategy family. Root-to-$\top$ paths correspond to feasible strategies, with hi-arcs indicating selected edges.

Theorems & Definitions (15)

Definition 1: Wardrop equilibrium
Proposition 1: Equilibrium $\iff$ potential minimizer
Lemma 1: Lipschitzness of the equilibrium map and hyper-objective
Example 1: Kinks from active-set changes
Remark 1: Exact LMO (standard)
Remark 2: Relation to uniform-inclusion subsampling
Theorem 5.1: Convergence of FW-Equilibrium with subsampled LMO
Theorem 5.2: Convergence of the zeroth-order outer algorithm (label alg:zo-outer) to a generalized Goldstein stationary point (GGSP) of $\Phi$
Example 2: Many kinks scaling with the number of strategies
proof
...and 5 more
