
Zeroth-Order Stackelberg Control in Combinatorial Congestion Games

Saeed Masiha, Sepehr Elahi, Negar Kiyavash, Patrick Thiran

TL;DR

ZO-Stackelberg couples a projection-free Frank--Wolfe equilibrium solver with a zeroth-order outer update, which avoids differentiating through equilibria. It achieves orders-of-magnitude speedups over a differentiation-based baseline while converging to follower equilibria.

Abstract

We study Stackelberg (leader--follower) tuning of network parameters (tolls, capacities, incentives) in combinatorial congestion games, where selfish users choose discrete routes (or other combinatorial strategies) and settle at a congestion equilibrium. The leader minimizes a system-level objective (e.g., total travel time) evaluated at equilibrium, but this objective is typically nonsmooth because the set of used strategies can change abruptly. We propose ZO-Stackelberg, which couples a projection-free Frank--Wolfe equilibrium solver with a zeroth-order outer update, avoiding differentiation through equilibria. We prove convergence to generalized Goldstein stationary points of the true equilibrium objective, with explicit dependence on the equilibrium approximation error, and analyze subsampled oracles: if an exact minimizer is sampled with probability $κ_m$, then the Frank--Wolfe error decays as $\mathcal{O}(1/(κ_m T))$. We also propose stratified sampling as a practical way to avoid a vanishing $κ_m$ when the strategies that matter most for the Wardrop equilibrium concentrate in a few dominant combinatorial classes (e.g., short paths). Experiments on real-world networks demonstrate that our method achieves orders-of-magnitude speedups over a differentiation-based baseline while converging to follower equilibria.
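
The two-level structure described in the abstract can be illustrated with a minimal sketch on a toy instance. Everything below is an assumption made for illustration and is not taken from the paper: a two-link Pigou-style network, a scalar toll `theta` on the congestible link, toll-free total travel time as the leader objective, an exact (not subsampled) linear minimization oracle, and the hypothetical names `link_costs`, `fw_equilibrium`, `social_cost`, and `zo_stackelberg`. The inner loop runs Frank--Wolfe on the potential whose gradient is the vector of current link costs; the outer loop uses a two-point zeroth-order estimate, so the equilibrium map is never differentiated.

```python
import random

def link_costs(y, theta):
    """Toy costs (assumed): link 0 is congestible and tolled, link 1 has constant cost 1."""
    return [y[0] + theta, 1.0]

def fw_equilibrium(theta, T=200):
    """Frank-Wolfe on the potential f(y) = y0^2/2 + theta*y0 + y1 over the simplex."""
    y = [0.5, 0.5]
    for t in range(T):
        c = link_costs(y, theta)                  # gradient of the potential = current costs
        best = min(range(2), key=lambda i: c[i])  # exact LMO: cheapest strategy
        gamma = 2.0 / (t + 2.0)                   # standard Frank-Wolfe step size
        y = [(1.0 - gamma) * yi + (gamma if i == best else 0.0)
             for i, yi in enumerate(y)]
    return y

def social_cost(theta, T=200):
    """Leader objective (assumed): total travel time, tolls excluded, at the approximate equilibrium."""
    y = fw_equilibrium(theta, T)
    return y[0] * y[0] + y[1] * 1.0

def zo_stackelberg(theta0=0.1, outer_iters=40, delta=0.05, eta=0.2, seed=0):
    """Zeroth-order outer loop: two-point estimate along a random direction u in {-1, +1}."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(outer_iters):
        u = rng.choice([-1.0, 1.0])
        diff = social_cost(theta + delta * u) - social_cost(theta - delta * u)
        g = diff / (2.0 * delta) * u              # only function values are queried
        theta -= eta * g
    return theta, social_cost(theta)

theta_hat, cost_hat = zo_stackelberg()
```

In this toy instance the exact equilibrium objective is $(1-\theta)^2 + \theta$ for $\theta \in [0,1]$ and constant outside that interval, so it has kinks at $\theta = 0$ and $\theta = 1$ where the set of used links changes. The sketch should settle near the toll $\theta = 1/2$, up to the inner Frank--Wolfe error.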

Paper Structure

This paper contains 107 sections, 5 theorems, 116 equations, 4 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

Let $z \in \Delta^d$ and $y = y(z)$. Under Assumption ass:costs, $z$ is a Wardrop equilibrium if and only if $y =\mathop{\mathrm{arg\,min}}\limits_{y'\in\mathcal{C}}f(y')$.
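
As a small numerical illustration of this equivalence, the sketch below checks it on a hypothetical two-link network in which each strategy is a single link, so the load $y$ coincides with the mixed strategy $z$. The cost functions, the Beckmann-type potential (sum of integrated link costs), the grid search, and the names `costs`, `potential`, `is_wardrop`, and `potential_minimizer` are assumptions chosen for this example. The paper's cost assumption is not reproduced here and may differ.

```python
def costs(y):
    """Assumed increasing link costs: c0(y0) = 2*y0, c1(y1) = 1 + y1."""
    return [2.0 * y[0], 1.0 + y[1]]

def potential(y):
    """Beckmann-type potential: each link cost integrated from 0 to its load."""
    return y[0] ** 2 + (y[1] + 0.5 * y[1] ** 2)

def is_wardrop(y, tol=1e-2):
    """Wardrop condition: every link carrying flow has (near-)minimal cost."""
    c = costs(y)
    cmin = min(c)
    return all(c[i] <= cmin + tol for i in range(2) if y[i] > 0)

def potential_minimizer(n=3000):
    """Brute-force minimizer of the potential over a grid on the unit-demand simplex."""
    grid = [(k / n, 1.0 - k / n) for k in range(n + 1)]
    return min(grid, key=potential)

y_star = potential_minimizer()
print(y_star, costs(y_star), is_wardrop(y_star))
```

Equalizing the two costs by hand gives $2y_0 = 1 + (1 - y_0)$, that is $y_0 = 2/3$, which matches the stationarity condition $3y_0 - 2 = 0$ of the potential restricted to the simplex. Routing all flow on link 0 violates the Wardrop condition because the unused link is cheaper.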

Figures (4)

  • Figure 1: Leader objective vs. outer iterations for Scenarios 1--3. For subsampled LMOs (US/UL/HL), lighter shades denote smaller sampling budgets $m$ (we use $m\in\{10,100,1000\}$ in Scenarios 2 and 3); bands are 99% CIs over 10 runs, while Diff is deterministic.
  • Figure 2: Final-iterate diagnostics: speedup vs. Diff, peak RSS, FW gap, and social cost, for Scenarios 1--3. For subsampling-based variants, lighter shades denote smaller $m$ (same $m$ as in Figure 1); points are means and bars are 99% CIs over 10 runs.
  • Figure 3: TNTP-derived subgraphs used in Scenarios 1--3.
  • Figure 4: Left: a three-edge network with two $s$--$t$ paths. Right: a ZDD encoding the corresponding strategy family. Root-to-$\top$ paths correspond to feasible strategies, with hi-arcs indicating selected edges.

Theorems & Definitions (15)

  • Definition 1: Wardrop equilibrium
  • Proposition 1: Equilibrium $\iff$ potential minimizer
  • Lemma 1: Lipschitzness of the equilibrium map and hyper-objective
  • Example 1: Kinks from active-set changes
  • Remark 1: Exact LMO (standard)
  • Remark 2: Relation to uniform-inclusion subsampling
  • Theorem 5.1: Convergence of FW-Equilibrium with subsampled LMO
  • Theorem 5.2: Convergence of the zeroth-order outer algorithm (alg:zo-outer) to a GGSP of $\Phi$
  • Example 2: Many kinks scaling with the number of strategies
  • proof
  • ...and 5 more