Lower bounds for ranking-based pivot rules

Yann Disser, Georg Loho, Matthew Maat, Nils Mosis

TL;DR

This paper advances the understanding of pivot rules by proposing a unified framework that classifies rules according to the input information they use, with a focus on ranking-based rules. It proves a superpolynomial lower bound for strategy improvement on sink parity games and a subexponential lower bound for policy iteration on Markov decision processes, with these results extending to the simplex method for linear programming. The core approach combines reductions between strategy iteration, parity games, and LP pivoting, using adversarial gadgets to force long sequences of improvements under memory- and information-constrained rules. These findings suggest intrinsic inefficiencies for a broad class of combinatorial pivot rules and highlight the need to explore alternative information regimes or algorithmic paradigms beyond ranking-based approaches.

Abstract

The existence of a polynomial pivot rule for the simplex method for linear programming, policy iteration for Markov decision processes, and strategy improvement for parity games are each prominent open problems in their respective fields. While numerous natural candidates for efficient rules have been eliminated, all existing lower bound constructions are tailored to individual or small sets of pivot rules. We introduce a unified framework for formalizing classes of rules according to the information about the input that they rely on. Within this framework, we show lower bounds for ranking-based classes of rules that base their decisions on orderings of the improving pivot steps induced by the underlying data. Our first result is a superpolynomial lower bound for strategy improvement, obtained via a family of sink parity games, which applies to memory-based generalizations of Bland's rule that only access the input by comparing the ranks of improving edges in some global order. Our second result is a subexponential lower bound for policy iteration, obtained via a family of Markov decision processes, which applies to memoryless rules that only access the input by comparing improving actions according to their ranks in a global order, their reduced costs, and the associated improvements in objective value. Both results carry over to the simplex method for linear programming.

Paper Structure

This paper contains 5 sections, 7 theorems, 8 equations, 1 figure, 1 algorithm.

Key Result

Theorem 1

For every deterministic pivot rule that uses $o(N/\log N)$ memory states and bases its decisions solely on the ordering of improving edges/variables by global index, strategy improvement and the simplex algorithm take $N^{\omega(1)}$ iterations in the worst case, where $N$ is the input size.
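
To make the class of rules in Theorem 1 more concrete, the following is a minimal sketch of a pivot rule viewed as a finite-state machine that sees only the global indices of the currently improving edges/variables. The interface (`Rule`, `run`) and the toy rule `make_cyclic_rule` are hypothetical names introduced here for illustration; the paper's formal definition of the class may differ in its details. Bland's rule appears as the special case with a single memory state.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (an assumption, not the paper's formalism):
# a rule maps (memory state, global indices of improving edges) to
# (chosen index, next memory state). It sees nothing else about the input.
Rule = Callable[[int, Sequence[int]], Tuple[int, int]]


def bland_rule(state: int, improving: Sequence[int]) -> Tuple[int, int]:
    """Bland's rule: always pick the smallest global index; memory is unused."""
    return min(improving), state


def make_cyclic_rule(num_states: int) -> Rule:
    """Toy rule with `num_states` memory states (illustrative only).

    It picks the improving index at position `state` (modulo the number of
    improving indices) in sorted order, then advances its state. The choice
    depends only on the relative order of the improving indices.
    """
    def rule(state: int, improving: Sequence[int]) -> Tuple[int, int]:
        ordered = sorted(improving)
        choice = ordered[state % len(ordered)]
        return choice, (state + 1) % num_states
    return rule


def run(rule: Rule, improving_sets: Sequence[Sequence[int]], state: int = 0) -> List[int]:
    """Feed a fixed sequence of improving-index sets to a rule.

    In an actual algorithm the next improving set would depend on the pivot
    just taken; a fixed sequence keeps this sketch self-contained.
    """
    choices = []
    for improving in improving_sets:
        choice, state = rule(state, improving)
        choices.append(choice)
    return choices


if __name__ == "__main__":
    print(run(bland_rule, [[3, 1, 2], [5, 4]]))
    print(run(make_cyclic_rule(2), [[3, 1, 2], [5, 4], [7, 9]]))
```

The sketch only illustrates the information restriction: the rule never inspects priorities, costs, or the structure of the game or linear program. The lower bound concerns how many iterations such a rule can be forced to take, which this example does not attempt to reproduce.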

Figures (1)

  • Figure 1: Binary counter parity game $G_n$, adapted from bjorklund_combinatorial_2007. Circles represent player 0 nodes and the sink, and squares represent player 1 nodes. Priorities are shown below the names of the nodes. The initial strategy $\sigma_0$ is shown in bold, and the Bland numbers are given on the edges.

Theorems & Definitions (14)

  • Theorem 1: Informal
  • Theorem 2: Informal
  • Theorem 3: Friedmann2011ExponentialPrograms
  • Theorem 4: maat2025strategyimprovementsimplexalgorithm
  • Definition 5
  • Example 6
  • Example 7
  • Definition 8
  • Definition 9
  • Definition 11
  • ...and 4 more