Lower bounds for ranking-based pivot rules
Yann Disser, Georg Loho, Matthew Maat, Nils Mosis
TL;DR
This paper introduces a unified framework that classifies pivot rules according to the information about the input that they use, with a focus on ranking-based rules. It proves a superpolynomial lower bound for strategy improvement on sink parity games, covering memory-based generalizations of Bland's rule, and a subexponential lower bound for policy iteration on Markov decision processes, covering memoryless rules that compare improving actions by rank, reduced cost, and improvement in objective value. Both results extend to the simplex method for linear programming. The core approach combines reductions between strategy iteration, parity games, and LP pivoting, using adversarial gadgets to force long sequences of improvements under memory- and information-constrained rules. These findings suggest intrinsic inefficiencies for a broad class of combinatorial pivot rules and highlight the need to explore alternative information regimes or algorithmic paradigms beyond ranking-based approaches.
Abstract
The existence of a polynomial pivot rule for the simplex method for linear programming, for policy iteration for Markov decision processes, and for strategy improvement for parity games are each prominent open problems in their respective fields. While numerous natural candidates for efficient rules have been eliminated, all existing lower bound constructions are tailored to individual pivot rules or small sets of them. We introduce a unified framework for formalizing classes of rules according to the information about the input that they rely on. Within this framework, we show lower bounds for \emph{ranking-based} classes of rules that base their decisions on orderings of the improving pivot steps induced by the underlying data. Our first result is a superpolynomial lower bound for strategy improvement, obtained via a family of sink parity games, which applies to memory-based generalizations of Bland's rule that only access the input by comparing the ranks of improving edges in some global order. Our second result is a subexponential lower bound for policy iteration, obtained via a family of Markov decision processes, which applies to memoryless rules that only access the input by comparing improving actions according to their ranks in a global order, their reduced costs, and the associated improvements in objective value. Both results carry over to the simplex method for linear programming.
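As an illustration of what "only accessing the input by comparing ranks" can mean, the following is a minimal sketch of a Bland-type selection step. It is not the construction from the paper: the names `make_rank_oracle` and `bland_like_rule` and the edge labels are hypothetical, and the memory component of the memory-based generalizations is omitted. The rule never inspects the instance itself; it only asks whether one improving pivot precedes another in a fixed global order.

```python
from typing import Callable, Hashable, Iterable, Optional, Sequence

Pivot = Hashable  # hypothetical stand-in for an improving edge or action


def make_rank_oracle(global_order: Sequence[Pivot]) -> Callable[[Pivot, Pivot], bool]:
    """Build a comparison oracle from a fixed global order of pivots."""
    rank = {p: i for i, p in enumerate(global_order)}

    def precedes(a: Pivot, b: Pivot) -> bool:
        # True if a comes strictly before b in the global order.
        return rank[a] < rank[b]

    return precedes


def bland_like_rule(
    improving: Iterable[Pivot],
    precedes: Callable[[Pivot, Pivot], bool],
) -> Optional[Pivot]:
    """Pick the improving pivot that is first in the global order.

    The rule uses only pairwise rank comparisons, never numerical data.
    Returns None if there is no improving pivot.
    """
    best: Optional[Pivot] = None
    for p in improving:
        if best is None or precedes(p, best):
            best = p
    return best


# Hypothetical example: four edges with an assumed global order.
order = ["e3", "e1", "e4", "e2"]
precedes = make_rank_oracle(order)
choice = bland_like_rule(["e2", "e4", "e1"], precedes)
```

In this sketch, `choice` is the edge among `e2`, `e4`, `e1` that appears earliest in `order`. A rule in the second class described above would additionally be allowed to compare reduced costs and objective improvements, again only through comparisons.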
