Markov Chains with Rewinding
Amir Azarmehr, Soheil Behnezhad, Alma Ghafari, Madhu Sudan
TL;DR
This work introduces Markov chains with rewinding, a model in which an algorithm interacts with a partially observable Markov chain and can rewind it to earlier times. Focusing on the task of identifying the initial state, the authors compare adaptive and non-adaptive rewinding strategies: without efficiency constraints the two are equally powerful, but a polynomial gap emerges when efficiency is measured by query complexity. They develop a non-adaptive algorithm, based on a partition-graph framework and multiplicative path costs, whose query complexity is within a polynomial of the optimum, and they give a near-optimal adaptive strategy that achieves significantly better query complexity in certain constructions. A key contribution is the demonstration that canonical partially observable Markov chains suffice to capture the difficulty of the general problem, via reductions that preserve adaptive and non-adaptive complexity up to polylogarithmic factors. Overall, the results provide a structured toolkit for analyzing rewinding strategies, connect to sublinear-time lower bounds for graph problems, and quantify when adaptivity helps in rewinding-enabled processes.
Abstract
Motivated by techniques developed in recent progress on lower bounds for sublinear time algorithms (Behnezhad, Roghani and Rubinstein, STOC 2023, FOCS 2023, and STOC 2024), we introduce and study a new class of randomized algorithmic processes that we call Markov Chains with Rewinding. In this setting, an algorithm interacts with a (partially observable) Markovian random evolution by strategically rewinding the Markov chain to previous states. Depending on the application, this may lead the evolution to desired states faster, or allow the agent to efficiently learn or test properties of the underlying Markov chain that may be infeasible or inefficient to learn or test with passive observation. We study the task of identifying the initial state in a given partially observable Markov chain. Analysis of this question in specific Markov chains is the central ingredient in the above-cited works, and we aim to systematize the analysis in our work. Our first result is that any pair of states distinguishable with any rewinding strategy can also be distinguished with a non-adaptive rewinding strategy (one whose rewinding choices are determined before observing any outcomes of the chain). Therefore, while rewinding strategies can be shown to be strictly more powerful than passive strategies (those that do not rewind back to previous states), adaptivity does not give additional power to a rewinding strategy in the absence of efficiency considerations. The difference becomes apparent, however, when we introduce a natural efficiency measure, namely the query complexity (i.e., the number of observations a strategy needs to identify distinguishable states). Our second main contribution is to quantify this efficiency gap. We present a non-adaptive rewinding strategy whose query complexity is within a polynomial of that of the optimal (adaptive) strategy, and show that such a polynomial loss is necessary in general.
