Stochastic Shortest Path with Sparse Adversarial Costs
Emmeran Johnson, Alberto Rumi, Ciara Pike-Burke, Patrick Rebeschini
TL;DR
The paper tackles adversarial SSP with sparse costs under full-information feedback and known transitions, showing that prior negative-entropy OMD fails to exploit sparsity while a new family of $\ell_r$-norm regularizers with $r\in(1,2)$ achieves sparsity-adaptive regret. The main results establish an upper bound of $\mathcal{O}(\sqrt{DKT_\star \log MT_\star})$ and a matching lower bound of $\Omega(\sqrt{DKT_\star \log M})$, revealing that the sparsity level $M$ acts as the effective dimension. A parameter-free, sparsity-agnostic variant is provided, along with a lower bound showing that sparsity does not yield similar improvements in the unknown-transition setting, where regret scales polynomially in $SA$. These findings highlight that sparsity can shrink the dependence on the problem dimension from $SA$ to $M$ in known-transition SSP, while clarifying the fundamental limits in broader settings.
Abstract
We study the adversarial Stochastic Shortest Path (SSP) problem with sparse costs under full-information feedback. In the known transition setting, existing bounds based on Online Mirror Descent (OMD) with negative-entropy regularization scale with $\sqrt{\log SA}$, where $SA$ is the size of the state-action space. While we show that this is optimal in the worst case, this bound fails to capture the benefits of sparsity when only a small number $M \ll SA$ of state-action pairs incur cost. In fact, we also show that negative-entropy regularization is inherently non-adaptive to sparsity: it provably incurs regret scaling with $\sqrt{\log S}$ on sparse problems. Instead, we propose a family of $\ell_r$-norm regularizers ($r \in (1,2)$) that adapts to the sparsity and achieves regret scaling with $\sqrt{\log M}$ instead of $\sqrt{\log SA}$. We show this is optimal via a matching lower bound, highlighting that $M$, rather than $SA$, captures the effective dimension of the problem. Finally, in the unknown transition setting the benefits of sparsity are limited: we prove that even on sparse problems, the minimax regret for any learner scales polynomially with $SA$.
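To give a rough sense of the gap between the $\sqrt{\log SA}$ and $\sqrt{\log M}$ rates in the known transition setting, the following minimal sketch compares only the dimension-dependent factors. It ignores constants and all other terms in the bounds, and the function names `dimension_factor` and `sparsity_gain` are hypothetical, introduced here for illustration only.

```python
import math


def dimension_factor(effective_dim: int) -> float:
    """Return sqrt(log(effective_dim)), the dimension-dependent factor of the rate.

    Constants and the remaining terms of the regret bound are deliberately
    omitted, so this is not the regret itself.
    """
    if effective_dim < 2:
        raise ValueError("effective_dim must be at least 2 so that the log is positive")
    return math.sqrt(math.log(effective_dim))


def sparsity_gain(num_state_actions: int, num_costly_pairs: int) -> float:
    """Ratio of the sqrt(log SA) factor to the sqrt(log M) factor.

    num_state_actions plays the role of SA, num_costly_pairs the role of M.
    """
    if num_costly_pairs > num_state_actions:
        raise ValueError("M cannot exceed SA")
    return dimension_factor(num_state_actions) / dimension_factor(num_costly_pairs)


if __name__ == "__main__":
    # Example values (assumed for illustration): SA = 10^6 and M = 10.
    print(sparsity_gain(10**6, 10))
```

With these example values the ratio equals $\sqrt{\log 10^6 / \log 10} = \sqrt{6} \approx 2.45$, so the improvement from sparsity in this factor is logarithmic in nature: it grows only as $SA$ becomes very large relative to $M$.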
