Stochastic Shortest Path with Sparse Adversarial Costs

Emmeran Johnson, Alberto Rumi, Ciara Pike-Burke, Patrick Rebeschini

TL;DR

The paper tackles adversarial SSP with sparse costs under full-information feedback and known transitions, showing that prior negative-entropy OMD fails to exploit sparsity while a new family of $\ell_r$-norm regularizers with $r\in(1,2)$ achieves sparsity-adaptive regret. The main results establish an upper bound of $\mathcal{O}(\sqrt{DKT_\star \log MT_\star})$ and a matching lower bound of $\Omega(\sqrt{DKT_\star \log M})$, revealing that the sparsity level $M$ acts as the effective dimension. A parameter-free, sparsity-agnostic variant is provided, along with a lower bound showing that sparsity does not yield similar improvements in the unknown-transition setting, where regret scales polynomially in $SA$. These findings highlight that sparsity can shrink the dependence on the problem dimension from $SA$ to $M$ in known-transition SSP, while clarifying the fundamental limits in broader settings.

Abstract

We study the adversarial Stochastic Shortest Path (SSP) problem with sparse costs under full-information feedback. In the known transition setting, existing bounds based on Online Mirror Descent (OMD) with negative-entropy regularization scale with $\sqrt{\log S A}$, where $SA$ is the size of the state-action space. While we show that this is optimal in the worst case, this bound fails to capture the benefits of sparsity when only a small number $M \ll SA$ of state-action pairs incur cost. In fact, we also show that negative-entropy regularization is inherently non-adaptive to sparsity: it provably incurs regret scaling with $\sqrt{\log S}$ on sparse problems. Instead, we propose a family of $\ell_r$-norm regularizers ($r \in (1,2)$) that adapts to the sparsity and achieves regret scaling with $\sqrt{\log M}$ instead of $\sqrt{\log SA}$. We show this is optimal via a matching lower bound, highlighting that $M$, rather than $SA$, captures the effective dimension of the problem. Finally, in the unknown transition setting the benefits of sparsity are limited: we prove that even on sparse problems, the minimax regret for any learner scales polynomially with $SA$.
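As a rough illustration of the negative-entropy baseline mentioned in the abstract, the sketch below runs OMD with negative-entropy regularization on the probability simplex, where the update reduces to the familiar multiplicative-weights rule. This is a deliberate simplification and an assumption of this example: the paper works over the occupancy measures of an SSP instance, not a plain simplex, and the function name `omd_negative_entropy`, the step size, and the toy cost sequence are all hypothetical.

```python
import math


def omd_negative_entropy(costs, eta):
    """OMD with negative-entropy regularization on the simplex.

    costs: list of per-round cost vectors (full-information feedback).
    eta:   step size (> 0).
    Returns the list of distributions played, one per round.
    """
    n = len(costs[0])
    x = [1.0 / n] * n  # start from the uniform distribution
    played = []
    for c in costs:
        played.append(x)
        # Mirror step for negative entropy: exponential reweighting ...
        w = [xi * math.exp(-eta * ci) for xi, ci in zip(x, c)]
        # ... followed by the Bregman (KL) projection, i.e. normalization.
        z = sum(w)
        x = [wi / z for wi in w]
    return played


# Toy sparse instance (hypothetical): only 2 of 8 coordinates ever incur cost.
sparse_costs = [[1.0, 0.5] + [0.0] * 6 for _ in range(20)]
trajectory = omd_negative_entropy(sparse_costs, eta=0.5)
```

The update touches every coordinate, including those that never incur cost, which gives some intuition for why the ambient dimension rather than the sparsity level enters the analysis of this regularizer.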

Paper Structure

This paper contains 26 sections, 16 theorems, 84 equations, 5 figures, 2 algorithms.

Key Result

Theorem 3.1

For any $S \geq 6$, there exists an SSP instance with a fixed horizon of $3$, sparsity level $M = 3$, an action space of size $A = 2$ and state space of size $S$ such that the regret of OMD (eq:mirror_descent) with negative-entropy regularization and any step-size $\eta>0$ after $K$ episodes is $\ma

Figures (5)

  • Figure 1: MDP for the reduction to a skewed experts problem with 2 actions: $\mathcal{S} = \bigl\{s_0, s_g, s_1,...,s_N\bigr\}$ ($N=S-2$), $\mathcal{A} = \bigl\{a_1, a_2\bigr\}$. The transitions are given by $p(s_g|s_0, a_1) = 1, p(g|s_g, a) = 1$ for all $a \in \mathcal{A}$, for $i \geq 1$: $p(s_i|s_0,a_2) = 1/N, p(g|s_i, a) = 1$ for all $a \in \mathcal{A}$.
  • Figure 2: Bregman Divergence between a deterministic distribution $x = [0,1]$ and the uniform distribution $y = [1/2,1/2]$ for our regularizer $\psi_p$, squared Euclidean norm $\psi_E$ and negative entropy $H$ for increasing values of $p$.
  • Figure 3: Diagram illustrating the MDP construction for the proof of \ref{thm:failure_neg_ent}. When no action is specified for an edge, both actions give the same transition and cost. A number in black on an edge is a transition probability; an edge without one is a deterministic transition. The costs are given in red. The formal description of the MDP is given above.
  • Figure 4: Diagram illustrating the MDP construction for the proof of \ref{thm:sparse_full_info_LB}. Details are given below.
  • Figure 5: base case
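
The comparison described in the Figure 2 caption can be sketched numerically. The exact definition of $\psi_p$ is not given in this excerpt, so the code below assumes the common form $\psi_p(x) = \tfrac{1}{2}\|x\|_p^2$; that form, and the helper names, are assumptions of this example and may differ from the paper's regularizer by scaling or otherwise.

```python
import math


def bregman_neg_entropy(x, y):
    """Bregman divergence of negative entropy (generalized KL).

    Uses the convention 0 * log 0 = 0; y must be strictly positive.
    """
    total = 0.0
    for xi, yi in zip(x, y):
        if xi > 0:
            total += xi * math.log(xi / yi)
        total += yi - xi
    return total


def lp_norm(v, p):
    return sum(abs(vi) ** p for vi in v) ** (1.0 / p)


def bregman_half_sq_lp(x, y, p):
    """Bregman divergence of psi_p(v) = 0.5 * ||v||_p^2 (assumed form).

    Requires p > 1 and y != 0 so that the gradient at y is well defined.
    """
    def psi(v):
        return 0.5 * lp_norm(v, p) ** 2

    ny = lp_norm(y, p)
    # Gradient of 0.5 * ||y||_p^2: sign(y_i) |y_i|^(p-1) * ||y||_p^(2-p).
    grad = [math.copysign(abs(yi) ** (p - 1), yi) * ny ** (2 - p) for yi in y]
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad, x, y))
    return psi(x) - psi(y) - inner


x = [0.0, 1.0]  # deterministic distribution from the caption
y = [0.5, 0.5]  # uniform distribution from the caption
d_entropy = bregman_neg_entropy(x, y)
d_euclid = bregman_half_sq_lp(x, y, 2.0)  # p = 2 recovers 0.5 * ||x - y||_2^2
d_p = {p: bregman_half_sq_lp(x, y, p) for p in (1.2, 1.5, 1.8)}
```

Under the assumed form, setting `p = 2.0` recovers the squared Euclidean case, which offers a quick sanity check before comparing other values of `p`.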

Theorems & Definitions (28)

  • Theorem 3.1
  • Theorem 4.1
  • proof
  • Remark 4.2
  • Remark 4.3
  • Theorem 4.4
  • Remark 4.5
  • Theorem 4.6
  • Theorem 5.1
  • Theorem A.1
  • ...and 18 more