Adversarial Online Learning with Temporal Feedback Graphs
Khashayar Gatmiry, Jon Schneider
TL;DR
This work extends online learning with expert advice to temporal feedback graphs, in which the learner's decision at round $t$ may depend only on the losses from a subset $S_t$ of earlier rounds. It introduces a novel algorithm that partitions losses across maximal orders (subgraphs) of the feedback graph and leverages an upper-bound convex program. The program's dual formulation yields sparse solutions, which makes the algorithm efficiently implementable using a basis of at most $T$ orders. The paper also develops two lower-bound schemes, $\mathsf{LB}(\mathcal{S})$ and $\mathsf{ILB}(\mathcal{S})$, and shows that the upper and lower bounds nearly match in many settings. For transitive graphs, the authors give an efficiently implementable algorithm with regret $O\left(\mathsf{UB}(\mathcal{S})\sqrt{\log K}\right)$ and a lower bound matching it up to a constant factor, thereby establishing the optimal regret rate for this important class. The results unify and extend batched and delayed feedback models under a graph-theoretic view, offering efficient methods for online learning with structured partial information.
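To make the objects above concrete, here is a small Python sketch built on our own assumptions. A feedback structure is represented as a dict mapping each round $t$ to its visible set $S_t$. Transitivity is read as "if $s \in S_t$ and $r \in S_s$, then $r \in S_t$". A clique (chain) is taken to be a set of rounds in which every later round sees every earlier one. The helper names (`is_temporal`, `is_transitive`, `is_chain`, `delayed`, `batched`) are hypothetical and do not come from the paper. Delayed and batched feedback serve as examples of transitive graphs.

```python
from typing import Dict, Iterable, Set

# Hypothetical representation: visible[t] is the set S_t of earlier rounds
# whose losses the learner may use at round t (rounds are numbered 1..T).
Visibility = Dict[int, Set[int]]


def is_temporal(visible: Visibility) -> bool:
    """Every visible round must lie strictly in the past: S_t ⊆ {1, ..., t-1}."""
    return all(1 <= s < t for t, S in visible.items() for s in S)


def is_transitive(visible: Visibility) -> bool:
    """If s ∈ S_t and r ∈ S_s, then r must also be in S_t."""
    return all(visible.get(s, set()) <= S for S in visible.values() for s in S)


def is_chain(visible: Visibility, rounds: Iterable[int]) -> bool:
    """True if each later round in `rounds` sees every earlier round in it."""
    ordered = sorted(rounds)
    return all(ordered[i] in visible[ordered[j]]
               for j in range(len(ordered)) for i in range(j))


def delayed(T: int, d: int) -> Visibility:
    """Delayed feedback with delay d: S_t = {1, ..., t - d - 1}."""
    return {t: set(range(1, t - d)) for t in range(1, T + 1)}


def batched(T: int, B: int) -> Visibility:
    """Batched feedback with batch size B: only losses from completed earlier batches."""
    return {t: set(range(1, B * ((t - 1) // B) + 1)) for t in range(1, T + 1)}


if __name__ == "__main__":
    g = delayed(10, 2)
    print(is_temporal(g), is_transitive(g))
    print(is_chain(g, [1, 4, 7, 10]), is_chain(g, [1, 2]))
    print(is_transitive({1: set(), 2: {1}, 3: {2}}))
```

In the delayed example, rounds $1, 4, 7, 10$ form a chain because each is at least $d + 1 = 3$ rounds after the previous one. Rounds $1$ and $2$ do not, since $S_2$ is empty. The last graph is not transitive: round 3 sees round 2, which sees round 1, but round 3 does not see round 1.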
Abstract
We study a variant of prediction with expert advice in which the learner's action at round $t$ may depend only on the losses from a specific subset of the rounds (which rounds' losses are visible at time $t$ is specified by a directed "feedback graph" known to the learner). We present a novel learning algorithm for this setting based on partitioning the losses across sub-cliques of this graph. We complement this with a lower bound that is tight in many practical settings and that we conjecture to be within a constant factor of optimal. For the important class of transitive feedback graphs, we prove that this algorithm is efficiently implementable and obtains the optimal regret bound (up to a universal constant).
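The idea of partitioning losses across sub-cliques can be illustrated with a deliberately simplified Python sketch. This is not the authors' algorithm, which per the TL;DR also relies on a convex program over orders. The sketch assumes a fixed partition of the rounds into chains, meaning sets in which every later round sees every earlier one. It then runs an independent copy of the standard Hedge (exponential weights) algorithm on each chain. Because of the chain property, each copy only ever uses losses the learner is allowed to see, and the code checks this at every round. The function names, the dict-based feedback representation, and the per-chain learning rate $\eta = \sqrt{8 \ln K / n}$ are our own illustrative choices.

```python
import math
from typing import Dict, List, Sequence, Set


def hedge_probs(cum_loss: Sequence[float], eta: float) -> List[float]:
    """Exponential-weights (Hedge) distribution over K experts."""
    m = min(cum_loss)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (L - m)) for L in cum_loss]
    z = sum(w)
    return [x / z for x in w]


def partitioned_hedge(losses: Sequence[Sequence[float]],
                      visible: Dict[int, Set[int]],
                      chains: Sequence[Sequence[int]]) -> float:
    """Run one independent Hedge copy per chain; return the learner's total expected loss.

    losses[t - 1][i] is the loss in [0, 1] of expert i at round t (t = 1..T).
    visible[t] is S_t; `chains` must partition {1, ..., T}.
    """
    T, K = len(losses), len(losses[0])
    assert sorted(t for ch in chains for t in ch) == list(range(1, T + 1)), \
        "chains must partition rounds 1..T"
    chain_of = {t: c for c, ch in enumerate(chains) for t in ch}
    # Standard tuned rate for a Hedge run of length n: eta = sqrt(8 ln K / n).
    etas = [math.sqrt(8 * math.log(K) / max(len(ch), 1)) for ch in chains]
    cum = [[0.0] * K for _ in chains]               # per-chain cumulative losses
    used: List[Set[int]] = [set() for _ in chains]  # rounds folded into cum[c]
    total = 0.0
    for t in range(1, T + 1):
        c = chain_of[t]
        # Feedback constraint: round t may only use losses from rounds in S_t.
        assert used[c] <= visible[t], f"round {t} would use unseen losses"
        p = hedge_probs(cum[c], etas[c])
        total += sum(pi * li for pi, li in zip(p, losses[t - 1]))
        cum[c] = [a + b for a, b in zip(cum[c], losses[t - 1])]
        used[c].add(t)
    return total


def regret(losses: Sequence[Sequence[float]], total: float) -> float:
    """Learner's expected loss minus the loss of the best fixed expert."""
    K = len(losses[0])
    return total - min(sum(row[i] for row in losses) for i in range(K))


if __name__ == "__main__":
    d, T = 2, 9  # delayed feedback: S_t = {1, ..., t - d - 1}
    visible = {t: set(range(1, t - d)) for t in range(1, T + 1)}
    # Rounds in the same residue class mod (d + 1) form a chain.
    chains = [list(range(r, T + 1, d + 1)) for r in range(1, d + 2)]
    losses = [[0.0, 1.0] for _ in range(T)]
    print(regret(losses, partitioned_hedge(losses, visible, chains)))
```

The per-chain decomposition is enough to obtain a regret bound. Each expert's total loss is the sum of its per-chain losses, so the best single expert's total loss is at least the sum of the per-chain best losses. Hence the overall regret is at most the sum of the per-chain Hedge regrets. With the rate above, each per-chain regret is at most $\sqrt{(n_c/2)\ln K}$ for a chain of length $n_c$. By Cauchy-Schwarz, the sum is at most $\sqrt{(m/2)\,T\ln K}$ for $m$ chains. For delay $d$ with the $m = d + 1$ residue-class chains used in the example, this gives $O(\sqrt{(d+1)T\log K})$.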
