Adversarial Online Learning with Temporal Feedback Graphs

Khashayar Gatmiry; Jon Schneider

TL;DR

This work extends online learning with expert advice to temporal feedback graphs, where the learner's decision at round $t$ can depend only on a subset $S_t$ of past losses. It introduces an algorithm that partitions losses across maximal orders (subgraphs) and is tuned via an upper-bound convex program; the dual formulation yields sparsity and an efficient implementation using a basis of at most $T$ orders. It also develops two lower-bound schemes, $\mathsf{LB}(\mathcal{S})$ and $\mathsf{ILB}(\mathcal{S})$, and shows that the gap between the upper and lower bounds is small in many settings. For transitive graphs, the authors provide an efficiently implementable algorithm with regret bound $O\left(\mathsf{UB}(\mathcal{S})\sqrt{\log K}\right)$ and a lower bound that matches it up to a constant factor, establishing the optimal regret rate for this class. The results unify and extend batched and delayed feedback models under a graph-theoretic view, offering efficient methods for structured partial information in online learning.

Abstract

We study a variant of prediction with expert advice where the learner's action at round $t$ is only allowed to depend on losses on a specific subset of the rounds (where the structure of which rounds' losses are visible at time $t$ is provided by a directed "feedback graph" known to the learner). We present a novel learning algorithm for this setting based on a strategy of partitioning the losses across sub-cliques of this graph. We complement this with a lower bound that is tight in many practical settings, and which we conjecture to be within a constant factor of optimal. For the important class of transitive feedback graphs, we prove that this algorithm is efficiently implementable and obtains the optimal regret bound (up to a universal constant).
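To make the setting concrete, the following minimal sketch encodes a temporal feedback graph as a map from each round $t$ to the set $S_t$ of rounds whose losses are visible at $t$, and builds the delayed and batched models as special cases. The function names, the 1-indexed rounds, the delay convention ($s \le t - d$), and the transitivity test (if $s \in S_t$ then $S_s \subseteq S_t$) are assumptions made for illustration and are not taken from the paper.

```python
def delayed_feedback(T, d):
    """Assumed convention: at round t the learner sees losses of rounds s <= t - d."""
    return {t: set(range(1, max(t - d, 0) + 1)) for t in range(1, T + 1)}


def batched_feedback(T, B):
    """Consecutive batches of size B; at round t only earlier batches are visible."""
    return {t: set(range(1, ((t - 1) // B) * B + 1)) for t in range(1, T + 1)}


def is_transitive(S):
    """Assumed definition: whenever s is visible at t, everything visible at s is too."""
    return all(S[s] <= S[t] for t in S for s in S[t])


delayed = delayed_feedback(5, 2)
batched = batched_feedback(6, 3)

# A small graph that violates the assumed transitivity condition:
# round 3 sees round 2, round 2 sees round 1, but round 3 does not see round 1.
chain = {1: set(), 2: {1}, 3: {2}}
```

Under these conventions both the delayed and the batched graphs consist of nested prefixes of past rounds, so they pass the transitivity check, while the `chain` graph does not.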

Paper Structure (29 sections, 20 theorems, 50 equations, 1 algorithm)

Introduction
Our results
Algorithms (Section \ref{sec:algs}).
Lower bounds (Section \ref{sec:lbs}).
Transitive feedback graphs (Section \ref{sec:transitive}).
Related work
Model and Preliminaries
Online learning preliminaries
Temporal feedback graphs
Algorithms
A sub-optimal algorithm
A better algorithm
The dual convex program and an efficient learning algorithm
Lower bounds for online learning with temporal feedback graphs
A (naive yet efficient) lower bound program
...and 14 more sections

Key Result

Lemma 1

Let ${\boldsymbol \ell} = (\ell_1, \ell_2, \dots, \ell_T)$ be a sequence of losses such that each $\ell_t \in [0, \lambda_t]^K$. If we let $\mathcal{A}$ be the Hedge algorithm initialized with learning rate $\eta = O\left(\sqrt{(\log K)/\sum_{t=1}^T \lambda_t^2}\right)$, then
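As an illustration of the setup in this lemma, here is a minimal sketch of Hedge (exponential weights) run on losses with per-round ranges $\lambda_t$. It assumes the specific tuning $\eta = \sqrt{(\log K)/\sum_t \lambda_t^2}$, one choice consistent with the stated order of $\eta$, and the function name `hedge` and its return values are hypothetical. For this tuning the standard Hedge analysis gives expected regret of at most $\tfrac{9}{8}\sqrt{\log K \sum_t \lambda_t^2}$. That is the textbook bound and not necessarily the exact statement in the paper.

```python
import math
import random


def hedge(losses, lambdas):
    """Run Hedge on a T x K loss table where losses[t][i] lies in [0, lambdas[t]].

    Returns (expected learner loss, loss of the best fixed expert).
    """
    K = len(losses[0])
    # Assumed tuning: eta = sqrt(log K / sum_t lambda_t^2).
    eta = math.sqrt(math.log(K) / sum(lam * lam for lam in lambdas))
    cumulative = [0.0] * K
    learner_loss = 0.0
    for round_losses in losses:
        # Subtract the minimum before exponentiating for numerical stability.
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        learner_loss += sum(p * l for p, l in zip(probs, round_losses))
        cumulative = [c + l for c, l in zip(cumulative, round_losses)]
    return learner_loss, min(cumulative)


rng = random.Random(0)
K, T = 4, 50
lambdas = [rng.choice([0.5, 1.0, 2.0]) for _ in range(T)]
losses = [[rng.uniform(0.0, lam) for _ in range(K)] for lam in lambdas]
learner_loss, best_loss = hedge(losses, lambdas)
regret = learner_loss - best_loss
bound = 1.125 * math.sqrt(math.log(K) * sum(lam * lam for lam in lambdas))
```

The learner's loss here is the expectation over its randomized choice of expert, so `regret` can be compared directly against `bound` for any loss sequence.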

Theorems & Definitions (39)

Lemma 1
proof
Lemma 2
proof
Theorem 1
proof
Theorem 2
Lemma 3
Lemma 4
proof
...and 29 more
