Learning on the Edge: Online Learning with Stochastic Feedback Graphs

Emmanuel Esposito; Federico Fusco; Dirk van der Hoeven; Nicolò Cesa-Bianchi

TL;DR

This work studies online learning with stochastic feedback graphs, in which each edge is realized independently in every round with its own edge-specific probability, extending the classical feedback-graph model. It introduces EdgeCatcher, a two-phase algorithm that first estimates the edge probabilities with RoundRobin and then commits to a deterministic graph, so that established deterministic-graph methods apply (via BlockReduction). The resulting regret is bounded by $\min\{\sqrt{(\alpha^*/\varepsilon_s^*)T},\; (\delta^*/\varepsilon_w^*)^{1/3}T^{2/3}\}$ up to polylogarithmic factors. The paper proves lower bounds that match up to polylogarithmic factors and introduces refined graph-theoretic parameters (weighted versions of the independence and weak domination numbers, plus self-observability) that yield improved bounds in special cases when the learner observes the realization of the entire graph. The main results need no prior knowledge of the feedback graph: provided the thresholded support of the graph is observable, sublinear regret is achievable when the learner observes only the realized out-neighborhood of the chosen action.
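The first phase described above, estimating edge probabilities from the realized out-neighborhoods of the actions played, can be illustrated with a minimal sketch. This is not the paper's RoundRobin procedure: the fixed number of passes, the function names, and the example probabilities are all assumptions made for illustration.

```python
import random

def round_robin_estimates(probs, passes, rng):
    """Estimate edge probabilities by cycling through the actions.

    probs[i][j] is the hidden probability that edge i -> j is realized.
    Playing action i reveals only the realized out-neighborhood of i,
    so only the edges leaving i are sampled when i is played.
    """
    k = len(probs)
    counts = [[0] * k for _ in range(k)]
    for _ in range(passes):
        for i in range(k):  # one pass plays every action once
            for j in range(k):
                if rng.random() < probs[i][j]:  # edge i -> j realized
                    counts[i][j] += 1
    # Empirical frequency of each edge over the passes.
    return [[c / passes for c in row] for row in counts]

def threshold_support(estimates, eps):
    """Edges whose estimated probability is at least eps (assumed convention)."""
    k = len(estimates)
    return {(i, j) for i in range(k) for j in range(k) if estimates[i][j] >= eps}

# Hypothetical 3-action graph: self-loops always present, edge 0 -> 1 half the time.
probs = [[1.0, 0.5, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
est = round_robin_estimates(probs, 2000, random.Random(0))
support = threshold_support(est, eps=0.25)
```

Once a thresholded support has been fixed, it can be handed to any algorithm designed for deterministic feedback graphs, which is the idea behind the second phase.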

Abstract

The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round every edge in the graph is either realized or not with a distinct probability for each edge. We prove nearly optimal regret bounds of order $\min\bigl\{\min_{\varepsilon} \sqrt{(\alpha_\varepsilon/\varepsilon) T},\, \min_{\varepsilon} (\delta_\varepsilon/\varepsilon)^{1/3} T^{2/3}\bigr\}$ (ignoring logarithmic factors), where $\alpha_{\varepsilon}$ and $\delta_{\varepsilon}$ are graph-theoretic quantities measured on the support of the stochastic feedback graph $\mathcal{G}$ with edge probabilities thresholded at $\varepsilon$. Our result, which holds without any preliminary knowledge about $\mathcal{G}$, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.
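A single round of the feedback model in the abstract can be sketched as follows. The function name `play_round` and the matrix representation are assumptions for illustration; the sketch only shows that every edge is drawn independently and that losses are revealed in the realized out-neighborhood of the chosen action.

```python
import random

def play_round(probs, losses, action, rng):
    """Simulate one round with a stochastic feedback graph.

    probs[i][j]: probability that edge i -> j is realized this round.
    losses[j]:   loss of action j in this round.
    Returns the realized adjacency matrix and the losses the learner sees.
    """
    k = len(probs)
    # Each edge is realized independently with its own probability.
    realized = [[rng.random() < probs[i][j] for j in range(k)] for i in range(k)]
    # The learner observes losses only in the out-neighborhood of its action.
    observed = {j: losses[j] for j in range(k) if realized[action][j]}
    return realized, observed

rng = random.Random(1)
losses = [0.2, 0.7, 0.4]

# Two deterministic special cases: self-loops only (bandit feedback)
# and the complete graph (full information).
bandit = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
full_info = [[1.0] * 3 for _ in range(3)]

_, seen_bandit = play_round(bandit, losses, action=1, rng=rng)
_, seen_full = play_round(full_info, losses, action=1, rng=rng)
```

With edge probabilities strictly between 0 and 1, the same call returns a different observed set from round to round, which is what separates this setting from the deterministic feedback-graph model.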

Paper Structure (24 sections, 33 theorems, 182 equations, 4 algorithms)

Introduction
Additional related work.
Problem Setting
Block Decomposition Approach
Estimating the Edge Probabilities
Block Decomposition: Reduction to Deterministic Feedback Graph
Explore then Commit to a Graph
Lower Bounds
Refined Graph-Theoretic Parameters
On the Computation of the Optimal Probability Thresholds
Missing Results from Section \ref{sec:block}
Proof of Theorem \ref{thm:round-robin-estimates}
Proof of Theorem \ref{thm:blocks-reduction}
Proof of Corollary \ref{cor:blocks-reduction}
Proof of Theorem \ref{thm:block-result}
...and 9 more sections

Key Result

Theorem 1

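Consider the problem of online learning with an unknown stochastic feedback graph $\mathcal{G}$ on $T$ time steps. If $\mathop{\mathrm{supp}}\nolimits(\mathcal{G}_{\varepsilon})$ is not observable for $\varepsilon = \tilde{\Theta}(K^3/T)$, then any learning algorithm suffers regret linear in $T$. Otherwise, there is an algorithm whose regret is of order $\min\bigl\{\min_{\varepsilon} \sqrt{(\alpha_\varepsilon/\varepsilon) T},\, \min_{\varepsilon} (\delta_\varepsilon/\varepsilon)^{1/3} T^{2/3}\bigr\}$, ignoring logarithmic factors. This bound is tight (up to polylog factors).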
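The observability condition in the theorem can be checked mechanically on a small example. The sketch below assumes the standard notion from the feedback-graph literature (a graph is observable when every action has at least one in-neighbor, possibly itself) and assumes that the thresholded support keeps edges with probability at least $\varepsilon$. The summary restates neither definition, and the logarithmic factors hidden in $\tilde{\Theta}(K^3/T)$ are dropped.

```python
def thresholded_support(probs, eps):
    """Edges (i, j) whose probability is at least eps (assumed convention)."""
    k = len(probs)
    return {(i, j) for i in range(k) for j in range(k) if probs[i][j] >= eps}

def is_observable(k, edges):
    """True if every action has at least one in-neighbor (self-loops count)."""
    observed = {j for (_, j) in edges}
    return all(j in observed for j in range(k))

K, T = 3, 10**6
eps = K**3 / T  # threshold of order K^3 / T, log factors ignored

# Hypothetical graphs: action 2 is observed only through its own self-loop.
too_rare = [[1.0, 1.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1e-6]]  # self-loop probability below eps
often_enough = [[1.0, 1.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 0.01]]  # self-loop probability above eps

linear_regret_case = not is_observable(K, thresholded_support(too_rare, eps))
sublinear_case = is_observable(K, thresholded_support(often_enough, eps))
```

In the first graph the only edge revealing action 2 falls below the threshold, so the theorem's linear-regret case applies. In the second graph the thresholded support is observable.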

Theorems & Definitions (59)

Theorem 1: Informal
Definition 1: $\varepsilon$-good approximation
Theorem 1
Theorem 1
Corollary 0
Theorem 1
Theorem 2: Informal
Theorem 3: Informal
Example 1: Faulty bandits
Lemma 1: Informal
...and 49 more
