Cooperative Online Learning with Feedback Graphs

Nicolò Cesa-Bianchi, Tommaso R. Cesari, Riccardo Della Vecchia

TL;DR

The paper tackles cooperative online learning with feedback graphs and stochastic activations, introducing Exp3-$\alpha^2$ as a distributed algorithm under an oblivious network interface. It proves a network-regret bound that scales with the independence number of the strong product, $\alpha(N^{n} \boxtimes F)$, and the total activation probability $Q$: $R_T/Q = O\left(\sqrt{ \log(K) \left( n+1 + \alpha( N^{n} \boxtimes F ) / Q \right) T }\right)$, plus a matching instance-dependent lower bound $R_T = \Omega\left(\sqrt{ Q\, \alpha(N^{n} \boxtimes F)\, T }\right)$ for many graph pairs. The results unify and extend prior bounds for both expert and bandit feedback across adversarial and stochastic losses, and they identify the critical role of communication- and feedback-graph structure in determining learning efficiency. Experiments on synthetic data corroborate the theoretical predictions, demonstrating the practical impact of cooperative updates and graph topology on regret.
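
As a sanity check on the upper bound quoted above, the following worked special cases substitute simple choices of $N$, $F$, $n$ and $Q$. They rely only on elementary properties of the strong product. Reading $N^{n}$ as the $n$-th graph power (with $N^{0}$ having no edges between distinct agents) and writing $K$ for the number of actions, as in $\log(K)$, are assumptions about notation that this summary does not spell out.

```latex
% Single agent that is always active: Q = 1, n = 0, and N^0 \boxtimes F \cong F.
% Since \alpha(F) \ge 1, we have 1 + \alpha(F) \le 2\alpha(F).
\frac{R_T}{Q}
  = O\!\left(\sqrt{\log(K)\,\bigl(1 + \alpha(F)\bigr)\,T}\right)
  = O\!\left(\sqrt{\alpha(F)\,T\,\log K}\right)

% Expert feedback: F is a clique, so \alpha(N^{n} \boxtimes F) = \alpha(N^{n}).
\frac{R_T}{Q}
  = O\!\left(\sqrt{\log(K)\,\Bigl(n + 1 + \frac{\alpha(N^{n})}{Q}\Bigr)\,T}\right)

% Bandit feedback: F has no edges between distinct actions, so
% N^{n} \boxtimes F is K disjoint copies of N^{n} and
% \alpha(N^{n} \boxtimes F) = K\,\alpha(N^{n}).
\frac{R_T}{Q}
  = O\!\left(\sqrt{\log(K)\,\Bigl(n + 1 + \frac{K\,\alpha(N^{n})}{Q}\Bigr)\,T}\right)
```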

Abstract

We study the interplay between communication and feedback in a cooperative online learning setting, where a network of communicating agents learn a common sequential decision-making task through a feedback graph. We bound the network regret in terms of the independence number of the strong product between the communication network and the feedback graph. Our analysis recovers as special cases many previously known bounds for cooperative online learning with expert or bandit feedback. We also prove an instance-based lower bound, demonstrating that our positive results are not improvable except in pathological cases. Experiments on synthetic data confirm our theoretical findings.
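
To make the key quantity concrete, here is a minimal Python sketch that computes $\alpha(N^{n} \boxtimes F)$ by brute force on tiny graphs. It is an illustration, not the authors' code: the function names, the adjacency-dictionary representation, and the reading of $N^{n}$ as the $n$-th graph power (agents within distance $n$ are adjacent) are assumptions made for this example, and the exact search is exponential in the number of vertices.

```python
from collections import deque
from itertools import combinations, product


def graph_power(adj, n):
    """n-th power of a graph: u and v are adjacent iff 1 <= dist(u, v) <= n.

    adj maps each node to the set of its neighbours (no self-loops).
    For n = 0 the result has no edges.
    """
    powered = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == n:  # do not explore beyond radius n
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        powered[source] = set(dist) - {source}
    return powered


def strong_product(g, h):
    """Strong product: (u, i) ~ (v, j) iff the pairs differ,
    u equals or neighbours v in g, and i equals or neighbours j in h."""
    nodes = list(product(g, h))
    adj = {x: set() for x in nodes}
    for (u, i), (v, j) in combinations(nodes, 2):
        if (u == v or v in g[u]) and (i == j or j in h[i]):
            adj[(u, i)].add((v, j))
            adj[(v, j)].add((u, i))
    return adj


def independence_number(adj):
    """Exact independence number by branching on one vertex at a time.

    Exponential time; intended for toy graphs only.
    """
    def solve(vertices):
        if not vertices:
            return 0
        v = next(iter(vertices))
        rest = vertices - {v}
        skip_v = solve(rest)               # v is left out
        take_v = 1 + solve(rest - adj[v])  # v is taken, its neighbours dropped
        return max(skip_v, take_v)

    return solve(frozenset(adj))


def alpha_product(network, feedback, n):
    """alpha(N^n strong-product F) for small graphs (hypothetical helper)."""
    return independence_number(strong_product(graph_power(network, n), feedback))


# Toy instance: 3 agents on a path, 4 actions on a path.
N = {0: {1}, 1: {0, 2}, 2: {1}}
F = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
for radius in (0, 1, 2):
    print(radius, alpha_product(N, F, radius))
```

On this toy pair the value drops from 6 at $n=0$ to 4 at $n=1$ and 2 at $n=2$: letting information travel further along the communication network shrinks the independence number that enters the regret bound.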

Paper Structure

This paper contains 14 sections, 11 theorems, 60 equations, 5 figures, 2 tables.

Key Result

Lemma 1

Let $N=(A,E_{N})$ and $F=(K,E_{F})$ be any two graphs, $n\ge 0$, $\bigl( q(v) \bigr)_{v \in A}$ a set of numbers in $(0,1]$, $Q = \sum_{v\in A}q(v)$, and $\bigl( p(i,v) \bigr)_{i\in K,v\in A}$ a set of numbers in $(0,1]$ such that $\sum_{i\in K}p(i,v)=1$ for all $v\in A$. Then,

Figures (5)

  • Figure 1: The random instances of $N$ (leftmost graphs) and $F$ (rightmost graphs) used in our experiments. The sparse graphs are Erdős–Rényi of parameter $0.2$, the dense graphs are Erdős–Rényi of parameter $0.8$.
  • Figure 2: Average regret of Exp3-$\alpha^2$ (blue dots) against the baseline (red dots). The $X$-axis and the $Y$-axis correspond to the parameters $p_{F}$ and $p_{N}$ of the Erdős–Rényi graph, the $Z$-axis is the average regret $R_T/Q$. The three plots correspond to increasing values (from left to right) of activation probability: $q=0.05$ (leftmost plot), $q=0.5$ (central plot), $q=1$ (rightmost plot).
  • Figure 3: Average regret $R_T/Q$ against the number of rounds ($T=1000$). Activation probability $q=1$.
  • Figure 4: Average regret $R_T/Q$ against the number of rounds ($T=1000$). Activation probability $q=0.5$.
  • Figure 5: Average regret $R_T/Q$ against the number of rounds ($T=1000$). Activation probability $q=0.05$.

Theorems & Definitions (21)

  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • Definition 1
  • Lemma 2
  • Theorem 2
  • proof
  • proof
  • Lemma 3
  • ...and 11 more