Pure Exploration with Feedback Graphs

Alessio Russo, Yichen Song, Aldo Pacchiano

TL;DR

The paper analyzes pure exploration with stochastic feedback graphs under a fixed-confidence target $\delta$. It derives instance-specific lower bounds via a change-of-measure argument, proves unidentifiability for Bernoulli rewards in the uninformed setting, and characterizes how sample complexity scales with graph structure through quantities like $\alpha(G)$, $\delta(G)$, and $\sigma(G)$. It then presents TaS-FG (Track and Stop for Feedback Graphs), an asymptotically optimal algorithm that combines estimation, a certainty-equivalence-based sampling rule, and a GLRT-style stopping rule to identify the best arm, with proven $\delta$-PC guarantees. Numerical experiments across diverse graph topologies show strong performance relative to regret-minimization baselines and highlight the impact of feedback topology on exploration efficiency. Overall, the work links graph properties to the achievable sample complexity of best-arm identification under partial, graph-structured feedback, and pairs that analysis with a practical algorithm.

Abstract

We study the sample complexity of pure exploration in an online learning problem with a feedback graph. This graph dictates the feedback available to the learner, covering scenarios between full-information, pure bandit feedback, and settings with no feedback on the chosen action. While variants of this problem have been investigated for regret minimization, no prior work has addressed the pure exploration setting, which is the focus of our study. We derive an instance-specific lower bound on the sample complexity of learning the best action with fixed confidence, even when the feedback graph is unknown and stochastic, and present unidentifiability results for Bernoulli rewards. Additionally, our findings reveal how the sample complexity scales with key graph-dependent quantities. Lastly, we introduce TaS-FG (Track and Stop for Feedback Graphs), an asymptotically optimal algorithm, and demonstrate its efficiency across different graph configurations.
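To make the feedback model concrete, here is a minimal sketch, assuming the graph is stored as a matrix `G` in which `G[v][u]` is the probability that playing action `v` reveals the reward of action `u`. The function names, the unit-variance Gaussian rewards, and the example graphs are illustrative assumptions and are not taken from the paper.

```python
import random

def bandit_graph(k):
    """Bandit feedback: each action reveals only its own reward (self-loops only)."""
    return [[1.0 if u == v else 0.0 for u in range(k)] for v in range(k)]

def loopless_clique(k):
    """Each action reveals every reward except its own (no self-loops)."""
    return [[0.0 if u == v else 1.0 for u in range(k)] for v in range(k)]

def full_information(k):
    """Every action reveals the rewards of all actions."""
    return [[1.0] * k for _ in range(k)]

def observe(G, means, v, rng):
    """Play action v and return {u: reward} for every edge (v, u) that activates.

    Assumption: edges activate independently with probability G[v][u],
    and rewards are Gaussian with unit variance.
    """
    feedback = {}
    for u, p in enumerate(G[v]):
        if rng.random() < p:
            feedback[u] = rng.gauss(means[u], 1.0)
    return feedback

def expected_observations(G, w):
    """Expected observations of each action per step under allocation w, i.e. G^T w."""
    k = len(G)
    return [sum(w[v] * G[v][u] for v in range(k)) for u in range(k)]
```

Under bandit feedback, `expected_observations` returns the allocation `w` itself. Under full information every action is observed once per step regardless of `w`. In the loopless clique the learner never sees the reward of the action it plays.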

Paper Structure

This paper contains 67 sections, 30 theorems, 105 equations, 14 figures, 1 algorithm.

Key Result

Theorem 1

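For any $\delta$-PC algorithm and any model $\nu$ with reward distributions $\{\nu_u\}_{u\in V}$ with continuous support, satisfying the nontrivial-problem assumption, we have that $\mathbb{E}_\nu[\tau] \ge T^\star(\nu)\,{\rm kl}(\delta,1-\delta)$, where $T^\star(\nu)=\inf_{w\in \Delta(V)} T(w;\nu)$.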

Figures (14)

  • Figure 1: Examples of feedback graphs. From left to right: (1) bandit feedback; (2) apple tasting; (3) revealing action; (4) ring; (5) loopless clique.
  • Figure 2: Loopy star graph. Each edge has an associated activation probability (note that $(x)^+ = \max(x,0)$).
  • Figure 3: Loopy star example with $r=1/4$. The solid lines depict $T^\star(\nu)$ for $q=1$ and $q=1/4$ for different values of $p$. Similarly, on the right axis, the dashed lines show $\|G^\top \omega^\star\|_2$, which indicates the amount of information gathered per time-step.
  • Figure 4: Example of a symmetric feedback graph where the solution set $C^\star(\nu)=\mathop{\mathrm{arg\,inf}}\limits_{w\in \Delta(V)} T(w;\nu)$ is not a singleton for $\mu_a=\mu_c$ and $a^\star(\mu)= b$.
  • Figure 5: Box plots of the normalized sample complexity $\frac{\tau}{T^\star(\nu){\rm kl}(\delta,1-\delta)}$ for $\delta=e^{-7}$ over $100$ seeds. Boxes indicate the interquartile range, while the median and mean values are, respectively, the solid line and the $+$ sign in black.
  • ...and 9 more figures
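The normalization used in Figure 5 can be sketched as follows, assuming ${\rm kl}(\delta,1-\delta)$ denotes the KL divergence between Bernoulli distributions with parameters $\delta$ and $1-\delta$. The function names and the numeric values of `tau` and `t_star` are hypothetical placeholders, not results from the paper.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def normalized_sample_complexity(tau, t_star, delta):
    """Stopping time tau divided by T*(nu) * kl(delta, 1 - delta)."""
    return tau / (t_star * kl_bernoulli(delta, 1 - delta))

delta = math.exp(-7)   # confidence level quoted in the Figure 5 caption
t_star = 50.0          # hypothetical value of T*(nu)
tau = 420.0            # hypothetical observed stopping time
ratio = normalized_sample_complexity(tau, t_star, delta)
```

For small $\delta$, ${\rm kl}(\delta,1-\delta) = (1-2\delta)\log\frac{1-\delta}{\delta}$ is close to $\log(1/\delta)$, so with $\delta=e^{-7}$ the normalizing factor is roughly $7\,T^\star(\nu)$.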

Theorems & Definitions (65)

  • Definition 1: Graph observability
  • Definition 2: Domination and set independence
  • Definition 3: Graph-dependent quantities
  • Example 2.1
  • Definition 4: Uninformed setting
  • Definition 5: Informed setting
  • Definition 6: $\delta$-PC Algorithm
  • Theorem 1
  • Proposition 1
  • Proposition 2
  • ...and 55 more