Pure Exploration with Feedback Graphs
Alessio Russo, Yichen Song, Aldo Pacchiano
TL;DR
The paper analyzes pure exploration (best-arm identification) with stochastic feedback graphs in the fixed-confidence setting, with confidence parameter $\delta$. It derives instance-specific lower bounds via a change-of-measure argument, proves unidentifiability results for Bernoulli rewards in the uninformed setting, and characterizes how the sample complexity scales with graph-dependent quantities such as $\alpha(G)$, $\delta(G)$, and $\sigma(G)$. It then presents TaS-FG (Track and Stop for Feedback Graphs), an asymptotically optimal algorithm that combines parameter estimation, a certainty-equivalence sampling rule, and a GLRT-style stopping rule to identify the best arm, with proven $\delta$-PC guarantees. Numerical experiments across diverse graph topologies show strong performance relative to regret-minimization baselines and highlight how the feedback topology affects exploration efficiency. Overall, the work links graph properties to achievable sample complexity for best-arm identification under partial, graph-structured feedback, and provides a practical, scalable algorithm.
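To make the GLRT-style stopping rule more concrete, here is a minimal sketch that assumes unit-variance Gaussian rewards. This is an assumption: the paper's reward model, statistic, and threshold may differ. With a feedback graph, the counts that matter are how often each arm's reward was *observed*, not how often it was played. The helper names `glrt_statistic`, `threshold`, and `should_stop`, and the threshold formula itself, are hypothetical illustrations and not the authors' code.

```python
import math
from typing import Sequence


def glrt_statistic(means: Sequence[float], counts: Sequence[int]) -> float:
    """Unit-variance Gaussian GLRT statistic for the empirical best arm.

    means[j]:  empirical mean of the rewards observed for arm j
    counts[j]: number of times arm j's reward was *observed*; with a feedback
               graph this can differ from the number of times arm j was played
    """
    if min(counts) == 0:
        return 0.0  # some arm has never been observed: never stop
    best = max(range(len(means)), key=lambda j: means[j])
    stat = math.inf
    for b in range(len(means)):
        if b == best:
            continue
        gap = means[best] - means[b]
        # Evidence that `best` beats `b`: gap^2 / (2 * (1/N_best + 1/N_b)),
        # the closed form of the GLRT for two unit-variance Gaussian arms.
        stat = min(stat, gap * gap / (2.0 * (1.0 / counts[best] + 1.0 / counts[b])))
    return stat


def threshold(t: int, delta: float) -> float:
    # Hypothetical heuristic threshold log((1 + log t) / delta), a common
    # choice in experiments; not necessarily the one used in the paper.
    return math.log((1.0 + math.log(t)) / delta)


def should_stop(means: Sequence[float], counts: Sequence[int],
                t: int, delta: float) -> bool:
    """Stop once the evidence for the empirical best arm exceeds the threshold."""
    return glrt_statistic(means, counts) > threshold(t, delta)
```

For example, with empirical means `[0.9, 0.5, 0.2]` and 100 observations per arm, the statistic is 4.0. That is below the threshold of roughly 5.06 at `t=1000, delta=0.05`, so sampling continues. With 1000 observations per arm, the statistic grows to 40 and the rule stops.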
Abstract
We study the sample complexity of pure exploration in an online learning problem with a feedback graph. This graph dictates the feedback available to the learner, covering scenarios that range from full information and pure bandit feedback to settings with no feedback on the chosen action. While variants of this problem have been investigated for regret minimization, no prior work has addressed the pure exploration setting, which is the focus of our study. We derive an instance-specific lower bound on the sample complexity of learning the best action with fixed confidence, even when the feedback graph is unknown and stochastic, and present unidentifiability results for Bernoulli rewards. Additionally, our findings reveal how the sample complexity scales with key graph-dependent quantities. Lastly, we introduce TaS-FG (Track and Stop for Feedback Graphs), an asymptotically optimal algorithm, and demonstrate its efficiency across different graph configurations.
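The feedback model in the abstract can be illustrated with a small environment simulator. This minimal sketch assumes one common formalization of a stochastic feedback graph: a matrix `graph[a][j]` gives the probability that playing action `a` reveals arm `j`'s reward, with each edge realized independently every round. Bernoulli rewards are an illustrative choice. The names `play` and `observation_counts` are hypothetical. The graph appears here only as the environment's input; in the uninformed setting the learner would not know it.

```python
import random
from typing import List, Sequence, Tuple


def play(action: int,
         graph: Sequence[Sequence[float]],
         means: Sequence[float],
         rng: random.Random) -> List[Tuple[int, float]]:
    """Play `action`; return the (arm, reward) pairs revealed to the learner.

    graph[a][j] is the probability that playing a reveals arm j's reward
    (assumed: each edge is realized independently each round). A zero
    diagonal entry means playing a never reveals a's own reward.
    Rewards are Bernoulli(means[j]), an illustrative choice.
    """
    revealed = []
    for j, p in enumerate(graph[action]):
        if rng.random() < p:  # edge (action, j) realized this round
            reward = 1.0 if rng.random() < means[j] else 0.0
            revealed.append((j, reward))
    return revealed


def observation_counts(graph, means, plays_per_arm, seed=0):
    """Play every arm `plays_per_arm` times; count observations per arm."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    for a in range(len(means)):
        for _ in range(plays_per_arm):
            for j, _reward in play(a, graph, means, rng):
                counts[j] += 1
    return counts


means = [0.7, 0.5, 0.3]
bandit = [[1.0 if a == j else 0.0 for j in range(3)] for a in range(3)]
full_info = [[1.0] * 3 for _ in range(3)]
print(observation_counts(bandit, means, 10))     # [10, 10, 10]
print(observation_counts(full_info, means, 10))  # [30, 30, 30]
```

The bandit graph (identity matrix) and the full-information graph (all ones) are the two extremes mentioned in the abstract. A graph with a zero diagonal models actions that reveal nothing about themselves. Under this model, if each action $a$ is played $N_a$ times, arm $j$ is observed $\sum_a N_a G_{a,j}$ times in expectation.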
