Graph Neural Thompson Sampling
Shuang Wu, Arash A. Amini
TL;DR
This paper tackles online decision-making on graph-structured data by formulating graph action bandits and introducing GNN-TS, a Thompson Sampling framework that uses a Graph Neural Network to estimate mean rewards and graph neural tangent features to quantify uncertainty. Using the Graph Neural Tangent Kernel, the authors define an effective dimension $\tilde{d}$ and prove a sub-linear regret bound $R_T = \tilde{\mathcal{O}}(\sqrt{\tilde{d}T})$ that is independent of the number of graph nodes, so the guarantee scales to large graphs. The theory is complemented by experiments on synthetic Erdős–Rényi (ER) and random dot product graph (RDPG) instances across multiple reward-generating mechanisms, where GNN-TS is competitive with, and scales comparably to, baselines such as GNN-UCB and GNN-PE. Overall, GNN-TS combines neural network approximation with principled uncertainty quantification in graph domains, providing strong regret guarantees and practical applicability for graph-structured bandits.
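The core loop described above pairs a GNN mean estimate with a tangent-feature uncertainty estimate. The sketch below illustrates one Thompson Sampling round in the generic neural Thompson Sampling style, under stated assumptions; it is not the authors' implementation. `mean_fn` and `feat_fn` are hypothetical stand-ins for the trained GNN and its gradient (graph neural tangent) features, `lam`, `nu`, and `m` are assumed regularization, exploration, and width parameters, and the design matrix $U$ is approximated by its diagonal so the example needs only the standard library.

```python
import math
import random
from typing import Callable, List, Sequence


def thompson_select(
    candidates: Sequence[int],
    mean_fn: Callable[[int], float],        # hypothetical: GNN mean estimate f(G; theta)
    feat_fn: Callable[[int], List[float]],  # hypothetical: tangent features grad_theta f(G; theta)
    u_diag: List[float],                    # diagonal approximation of the design matrix U
    lam: float,                             # assumed regularization parameter
    nu: float,                              # assumed exploration scale
    m: int,                                 # assumed network width used for normalization
    rng: random.Random,
) -> int:
    """Return the index of the candidate graph with the largest sampled reward."""
    best_idx, best_sample = 0, -math.inf
    for i, g in enumerate(candidates):
        feats = feat_fn(g)
        # Uncertainty lam * g^T U^{-1} g / m, using only the diagonal of U.
        var = lam * sum(f * f / u for f, u in zip(feats, u_diag)) / m
        # Sample a plausible reward around the GNN mean, widened by nu.
        sample = rng.gauss(mean_fn(g), nu * math.sqrt(var))
        if sample > best_sample:
            best_idx, best_sample = i, sample
    return best_idx


def update_design(u_diag: List[float], feats: List[float], m: int) -> List[float]:
    """Diagonal version of the rank-one update U <- U + g g^T / m."""
    return [u + f * f / m for u, f in zip(u_diag, feats)]


# Usage with made-up numbers for three candidate graphs.
means = [0.1, 0.9, 0.5]


def feats(g: int) -> List[float]:
    return [1.0, float(g)]  # hypothetical tangent features


rng = random.Random(0)
u = [1.0, 1.0]  # U initialized to lam * I with lam = 1
choice = thompson_select(range(3), lambda g: means[g], feats, u, lam=1.0, nu=0.1, m=4, rng=rng)
u = update_design(u, feats(choice), m=4)
```

Sampling from the posterior rather than adding an explicit bonus is what distinguishes this Thompson Sampling step from a UCB-style rule; the diagonal approximation trades exactness for an update that is linear in the number of features.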
Abstract
We consider an online decision-making problem with a reward function defined over graph-structured data, and we formalize it as an instance of the graph action bandit. We then propose \texttt{GNN-TS}, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm that employs a GNN approximator to estimate the mean reward function and graph neural tangent features to estimate uncertainty. We prove that, under certain boundedness assumptions on the reward function, \texttt{GNN-TS} achieves a state-of-the-art regret bound which is (1) sub-linear, of order $\tilde{\mathcal{O}}((\tilde{d} T)^{1/2})$ in the number of interaction rounds $T$ and a notion of effective dimension $\tilde{d}$, and (2) independent of the number of graph nodes. Empirical results show that \texttt{GNN-TS} achieves competitive performance and scales well on graph action bandit problems.
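To see why a bound of this order is sub-linear, divide by $T$. The average per-round regret vanishes provided the effective dimension grows more slowly than the horizon, i.e. assuming $\tilde{d} = o(T)$:

$$\frac{R_T}{T} = \tilde{\mathcal{O}}\left(\frac{\sqrt{\tilde{d}T}}{T}\right) = \tilde{\mathcal{O}}\left(\sqrt{\tilde{d}/T}\right) \longrightarrow 0 \quad \text{as } T \to \infty.$$

For fixed $\tilde{d}$, for instance, quadrupling $T$ halves this average-regret bound, up to logarithmic factors.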
