Graph Neural Thompson Sampling

Shuang Wu, Arash A. Amini

TL;DR

This paper tackles online decision-making on graph-structured data by formulating graph action bandits and introducing GNN-TS, a Thompson Sampling framework that uses a Graph Neural Network to estimate mean rewards and graph neural tangent features to quantify uncertainty. By leveraging the Graph Neural Tangent Kernel, the authors define an effective dimension $\tilde{d}$ and prove a sub-linear regret bound $R_T = \tilde{\mathcal{O}}(\sqrt{\tilde{d}T})$ that is independent of the number of graph nodes, ensuring scalability to large graphs. Theoretical results are complemented by experiments on synthetic graphs (ER and RDPG) across multiple reward-generating mechanisms, demonstrating competitive performance and scalability relative to baselines like GNN-UCB and GNN-PE. Overall, GNN-TS combines neural network approximation with principled uncertainty quantification in graph domains, providing strong regret guarantees and practical applicability for graph-structured bandits.

Abstract

We consider an online decision-making problem with a reward function defined over graph-structured data. We formally formulate the problem as an instance of graph action bandit. We then propose \texttt{GNN-TS}, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm which employs a GNN approximator for estimating the mean reward function and the graph neural tangent features for uncertainty estimation. We prove that, under certain boundedness assumptions on the reward function, GNN-TS achieves a state-of-the-art regret bound which is (1) sub-linear of order $\tilde{\mathcal{O}}((\tilde{d} T)^{1/2})$ in the number of interaction rounds, $T$, and a notion of effective dimension $\tilde{d}$, and (2) independent of the number of graph nodes. Empirical results validate that our proposed \texttt{GNN-TS} exhibits competitive performance and scales well on graph action bandit problems.
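The Thompson Sampling loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a fixed random one-layer aggregation map (`graph_features`, a hypothetical helper) in place of the graph neural tangent features, and a ridge-regression estimate in place of the trained GNN. All names and hyperparameters (`nu`, `lam`, `width`) are illustrative assumptions.

```python
import numpy as np

def graph_features(adj, x, w):
    """Hypothetical stand-in for graph neural tangent features."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                            # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalise
    h = np.maximum(a_hat @ x @ w, 0.0)                 # one aggregation + ReLU layer
    g = h.mean(axis=0)                                 # mean-pool nodes into a graph vector
    return g / np.sqrt(g.size)

def thompson_step(feats, u_inv, theta, nu, rng):
    """Sample one reward per candidate graph and return the argmax index."""
    mean = feats @ theta                                # estimated mean rewards
    var = np.einsum("id,de,ie->i", feats, u_inv, feats) # g^T U^{-1} g per graph
    sampled = rng.normal(mean, nu * np.sqrt(np.maximum(var, 0.0)))
    return int(np.argmax(sampled))

def run(num_rounds=200, num_actions=10, n=8, d=4, width=16,
        lam=1.0, nu=0.5, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(d, width))
    graphs = []
    for _ in range(num_actions):
        upper = np.triu(rng.random((n, n)) < 0.4, k=1)  # random upper triangle
        adj = (upper | upper.T).astype(float)           # symmetric, no self-loops
        graphs.append((adj, rng.normal(size=(n, d))))
    feats = np.stack([graph_features(a, x, w) for a, x in graphs])
    mu = feats @ rng.normal(size=width)                 # toy mean reward, linear in features

    u = lam * np.eye(width)                             # regularised design matrix
    b = np.zeros(width)
    regret = np.zeros(num_rounds)
    for t in range(num_rounds):
        u_inv = np.linalg.inv(u)
        theta = u_inv @ b                               # ridge estimate (replaces GNN training)
        k = thompson_step(feats, u_inv, theta, nu, rng)
        reward = mu[k] + 0.1 * rng.normal()             # noisy observed reward
        u += np.outer(feats[k], feats[k])
        b += reward * feats[k]
        regret[t] = mu.max() - mu[k]
    return np.cumsum(regret)
```

Calling `run()` returns the cumulative regret curve for this toy environment. The exploration scale `nu` controls how strongly the posterior variance perturbs the sampled rewards.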

Paper Structure (36 sections, 24 theorems, 166 equations, 6 figures, 1 table, 1 algorithm)

Key Result

Theorem 4.1

Suppose the assumptions of bounded RKHS norm, bounded reward differences, and sub-Gaussian noise hold. For a fixed horizon $T \in \mathbb{N}$, let [...] and let the learning rate satisfy $\eta \leq (\tilde{C} mL+m\lambda)^{-1}$ for some constant $\tilde{C}$. Then the regret of Algorithm 1 (GNN-TS) is bounded as [...] for some universal constant $C > 0$.
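To see why a bound of this order is called sub-linear, it may help to look at the average per-round regret. The sketch below assumes the standard cumulative regret definition, which this summary does not state explicitly; the symbols $\mu$ (mean reward) and $G^*$ (optimal graph) are introduced here for illustration only, and $\tilde{d}$ is treated as fixed.

```latex
R_T = \sum_{t=1}^{T} \bigl( \mu(G^*) - \mu(G_t) \bigr)
    = \tilde{\mathcal{O}}\bigl( \sqrt{\tilde{d}\, T} \bigr)
\quad\Longrightarrow\quad
\frac{R_T}{T} = \tilde{\mathcal{O}}\Bigl( \sqrt{\tilde{d} / T} \Bigr)
\longrightarrow 0 \quad \text{as } T \to \infty .
```

Under these assumptions, the average gap between the chosen graph and the best graph vanishes as the number of rounds grows.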

Figures (6)

  • Figure 1: Regret over horizon $T = 1000$ for Erdős-Rényi random graphs with $p=0.4$ and $N=50$ (first row) and random dot product graphs with $N=50$ (second row). The three columns correspond to three reward-generating mechanisms: a linear model, a Gaussian process with the GNTK, and a Gaussian process with a representation kernel. GNN-TS is competitive and robust across these environment settings.
  • Figure 2: Competitive performance of GNN-TS is consistent across different sizes of graph space.
  • Figure 3: Increasing $m$ can improve the performance of GNN-TS, while using $\boldsymbol{g}(G_t;\boldsymbol{\theta}_0)$ yields no improvement.
  • Figure 4: Random Dot Product Graphs with linear reward.
  • Figure 5: Random Dot Product Graphs with GP and GNTK for reward.
  • ...and 1 more figure
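The two random graph families named in the figure captions can be sampled with a few lines of NumPy. This is a minimal sketch, not the paper's experimental code: the latent-position distribution in `sample_latent` (uniform entries scaled so that dot products stay in $[0, 1]$) and the latent dimension are assumptions made here for illustration.

```python
import numpy as np

def sample_er(n, p, rng):
    """Erdős-Rényi graph: each undirected edge appears independently with probability p."""
    upper = np.triu(rng.random((n, n)) < p, k=1)   # strict upper triangle only
    return (upper | upper.T).astype(int)           # symmetrise, zero diagonal

def sample_latent(n, dim, rng):
    """Assumed latent positions: uniform in [0, 1]^dim, scaled so dot products lie in [0, 1]."""
    return rng.random((n, dim)) / np.sqrt(dim)

def sample_rdpg(latent, rng):
    """Random dot product graph: edge (i, j) appears with probability <x_i, x_j>."""
    n = latent.shape[0]
    probs = np.clip(latent @ latent.T, 0.0, 1.0)   # guard against values outside [0, 1]
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
er_adj = sample_er(50, 0.4, rng)                   # N = 50, p = 0.4 as in Figure 1
rdpg_adj = sample_rdpg(sample_latent(50, 3, rng), rng)
```

Both functions return symmetric 0/1 adjacency matrices without self-loops, which is the usual input format for a GNN layer.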

Theorems & Definitions (40)

  • Theorem 4.1
  • Lemma 5.1
  • Lemma 5.2
  • Lemma 5.3: One Step Regret Bound
  • Lemma 5.4: Cumulative Uncertainty Bound
  • Proof: Main Proof
  • Lemma A.1: Taylor Approximation of a GNN
  • Lemma A.2
  • Lemma A.3
  • Lemma A.4
  • ...and 30 more