Graph Neural Thompson Sampling

Shuang Wu; Arash A. Amini

TL;DR

This paper tackles online decision-making on graph-structured data by formulating graph action bandits and introducing GNN-TS, a Thompson Sampling framework that uses a Graph Neural Network to estimate mean rewards and graph neural tangent features to quantify uncertainty. By leveraging the Graph Neural Tangent Kernel, the authors define an effective dimension $\tilde{d}$ and prove a sub-linear regret bound $R_T = \tilde{\mathcal{O}}(\sqrt{\tilde{d}T})$ that is independent of the number of graph nodes, ensuring scalability to large graphs. Theoretical results are complemented by experiments on synthetic graphs (ER and RDPG) across multiple reward-generating mechanisms, demonstrating competitive performance and scalability relative to baselines like GNN-UCB and GNN-PE. Overall, GNN-TS combines neural network approximation with principled uncertainty quantification in graph domains, providing strong regret guarantees and practical applicability for graph-structured bandits.

Abstract

We consider an online decision-making problem with a reward function defined over graph-structured data, and we formulate it as a graph action bandit problem. We then propose \texttt{GNN-TS}, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm that uses a GNN approximator to estimate the mean reward function and graph neural tangent features to estimate uncertainty. We prove that, under certain boundedness assumptions on the reward function, GNN-TS achieves a state-of-the-art regret bound that is (1) of order $\tilde{\mathcal{O}}((\tilde{d} T)^{1/2})$, which is sub-linear in the number of interaction rounds $T$ and depends on a notion of effective dimension $\tilde{d}$, and (2) independent of the number of graph nodes. Empirical results show that \texttt{GNN-TS} performs competitively and scales well on graph action bandit problems.
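The abstract describes GNN-TS as Thompson Sampling in which a GNN supplies the mean reward estimate and its tangent features (the gradient of the network output with respect to its parameters) supply the uncertainty. The sketch below shows one select-and-update round in the style of Neural Thompson Sampling (NeuralTS). It is not the paper's implementation: the one-layer GNN, the posterior variance $\nu^2 \lambda\, \mathbf{g}^\top \mathbf{U}^{-1} \mathbf{g} / m$, and the names `gnn_forward`, `ts_select`, `update_U_inv`, `nu` and `lam` are all assumptions made for illustration.

```python
import numpy as np

def gnn_forward(A, X, W, w):
    """Toy one-layer GNN (a hypothetical architecture, not the paper's).

    Mean-aggregates node features over neighbours plus self-loops, mean-pools
    to a graph vector h, then applies a width-m ReLU layer scaled by 1/sqrt(m).
    Returns the scalar output f(G; theta) and the flattened gradient
    g(G; theta) = df / d(W, w), i.e. the tangent feature.
    """
    m = W.shape[0]
    A_hat = A + np.eye(A.shape[0])
    H = (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X   # N x d aggregated features
    h = H.mean(axis=0)                                   # graph readout, shape (d,)
    z = W @ h                                            # pre-activations, shape (m,)
    a = np.maximum(z, 0.0)
    f = w @ a / np.sqrt(m)
    dW = np.outer(w * (z > 0), h) / np.sqrt(m)           # df/dW, shape (m, d)
    dw = a / np.sqrt(m)                                  # df/dw, shape (m,)
    return f, np.concatenate([dW.ravel(), dw])

def ts_select(actions, W, w, U_inv, nu, lam, rng):
    """Thompson step: draw r_k ~ N(f(G_k), nu^2 * lam * g_k^T U^{-1} g_k / m) for
    every candidate graph and return the argmax index and its tangent feature."""
    m = W.shape[0]
    draws, grads = [], []
    for A, X in actions:
        f, g = gnn_forward(A, X, W, w)
        var = nu**2 * lam * (g @ U_inv @ g) / m
        draws.append(rng.normal(f, np.sqrt(var)))
        grads.append(g)
    k = int(np.argmax(draws))
    return k, grads[k]

def update_U_inv(U_inv, g, m):
    """Sherman-Morrison update of U^{-1} after U <- U + g g^T / m."""
    v = U_inv @ g
    return U_inv - np.outer(v, v) / (m + g @ v)

# Usage: one round on 5 random candidate graphs (all sizes are illustrative).
rng = np.random.default_rng(0)
N, d, m, lam, nu = 6, 3, 16, 1.0, 0.1
W, w = rng.normal(size=(m, d)), rng.normal(size=m)   # theta_0
p = m * d + m                                        # number of parameters
U_inv = np.eye(p) / lam                              # U_0 = lam * I
actions = []
for _ in range(5):
    B = np.triu((rng.random((N, N)) < 0.4).astype(float), 1)
    actions.append((B + B.T, rng.normal(size=(N, d))))
k, g = ts_select(actions, W, w, U_inv, nu, lam, rng)
U_inv = update_U_inv(U_inv, g, m)
```

In a full loop, the agent would then observe the reward of `actions[k]` and retrain the GNN parameters by gradient descent on the squared loss. That step is omitted here. The rank-one Sherman-Morrison update avoids re-inverting the $p \times p$ matrix $\mathbf{U}$ every round.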

Paper Structure (36 sections, 24 theorems, 166 equations, 6 figures, 1 table, 1 algorithm)

Introduction
Related Works
Problem Formulation and Methodology
Graph Action Bandit Problem
Graph Neural Network Model
Graph Neural Thompson Sampling
Regret Bound for GNN-TS
Proof of the Regret Bound
Estimation Bound (${\mathcal{E}}^{\mu}_t$)
Exploration Bound (${\mathcal{E}}^{\sigma}_t, {\mathcal{E}}^{a}_t$)
Proof of Theorem 4.1
Experiments
Proof for Lemmas in Regret Analysis
Notations
Proof of Lemma (High-Probability Bound for Event ${\mathcal{E}}^{\mu}_t$)
...and 21 more sections

Key Result

Theorem 4.1

Suppose the bounded RKHS norm, bounded reward differences, and sub-Gaussian noise assumptions hold. For a fixed horizon $T \in \mathbb{N}$, let … and let the learning rate satisfy $\eta \leq (\tilde{C} mL + m\lambda)^{-1}$ for some constant $\tilde{C}$. Then the regret of Algorithm 1 (GNN-TS) is bounded as … for some universal constant $C > 0$; in particular, $R_T = \tilde{\mathcal{O}}(\sqrt{\tilde{d}T})$.
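Theorem 4.1 is stated in terms of an effective dimension $\tilde{d}$ built from the Graph Neural Tangent Kernel. This summary does not give the paper's exact definition. The sketch below therefore assumes the log-determinant form used for NeuralUCB, $\tilde{d} = \log\det(\mathbf{I} + \mathbf{H}/\lambda) / \log(1 + TK/\lambda)$, where $\mathbf{H}$ is the $TK \times TK$ kernel Gram matrix over all round/action pairs. A random, unit-norm, rank-$r$ linear feature map stands in for the GNTK. The function name `effective_dimension` and all sizes are hypothetical.

```python
import numpy as np

def effective_dimension(H, lam):
    """Log-det effective dimension (NeuralUCB-style form; assumed, not taken from the paper):
    d_eff = log det(I + H / lam) / log(1 + n / lam), with n = H.shape[0] = T*K pairs."""
    n = H.shape[0]
    sign, logdet = np.linalg.slogdet(np.eye(n) + H / lam)
    if sign <= 0:
        raise ValueError("I + H/lam must be positive definite; is H PSD?")
    return logdet / np.log1p(n / lam)

rng = np.random.default_rng(0)
T, K, r, lam = 100, 5, 4, 1.0
Phi = rng.normal(size=(T * K, r))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)   # unit-norm stand-in features (rank r)
d_eff = effective_dimension(Phi @ Phi.T, lam)
print(f"d_eff = {d_eff:.3f}, sqrt(d_eff * T) = {np.sqrt(d_eff * T):.2f}")
```

With unit-norm rank-$r$ features, the nonzero eigenvalues of $\mathbf{H}$ sum to at most $n = TK$. By concavity of the logarithm, $\log\det(\mathbf{I} + \mathbf{H}/\lambda) \le r \log(1 + n/\lambda)$, so `d_eff` stays at most $r$ however many round/action pairs are added. This is the kind of behaviour that lets a bound of order $\sqrt{\tilde{d}T}$ avoid growing with the raw size of the input.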

Figures (6)

Figure 1: Regret over horizon $T = 1000$ on Erdős–Rényi random graphs with $p = 0.4$ and $N = 50$ (first row) and random dot product graphs with $N = 50$ (second row). The three columns correspond to three reward-generating mechanisms: a linear model, a Gaussian process with the GNTK, and a Gaussian process with a representation kernel. GNN-TS is competitive and robust across these environment settings.
Figure 2: Competitive performance of GNN-TS is consistent across different sizes of graph space.
Figure 3: Increasing $m$ can improve the performance of GNN-TS, whereas using $\mathbf{g}(G_t;\boldsymbol{\theta}_0)$ gives no improvement.
Figure 4: Random Dot Product Graphs with linear reward.
Figure 5: Random Dot Product Graphs with GP and GNTK for reward.
...and 1 more figure
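Figure 1 reports regret over $T = 1000$ rounds on Erdős–Rényi graphs ($p = 0.4$, $N = 50$) and random dot product graphs ($N = 50$), including a linear reward model. The sketch below builds a comparable environment and computes the cumulative regret $R_t = \sum_{s \le t} \big(\max_G f(G) - f(G_s)\big)$ for a uniformly random baseline policy. The following are assumptions for illustration and not the paper's experimental settings:

- the RDPG latent dimension and its Dirichlet latent positions,
- the shared node features,
- the number of candidate graphs,
- the linear feature map inside the hypothetical `linear_reward`.

```python
import numpy as np

def sample_er(N, p, rng):
    """Erdős–Rényi G(N, p): each undirected edge appears independently with prob. p."""
    upper = np.triu(rng.random((N, N)) < p, k=1)
    return (upper | upper.T).astype(float)

def sample_rdpg(N, k, rng):
    """Random dot product graph: edge (i, j) appears with prob. <x_i, x_j>.
    Latent positions come from a flat Dirichlet (our assumption), so every
    inner product lies in [0, 1]."""
    Z = rng.dirichlet(np.ones(k), size=N)
    upper = np.triu(rng.random((N, N)) < Z @ Z.T, k=1)
    return (upper | upper.T).astype(float)

def linear_reward(A, X, theta):
    """Hypothetical linear mean reward: theta^T (mean-pooled, neighbour-averaged features)."""
    A_hat = A + np.eye(A.shape[0])
    h = ((A_hat / A_hat.sum(axis=1, keepdims=True)) @ X).mean(axis=0)
    return float(theta @ h)

def cumulative_regret(means, chosen):
    """R_t = sum_{s <= t} (max_G f(G) - f(G_s)) for the chosen action indices."""
    means = np.asarray(means)
    return np.cumsum(means.max() - means[chosen])

rng = np.random.default_rng(0)
N, d, n_actions, T = 50, 4, 20, 1000   # N and T follow Figure 1; d, n_actions are illustrative
theta = rng.normal(size=d)
X = rng.normal(size=(N, d))            # node features shared by all candidate graphs (assumption)
samplers = {"ER": lambda: sample_er(N, 0.4, rng), "RDPG": lambda: sample_rdpg(N, 3, rng)}
results = {}
for name, sampler in samplers.items():
    means = [linear_reward(sampler(), X, theta) for _ in range(n_actions)]
    chosen = rng.integers(0, n_actions, size=T)   # uniform random baseline policy
    results[name] = cumulative_regret(means, chosen)
```

Replacing the uniform `chosen` indices with a bandit agent's selections yields regret curves of the kind plotted in the figures above.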

Theorems & Definitions (40)

Theorem 4.1
Lemma 5.1
Lemma 5.2
Lemma 5.3: One Step Regret Bound
Lemma 5.4: Cumulative Uncertainty Bound
Proof: Main Proof
Lemma A.1: Taylor Approximation of a GNN
Lemma A.2
Lemma A.3
Lemma A.4
...and 30 more