Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis

Qunzhong Wang; Xiangguo Sun; Hong Cheng

TL;DR

The paper develops a data-operation perspective on graph prompting, proving that frozen pretrained GNNs can be bridged to downstream tasks via bridge graphs $G_{bri}$ such that $F_{\theta^{*}}(G_{bri}) = C(G)$. It introduces bridge sets and $\epsilon$-extended bridge sets, and derives both single-graph and batch-wise upper bounds on the approximation error, highlighting how model rank and prompt design control performance. Theoretical results are extended from linear models like GCNs to nonlinear architectures such as GATs, and are validated through extensive experiments on synthetic and real data, demonstrating convergence, error behavior, and practical prompt-size guidance. The work provides a principled framework for designing graph prompts that align upstream and downstream objectives efficiently, with implications for scalable, retraining-free deployment of GNNs across diverse tasks.
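A minimal numerical sketch of the bridge-graph idea follows. For illustration only, it assumes a frozen one-layer linear GCN with mean pooling and a prompt made of a single vector `p` added to every node's features. The helpers `gcn_norm`, `frozen_embedding`, and `fit_prompt` are hypothetical, as are the toy graph and the target vector standing in for $C(G)$. When the weight matrix is full rank, the least-squares prompt reaches the target exactly. When it is rank-deficient, a residual error remains, which is one concrete reading of how model rank can control the error.

```python
import numpy as np


def gcn_norm(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def frozen_embedding(a_norm: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical frozen model: one linear GCN layer followed by mean pooling."""
    return (a_norm @ x @ w).mean(axis=0)


def fit_prompt(a_norm, x, w, target):
    """Least-squares prompt p added to every node's features.

    Adding p to each row of X shifts the pooled embedding by c * W^T p,
    where c is the mean of a_norm @ 1, so p solves (c W^T) p ~= target - base.
    Returns p and the remaining error epsilon = ||F(G_bri) - target||.
    """
    base = frozen_embedding(a_norm, x, w)
    c = (a_norm @ np.ones(x.shape[0])).mean()
    # rcond=1e-10 treats numerically tiny singular values as exact zeros
    p, *_ = np.linalg.lstsq(c * w.T, target - base, rcond=1e-10)
    prompted = frozen_embedding(a_norm, x + p[None, :], w)
    return p, float(np.linalg.norm(prompted - target))


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
a_norm = gcn_norm(adj)
x = rng.normal(size=(4, 3))
target = np.array([1.0, 0.0, 0.0])  # stands in for C(G)

w_full = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.5, 0.0, 1.0]])  # det = 1.125, full rank
w_low = w_full.copy()
w_low[:, 2] = w_low[:, 0] + w_low[:, 1]  # third column dependent: rank 2

_, eps_full = fit_prompt(a_norm, x, w_full, target)
_, eps_low = fit_prompt(a_norm, x, w_low, target)
print(eps_full)  # close to 0: an exact bridge graph exists
print(eps_low)   # about 0.577 (= 1/sqrt(3)): rank deficiency leaves a residual
```

The explicit `rcond` makes `np.linalg.lstsq` return the minimum-norm prompt in the rank-deficient case instead of an unstable one driven by a near-zero singular value.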

Abstract

In recent years, graph prompting has emerged as a promising research direction, enabling the learning of additional tokens or subgraphs appended to the original graphs without retraining pre-trained graph models across various applications. This novel paradigm, which shifts from traditional pretraining and finetuning to pretraining and prompting, has shown significant empirical success in simulating graph data operations, with applications ranging from recommendation systems to biological networks and graph transfer. However, despite its potential, the theoretical underpinnings of graph prompting remain underexplored, raising critical questions about its fundamental effectiveness. The lack of a rigorous theoretical account of why, and how well, graph prompting works hangs like a dark cloud over the area and holds back further progress. To fill this gap, this paper introduces a theoretical framework that rigorously analyzes graph prompting from a data operation perspective. Our contributions are threefold. First, we provide a formal guarantee theorem demonstrating graph prompts' capacity to approximate graph transformation operators, effectively linking upstream and downstream tasks. Second, we derive upper bounds on the error of these data operations by graph prompts for a single graph and extend this discussion to batches of graphs, which are common in graph model training. Third, we analyze the distribution of data operation errors, extending our theoretical findings from linear graph models (e.g., GCN) to non-linear graph models (e.g., GAT). Extensive experiments support our theoretical results and confirm the practical implications of these guarantees.
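The batch case can be sketched in a similar way. The assumptions are ours, not the paper's: the frozen model is a one-layer linear GCN with mean pooling, and a single prompt vector `p` is added to every node of every graph in the batch. The helper names, the toy graphs, and the random targets standing in for $C(G_i)$ are hypothetical. Even with a full-rank weight matrix, where each graph alone admits an exact prompt, one shared prompt has to compromise between graphs, so a nonzero error generally remains. The sketch illustrates why the batch setting calls for its own error analysis. It does not compute the paper's bound.

```python
import numpy as np


def gcn_norm(adj):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def frozen_embedding(a_norm, x, w):
    """Hypothetical frozen model: one linear GCN layer followed by mean pooling."""
    return (a_norm @ x @ w).mean(axis=0)


def fit_shared_prompt(graphs, w, targets):
    """Fit ONE prompt p, added to every node of every graph in the batch.

    Each graph i contributes the linear system c_i * W^T p ~= t_i - base_i;
    the systems are stacked and solved jointly by least squares.
    Returns p and the per-graph errors ||F(G_i with p) - t_i||.
    """
    blocks, rhs = [], []
    for (a_norm, x), t in zip(graphs, targets):
        c = (a_norm @ np.ones(x.shape[0])).mean()
        blocks.append(c * w.T)
        rhs.append(t - frozen_embedding(a_norm, x, w))
    p, *_ = np.linalg.lstsq(np.vstack(blocks), np.concatenate(rhs), rcond=1e-10)
    errors = [float(np.linalg.norm(frozen_embedding(a, x + p[None, :], w) - t))
              for (a, x), t in zip(graphs, targets)]
    return p, errors


rng = np.random.default_rng(1)
w = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0]])  # full rank, so each graph alone is solvable
path = gcn_norm(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float))
triangle = gcn_norm(np.ones((3, 3)) - np.eye(3))
graphs = [(path, rng.normal(size=(3, 3))), (triangle, rng.normal(size=(3, 3)))]
targets = [rng.normal(size=3), rng.normal(size=3)]  # stand-ins for C(G_i)

_, single = fit_shared_prompt(graphs[:1], w, targets[:1])
_, batch = fit_shared_prompt(graphs, w, targets)
print(single)  # a single value close to 0: one graph, one prompt, exact fit
print(batch)   # generally nonzero: the shared prompt must compromise
```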

Paper Structure

This paper contains 39 sections, 23 theorems, 100 equations, 9 figures, and 3 tables.

Introduction
Background
Why Graph Prompt Works? A Data Operation Perspective
Perspective from Model Tuning
Perspective from Data Operation
Measuring the Difficulty of Finding Bridge Graphs via Graph Prompts
The Upper Bound of Data Operation Error via Graph Prompt
Upper Bound of the Error on A Single Graph
Extend the Error Bound Discussion to A Batch of Graphs
Value Distribution of the Data Operation Error with Graph Prompt
Extend the Discussion from Linear to Non-linear
Experiments
Experimental Settings
On mapping to $B_G$ with single graph
On mapping to $\epsilon \text{-}B_G$ with single graph
...and 24 more sections

Key Result

Theorem 1

Let $F_{\theta^{*}}$ be a GNN model pre-trained on task $T_{pre}$ with frozen parameters $\theta^{*}$; let $T_{dow}$ be the downstream task and let $C$ be an optimal function for $T_{dow}$. Given any graph $G_{ori}$, $C(G_{ori})$ denotes the optimal embedding vector for the downstream task (i.e. ca
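A worked special case may help make the statement concrete. It rests on simplifications assumed here for illustration, not on the paper's proof. First, $F_{\theta^{*}}$ is a single linear GCN layer with nonnegative normalized adjacency $\hat{A}$ (including self-loops), weight matrix $W$, and mean pooling over the $n$ nodes, so $F_{\theta^{*}}(G) = \frac{1}{n}\mathbf{1}^{\top}\hat{A}XW$. Second, the prompt is one vector $p$ added to every node's features, so the bridge graph $G_{bri}$ has features $X + \mathbf{1}p^{\top}$.

```latex
\begin{aligned}
F_{\theta^{*}}(G_{bri})
  &= \tfrac{1}{n}\mathbf{1}^{\top}\hat{A}\,\bigl(X + \mathbf{1}p^{\top}\bigr)W
   = F_{\theta^{*}}(G_{ori}) + c\,p^{\top}W,
   \qquad c = \tfrac{1}{n}\mathbf{1}^{\top}\hat{A}\mathbf{1} > 0,\\[4pt]
F_{\theta^{*}}(G_{bri}) = C(G_{ori})
  &\iff p^{\top}W = \tfrac{1}{c}\bigl(C(G_{ori}) - F_{\theta^{*}}(G_{ori})\bigr).
\end{aligned}
```

Under these assumptions, a square, invertible $W$ gives the exact prompt $p^{\top} = \frac{1}{c}\bigl(C(G_{ori}) - F_{\theta^{*}}(G_{ori})\bigr)W^{-1}$, so a bridge graph exists. If $W$ is rank-deficient, an exact prompt exists only when $C(G_{ori}) - F_{\theta^{*}}(G_{ori})$ lies in the row space of $W$. Otherwise, the best prompt leaves an error equal to that vector's distance from the row space.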

Figures (9)

Figure 1: Real $\epsilon$ distribution and fitted curves.
Figure 2: Convergence rate analysis. GCN (left) and GAT (right).
Figure 3: $\epsilon$ range analysis
Figure 4: $\epsilon$ range with a simple prompt token
Figure 5: $\epsilon$ range analysis based on multiple graphs
...and 4 more figures

Theorems & Definitions (42)

Theorem 1
Definition 1: Bridge Set and $\epsilon$-extended Bridge Set
Theorem 2
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Theorem 7
Definition 2: Graph Embedding Residual Vector
Theorem 8
...and 32 more
