Can Graph Foundation Models Generalize Over Architecture?

Benjamin Gutteridge, Michael Bronstein, Xiaowen Dong

Abstract

Graph foundation models (GFMs) have recently attracted interest due to the promise of graph neural network (GNN) architectures that generalize zero-shot across graphs of arbitrary scales, feature dimensions, and domains. While existing work has demonstrated this ability empirically across diverse real-world benchmarks, these tasks share a crucial hidden limitation: they admit a narrow set of effective GNN architectures. In particular, current domain-agnostic GFMs rely on fixed architectural backbones, implicitly assuming that a single message-passing regime suffices across tasks. In this paper, we argue that architecture adaptivity is a necessary requirement for true GFMs. We show that existing approaches are non-robust to task-dependent architectural attributes and, as a case study, use range as a minimal and measurable axis along which this limitation becomes explicit. With theoretical analysis and controlled synthetic experiments, we demonstrate that fixed-backbone GFMs provably under-reach on tasks whose architectural requirements differ from those seen at training time. To address this issue, we introduce a framework that adapts effective GNN architecture at inference time by discovering and mixing task-specific linear graph operators, enabling zero-shot generalization across tasks with heterogeneous architectural requirements, without retraining. We validate our approach on arbitrary-range synthetic tasks and a suite of real-world benchmarks, demonstrating improved performance and robustness over existing domain-agnostic GFMs.
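
As a rough illustration of why range matters, the sketch below builds a small path graph, places a signal on one end, and compares two uniform mixtures of linear graph operators: one limited to 2 hops and one spanning the full graph. It is not the paper's method. The operator family (powers of a row-normalized adjacency with self-loops), the uniform mixing weights, and all function names are assumptions chosen only to make the under-reaching effect visible.

```python
import numpy as np


def path_adjacency(n):
    """Adjacency matrix of an undirected path graph on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A


def hop_operators(A, max_hops):
    """Return [I, P, P^2, ..., P^max_hops], where P is the row-normalized
    adjacency with self-loops (an assumed, illustrative operator family)."""
    P = A + np.eye(len(A))
    P = P / P.sum(axis=1, keepdims=True)
    ops = [np.eye(len(A))]
    for _ in range(max_hops):
        ops.append(ops[-1] @ P)
    return ops


def mix(ops, weights, X):
    """Weighted sum of linear operators applied to node features X."""
    return sum(w * (op @ X) for w, op in zip(weights, ops))


n = 8
A = path_adjacency(n)

# Signal lives only on the last node; node 0 is 7 hops away from it.
X = np.zeros((n, 1))
X[n - 1, 0] = 1.0

ops = hop_operators(A, max_hops=n - 1)

# Fixed short-range mixture: operators covering at most 2 hops.
short_mix = mix(ops[:3], np.ones(3) / 3, X)

# Longer-range mixture: operators covering up to 7 hops.
long_mix = mix(ops, np.ones(n) / n, X)

print("node 0, short-range mixture:", short_mix[0, 0])
print("node 0, long-range mixture: ", long_mix[0, 0])
```

In this toy setting, no choice of mixing weights over the 2-hop operators lets node 0 see the signal, because every operator in that set has zero weight on nodes more than 2 hops away. Adding longer-range operators to the mixture removes that restriction without changing any learned parameters.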

Paper Structure (48 sections, 5 equations, 5 figures, 25 tables)

Figures (5)

  • Figure 1: Average accuracy $\Delta \%$ on 25 benchmarks for varying basis vs. standard GraphAny.
  • Figure 2: Range of GOBLIN on several benchmarks, $\pm \sigma$ over seeds. Median is over all 25 benchmarks.
  • Figure 3: Range of GOBLIN on synthetic and real-world benchmark tasks ($\pm \sigma$ over 5 seeds).
  • Figure 4: Range of best-performing operator on each task, out of all operators selected by GOBLIN across all 5 seeds.
  • Figure 5: The relationship between task range (as captured by GOBLIN) and GOBLIN performance improvement ($\Delta$ average % accuracy) over non-GOBLIN models (MPNNs, GraphAny and TS-GNN, as in Tables \ref{tab:benchmark_results} and \ref{tab:city}) for 29 real-world benchmarks.