GraphUniverse: Synthetic Graph Generation for Evaluating Inductive Generalization

Louis Van Langendonck, Guillermo Bernárdez, Nina Miolane, Pere Barlet-Ros

TL;DR

We introduce GraphUniverse, a framework for generating entire families of graphs that enables the first systematic evaluation of inductive generalization at scale. We find that robustness to distribution shift is highly sensitive not only to the choice of model architecture but also to the initial graph regime.

Abstract

A fundamental challenge in graph learning is understanding how models generalize to new, unseen graphs. While synthetic benchmarks offer controlled settings for analysis, existing approaches are confined to single-graph, transductive settings where models train and test on the same graph structure. Addressing this gap, we introduce GraphUniverse, a framework for generating entire families of graphs to enable the first systematic evaluation of inductive generalization at scale. Our core innovation is the generation of graphs with persistent semantic communities, ensuring conceptual consistency while allowing fine-grained control over structural properties like homophily and degree distributions. This enables crucial but underexplored robustness tests, such as performance under controlled distribution shifts. Benchmarking a wide range of architectures -- from GNNs to graph transformers and topological architectures -- reveals that strong transductive performance is a poor predictor of inductive generalization. Furthermore, we find that robustness to distribution shift is highly sensitive not only to model architecture choice but also to the initial graph regime (e.g., high vs. low homophily). Beyond benchmarking, GraphUniverse's flexibility and scalability can facilitate the development of robust and truly generalizable architectures. The framework is open-source at https://github.com/LouisVanLangendonck/GraphUniverse.
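To make the idea of a graph family with persistent communities concrete, the sketch below samples several stochastic-block-model-style graphs whose community labels come from one shared pool, with a target edge homophily and mean degree. It is only an illustration under simplifying assumptions: the function names (`sample_family`, `edge_homophily`), the parameters, and the two-probability edge rule are hypothetical and are not the GraphUniverse API or its actual generation procedure.

```python
import numpy as np


def sample_family(n_graphs, n_nodes, universe_k, graph_k,
                  homophily, avg_degree, seed=0):
    """Sample a family of SBM-style graphs that share one pool of community ids.

    Illustrative assumption only: this is NOT the GraphUniverse API.
    Returns a list of (edges, labels) pairs, where edges has shape (E, 2).
    """
    rng = np.random.default_rng(seed)
    iu = np.triu_indices(n_nodes, k=1)  # all unordered node pairs
    family = []
    for _ in range(n_graphs):
        # Each graph draws a subset of the shared community pool, so label c
        # denotes the same community in every graph of the family.
        comms = rng.choice(universe_k, size=graph_k, replace=False)
        labels = rng.choice(comms, size=n_nodes)
        same = (labels[:, None] == labels[None, :])[iu]

        # Pick p_in and p_out so that the expected fraction of
        # intra-community edges equals `homophily` and the expected
        # mean degree equals `avg_degree`.
        n_same = int(same.sum())
        n_diff = same.size - n_same
        n_edges = avg_degree * n_nodes / 2
        p_in = min(1.0, homophily * n_edges / max(n_same, 1))
        p_out = min(1.0, (1.0 - homophily) * n_edges / max(n_diff, 1))

        keep = rng.random(same.size) < np.where(same, p_in, p_out)
        edges = np.stack([iu[0][keep], iu[1][keep]], axis=1)
        family.append((edges, labels))
    return family


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a community label."""
    return float(np.mean(labels[edges[:, 0]] == labels[edges[:, 1]]))
```

Sampling a second family with a different `homophily` or `avg_degree` from the same community pool gives a simple stand-in for the controlled distribution shifts described above: the label semantics stay fixed while the structure changes.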

Paper Structure

This paper contains 90 sections, 35 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Overview of GraphUniverse generation methodology.
  • Figure 2: Parameter sensitivity heatmap from 100 randomized graph families with all parameters simultaneously varied across complete ranges. Pearson correlation coefficients are shown with stars indicating significance levels. NS indicates no statistically significant correlation.
  • Figure 3: A) Inductive (graph families of 1000 graphs) versus transductive (single graphs) test accuracy on community detection across different graph properties, with each architecture individually optimized. B) Distribution shift analysis: best-performing inductive models evaluated on graph families with shifted properties from the same Universe. Plots show accuracy changes under distributional shifts, with x-axis indicating the original training domain. N/A indicates shifts beyond feasible parameter bounds.
  • Figure 4: Left: baseline accuracy on original graphs. Right: performance changes ($\triangle$) when evaluating on larger graphs (+200, +500 nodes). Triangle counting reports MAE normalized by the average graph size $\overline{N}$. NSD runs out of memory on the largest graphs (+500).
  • Figure 5: Model ranking correlations between real datasets and equivalent synthetic datasets. Rankings computed via bootstrap analysis. GraphUniverse (blue) shows consistently higher alignment with real-world model rankings compared to GraphWorld (purple) across both raw performance and rank-based metrics. "Without Baselines" excludes DeepSet and GraphMLP to avoid overestimation.
  • ...and 7 more figures
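The normalized MAE mentioned in the Figure 4 caption can be read as the mean absolute error divided by the average graph size $\overline{N}$. The sketch below follows that reading. The function name and the choice to normalize once by the dataset-wide mean size (rather than per graph) are assumptions, since the caption does not specify them.

```python
import numpy as np


def normalized_mae(pred, target, graph_sizes):
    """Mean absolute error divided by the average number of nodes per graph.

    Assumption: a single dataset-wide normalization by the mean graph size;
    the exact convention used in the paper is not stated in the caption.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mae = np.mean(np.abs(pred - target))
    return float(mae / np.mean(graph_sizes))
```

Dividing by $\overline{N}$ keeps errors comparable when the test graphs are larger than the training graphs, because raw triangle counts grow with graph size.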