Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement

Huidong Liang, Haitz Sáez de Ocáriz Borde, Baskaran Sripathmanathan, Michael Bronstein, Xiaowen Dong

TL;DR

This work addresses the challenge of quantifying long-range dependencies in graph learning by introducing City-Networks, four large real-world road-network datasets with diameters up to $400$ and up to $5\times 10^5$ nodes, where labels are based on a controllable $k$-hop local eccentricity measure $\hat{\varepsilon}_k(v)$. It provides a model-agnostic Jacobian-based measurement of long-range influence, using per-hop influence $I(v,u)$, per-hop totals $T_h(v)$, and an influence-weighted receptive field $R$ to demonstrate that distant hops carry substantial information on City-Networks, more so than on traditional benchmarks. The paper also offers theoretical justifications linking the dataset topology to over-smoothing via the spectral properties of the normalized adjacency operator, showing larger diameter and sparser graphs slow smoothing and enable long-range signals to persist. Together, these contributions yield a principled framework for benchmarking and understanding long-range interactions in GNNs, with practical impact for urban analytics and the design of scalable architectures.
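To make the three quantities concrete, here is a minimal sketch in plain Python. It assumes definitions that this summary does not spell out: $I(v,u)$ is taken as the absolute sensitivity of node $v$'s output to node $u$'s input feature, $T_h(v)$ as the sum of $I(v,u)$ over nodes $u$ exactly $h$ hops from $v$, and $R$ as the node-averaged hop distance weighted by $T_h(v)$. The `toy_model` (repeated mean aggregation with self-loops on scalar features) and the finite-difference Jacobian are illustrative stand-ins, not the paper's models or implementation.

```python
from collections import deque

def hop_distances(adj, src):
    """BFS hop distance from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def toy_model(adj, x, num_layers):
    """Hypothetical stand-in for a GNN: mean aggregation with self-loops."""
    h = list(x)
    for _ in range(num_layers):
        h = [(h[v] + sum(h[u] for u in adj[v])) / (1 + len(adj[v]))
             for v in range(len(adj))]
    return h

def influence_matrix(adj, x, num_layers, eps=1e-3):
    """infl[v][u] approximates |d out_v / d x_u| by central differences."""
    n = len(adj)
    infl = [[0.0] * n for _ in range(n)]
    for u in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[u] += eps
        x_minus[u] -= eps
        out_plus = toy_model(adj, x_plus, num_layers)
        out_minus = toy_model(adj, x_minus, num_layers)
        for v in range(n):
            infl[v][u] = abs(out_plus[v] - out_minus[v]) / (2 * eps)
    return infl

def per_hop_totals(adj, infl, v, max_hop):
    """T_h(v): influence on v summed over nodes exactly h hops away."""
    totals = [0.0] * (max_hop + 1)
    for u, d in hop_distances(adj, v).items():
        if d <= max_hop:
            totals[d] += infl[v][u]
    return totals

def receptive_field(adj, infl, max_hop):
    """R: influence-weighted mean hop distance, averaged over nodes."""
    n = len(adj)
    acc = 0.0
    for v in range(n):
        t = per_hop_totals(adj, infl, v, max_hop)
        acc += sum(h * t_h for h, t_h in enumerate(t)) / sum(t)
    return acc / n

# Path graph with 9 nodes and a 3-layer toy model.
n = 9
path_adj = [[u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)]
infl = influence_matrix(path_adj, [0.0] * n, num_layers=3)
totals_mid = per_hop_totals(path_adj, infl, v=4, max_hop=3)
R = receptive_field(path_adj, infl, max_hop=3)
```

With an $L$-layer message-passing model, nodes more than $L$ hops away have zero influence, so `max_hop` is set to the layer count here. For a real model one would replace the finite differences with automatic differentiation.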

Abstract

Long-range dependencies are critical for effective graph representation learning, yet most existing datasets focus on small graphs tailored to inductive tasks, offering limited insight into long-range interactions. Current evaluations primarily compare models employing global attention (e.g., graph transformers) with those using local neighborhood aggregation (e.g., message-passing neural networks) without a direct measurement of long-range dependency. In this work, we introduce City-Networks, a novel large-scale transductive learning dataset derived from real-world city road networks. This dataset features graphs with over 100k nodes and significantly larger diameters than those in existing benchmarks, naturally embodying long-range information. We annotate the graphs based on local node eccentricities, ensuring that the classification task inherently requires information from distant nodes. Furthermore, we propose a model-agnostic measurement based on the Jacobians of neighbors from distant hops, offering a principled quantification of long-range dependencies. Finally, we provide theoretical justifications for both our dataset design and the proposed measurement, focusing in particular on over-smoothing and influence score dilution, which establishes a robust foundation for further exploration of long-range interactions in graph neural networks.
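The eccentricity-based annotation can be illustrated with a short sketch. It assumes one plausible reading of the $k$-hop local eccentricity: the largest weighted shortest-path distance from $v$ to any node within $k$ hops, with paths restricted to that neighborhood. The function names, the adjacency format, and the edge weights below are hypothetical and chosen only for illustration; the paper's exact definition and labeling procedure may differ.

```python
import heapq
from collections import deque

def k_hop_nodes(adj, v, k):
    """Nodes within k hops of v (BFS that ignores edge weights)."""
    hops = {v: 0}
    queue = deque([v])
    while queue:
        a = queue.popleft()
        if hops[a] == k:
            continue  # do not expand beyond k hops
        for b, _w in adj[a]:
            if b not in hops:
                hops[b] = hops[a] + 1
                queue.append(b)
    return set(hops)

def local_eccentricity(adj, v, k):
    """Max weighted shortest-path distance from v inside its k-hop ball."""
    allowed = k_hop_nodes(adj, v, k)
    dist = {v: 0.0}
    heap = [(0.0, v)]
    while heap:  # Dijkstra restricted to the allowed node set
        d, a = heapq.heappop(heap)
        if d > dist[a]:
            continue  # stale heap entry
        for b, w in adj[a]:
            if b in allowed and d + w < dist.get(b, float("inf")):
                dist[b] = d + w
                heapq.heappush(heap, (d + w, b))
    return max(dist.values())

# Weighted path 0-1-2-3-4 with (hypothetical) edge lengths 1, 2, 3, 4.
road_adj = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 2.0)],
    2: [(1, 2.0), (3, 3.0)],
    3: [(2, 3.0), (4, 4.0)],
    4: [(3, 4.0)],
}
ecc_end = local_eccentricity(road_adj, 0, k=2)   # reaches nodes 0, 1, 2
ecc_mid = local_eccentricity(road_adj, 2, k=2)   # reaches all five nodes
```

Increasing `k` widens the neighborhood a model must see to recover the value, which is the sense in which the measure is controllable.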

Paper Structure

This paper contains 49 sections, 11 theorems, 55 equations, 10 figures, 5 tables.

Key Result

Proposition 5.1

Assume a connected graph $\mathcal{G}$ with more than two nodes. For all $\gamma > 0$,

Figures (10)

  • Figure 1: Visualizations of City-Networks for Paris, Shanghai, Los Angeles, and London.
  • Figure 2: Visualizations of node accessibility estimations based on local eccentricity in two sub-regions, where darker colors indicate smaller eccentricity values, i.e., nodes that are easier to access.
  • Figure 3: Baseline results across datasets for different numbers of layers $L \in \{2, 4, 8, 16\}$. Results for GraphGPS are not shown on London because it runs out of memory on our $48$GB GPU; the result for SGFormer on PascalVOC-SP is also not reported because it was not originally designed for the inductive setting.
  • Figure 4: Normalized average total influence $\bar{T}_h / \bar{T}_0$ averaged across nodes at different distances. Note that the influence calculation for GraphGPS runs out of memory on London and ogbn-arxiv.
  • Figure 5: Distribution of the 16-hop eccentricity for all nodes in each of our City-Networks.
  • ...and 5 more figures

Theorems & Definitions (27)

  • Proposition 5.1: Self-loops decrease algebraic connectivity of the original graph
  • Theorem 5.2: Bound on second largest positive eigenvalue of the normalized adjacency operator
  • Lemma 6.1: Growth of $h$-hop shells in grid-like graphs
  • Theorem 6.2: Dilution of mean aggregated influence in grid-like graphs
  • Corollary 6.3: Dilution for a planar grid-like graph
  • Corollary 6.4: Faster dilution over aggregated $h$-hop neighborhoods
  • Corollary 6.5: The dilution problem does not affect the average total influence
  • Lemma F.1: Eigenvalue complementarity of normalized operators
  • Proof of Lemma F.1
  • Proposition F.2: Self-loops decrease algebraic connectivity of the original graph (restated from the section on the rate of over-smoothing)
  • ...and 17 more
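
The intuition behind the shell-growth and dilution results (Lemma 6.1, Theorem 6.2) can be checked on an idealized case. The sketch below is not the paper's statement or proof: it assumes a plain 4-neighbor square lattice as the "grid-like" graph and counts, by breadth-first search, how many nodes lie exactly $h$ hops from a central node. On such a lattice the $h$-hop shell contains $4h$ nodes, so a fixed amount of total influence at hop $h$ is shared among linearly more nodes as $h$ grows.

```python
from collections import deque

def shell_sizes(side, max_hop):
    """Count nodes at each hop distance from the center of a side x side grid."""
    center = (side // 2, side // 2)
    dist = {center: 0}
    queue = deque([center])
    sizes = [0] * (max_hop + 1)
    sizes[0] = 1
    while queue:
        r, c = queue.popleft()
        d = dist[(r, c)]
        if d == max_hop:
            continue  # shells beyond max_hop are not needed
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < side and 0 <= nc < side and (nr, nc) not in dist:
                dist[(nr, nc)] = d + 1
                sizes[d + 1] += 1
                queue.append((nr, nc))
    return sizes

# Grid large enough that the first 10 shells never touch the boundary.
sizes = shell_sizes(side=41, max_hop=10)
# Mean share per node if each shell carried one unit of total influence.
mean_share = [1.0 / s for s in sizes]
```

Real road networks are sparser and less regular than this lattice, so the exact growth rate differs; the sketch only illustrates why mean per-node influence can shrink with distance even when per-hop totals do not.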