Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement

Huidong Liang; Haitz Sáez de Ocáriz Borde; Baskaran Sripathmanathan; Michael Bronstein; Xiaowen Dong

TL;DR

This work tackles the problem of quantifying long-range dependencies in graph learning. It introduces City-Networks, four large real-world road-network datasets with up to $5\times 10^5$ nodes and diameters up to $400$, whose labels are derived from a controllable $k$-hop local eccentricity measure $\hat{\varepsilon}_k(v)$. It also proposes a model-agnostic, Jacobian-based measurement of long-range influence, built from the node-pair influence $I(v,u)$, the per-hop totals $T_h(v)$, and an influence-weighted receptive field $R$, and uses it to show that distant hops carry substantial information on City-Networks, more so than on traditional benchmarks. Finally, the paper gives theoretical justification linking dataset topology to over-smoothing through the spectral properties of the normalized adjacency operator: larger diameters and sparser graphs slow smoothing, which lets long-range signals persist. Together, these contributions provide a principled framework for benchmarking and understanding long-range interactions in GNNs, with practical relevance for urban analytics and the design of scalable architectures.
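To make the three quantities $I(v,u)$, $T_h(v)$, and $R$ concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than the paper's implementation. `toy_model` is a hypothetical two-layer mean-aggregation stand-in for a trained GNN, and features are scalar, so $I(v,u)$ reduces to the absolute value of a single Jacobian entry. The Jacobian is estimated by finite differences instead of automatic differentiation. $R$ is taken to be the influence-weighted mean hop distance averaged over nodes, which is one plausible reading of "influence-weighted receptive field".

```python
import math
from collections import deque

def hop_distances(adj, v):
    """BFS hop distance from v to every reachable node."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def toy_model(adj, x, layers=2):
    """Hypothetical stand-in for a GNN: tanh of the mean over self and neighbours."""
    h = list(x)
    for _ in range(layers):
        h = [math.tanh((h[v] + sum(h[u] for u in adj[v])) / (len(adj[v]) + 1))
             for v in range(len(h))]
    return h

def influence_matrix(model, adj, x, eps=1e-5):
    """I[v][u] = |d out_v / d x_u|, estimated by central finite differences."""
    n = len(x)
    I = [[0.0] * n for _ in range(n)]
    for u in range(n):
        xp = list(x); xp[u] += eps
        xm = list(x); xm[u] -= eps
        up, um = model(adj, xp), model(adj, xm)
        for v in range(n):
            I[v][u] = abs(up[v] - um[v]) / (2 * eps)
    return I

def per_hop_totals(adj, I, v):
    """T_h(v): influence summed over all nodes exactly h hops from v."""
    totals = {}
    for u, h in hop_distances(adj, v).items():
        totals[h] = totals.get(h, 0.0) + I[v][u]
    return totals

def receptive_field(adj, I):
    """Assumed form of R: influence-weighted mean hop distance, averaged over nodes."""
    values = []
    for v in range(len(I)):
        t = per_hop_totals(adj, I, v)
        values.append(sum(h * s for h, s in t.items()) / sum(t.values()))
    return sum(values) / len(values)

# Path graph 0-1-2-3-4-5 with arbitrary scalar features
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
x = [0.1, -0.2, 0.3, 0.0, 0.2, -0.1]
I = influence_matrix(toy_model, adj, x)
T = per_hop_totals(adj, I, 0)
R = receptive_field(adj, I)
```

Because the toy model has only two layers, `T[3]` and beyond are zero for node 0 and `R` stays below 2. A model with a larger effective receptive field would shift weight toward higher hops and raise `R`.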

Abstract

Long-range dependencies are critical for effective graph representation learning, yet most existing datasets focus on small graphs tailored to inductive tasks, offering limited insight into long-range interactions. Current evaluations primarily compare models employing global attention (e.g., graph transformers) with those using local neighborhood aggregation (e.g., message-passing neural networks) without a direct measurement of long-range dependency. In this work, we introduce City-Networks, a novel large-scale transductive learning dataset derived from real-world city road networks. This dataset features graphs with over 100k nodes and significantly larger diameters than those in existing benchmarks, naturally embodying long-range information. We annotate the graphs based on local node eccentricities, ensuring that the classification task inherently requires information from distant nodes. Furthermore, we propose a model-agnostic measurement based on the Jacobians of neighbors from distant hops, offering a principled quantification of long-range dependencies. Finally, we provide theoretical justifications for both our dataset design and the proposed measurement, focusing in particular on over-smoothing and influence score dilution, which together establish a robust foundation for further exploration of long-range interactions in graph neural networks.
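As a rough illustration of how eccentricity-based labels can depend on distant nodes, the sketch below computes a $k$-hop local eccentricity on a toy weighted graph. The definition used here is an assumption, since the abstract does not spell it out: the largest weighted shortest-path distance from $v$ to any node in its $k$-hop ego network, with paths restricted to that ego network. The function names and the toy edge lengths are hypothetical.

```python
import heapq
from collections import deque

def k_hop_nodes(adj, v, k):
    """Return {node: hop distance} for all nodes within k hops of v (BFS)."""
    hops = {v: 0}
    queue = deque([v])
    while queue:
        node = queue.popleft()
        if hops[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    return hops

def local_eccentricity(adj, v, k):
    """Assumed definition: max weighted distance from v inside its k-hop ego network."""
    allowed = k_hop_nodes(adj, v, k)
    dist = {v: 0.0}
    heap = [(0.0, v)]
    while heap:  # Dijkstra restricted to the allowed node set
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in adj[node].items():
            if nbr not in allowed:
                continue
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return max(dist.values())

# Toy path graph 0-1-2-3-4 with edge lengths 1, 2, 3, 4 (made-up values)
adj = {
    0: {1: 1.0},
    1: {0: 1.0, 2: 2.0},
    2: {1: 2.0, 3: 3.0},
    3: {2: 3.0, 4: 4.0},
    4: {3: 4.0},
}
ecc_k1 = local_eccentricity(adj, 2, 1)  # only nodes 1, 2, 3 are visible
ecc_k2 = local_eccentricity(adj, 2, 2)  # the whole path is visible
```

Increasing `k` changes the value for the same node (here `ecc_k1` and `ecc_k2` differ), which is the sense in which a label built on this quantity requires information from farther away as `k` grows.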
