Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching
Federico Errica, Henrik Christiansen, Viktor Zaverkin, Takashi Maruyama, Mathias Niepert, Francesco Alesiani
TL;DR
Adaptive Message Passing (AMP) introduces a variational framework that enables graph networks to learn both the depth of message passing and which messages to filter, addressing long-range interaction challenges. By modeling depth with a learned distribution over layers $L$ and applying differentiable, soft message filtering via $\boldsymbol{F}$, AMP can form effectively deep yet selective networks, truncated at $\boldsymbol{\hat{L}}$ for tractable inference. The approach yields theoretical insights on mitigating oversmoothing, oversquashing, and underreaching, and empirically improves performance on five long-range datasets spanning synthetic and chemical domains. This framework avoids costly graph rewiring or exhaustive depth searches, offering a principled, scalable route to robust long-range graph reasoning with per-layer readouts and depth-aware predictions.
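The three ingredients named in the summary above (a distribution over depth, soft message filtering, and per-layer readouts) can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the depth distribution is represented by a softmax over `depth_logits`, the filter is a sigmoid of a linear map of the node state, and all names (`amp_forward`, `init_params`, `W_filter`, `W_msg`, `W_self`, `w_out`) are hypothetical. No training or variational objective is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def init_params(d, max_depth, rng):
    """Random parameters for a toy network with max_depth layers (hypothetical layout)."""
    def mats():
        return [rng.normal(scale=0.5, size=(d, d)) for _ in range(max_depth)]
    return {
        "W_filter": mats(),   # produces the soft filter F at each layer
        "W_msg": mats(),      # transforms aggregated messages
        "W_self": mats(),     # transforms a node's own state
        "w_out": [rng.normal(scale=0.5, size=d) for _ in range(max_depth)],
        "depth_logits": np.zeros(max_depth),  # assumed parameterization of the depth distribution
    }

def amp_forward(A, X, params):
    """Toy depth-weighted message passing with soft filtering.

    A: (n, n) adjacency matrix, X: (n, d) node features.
    Returns the graph-level prediction, the depth weights, and the per-layer filters.
    """
    H = X
    q = softmax(params["depth_logits"])  # weights over layers 1..max_depth, sums to 1
    layer_preds, filters = [], []
    for l in range(len(q)):
        # Soft filter with entries in (0, 1), one value per node and feature.
        F = sigmoid(H @ params["W_filter"][l])
        # Each node sums the filtered states of its neighbors.
        messages = A @ (F * H)
        H = np.tanh(messages @ params["W_msg"][l] + H @ params["W_self"][l])
        # Per-layer readout: mean-pool the nodes, then apply a linear map.
        layer_preds.append(H.mean(axis=0) @ params["w_out"][l])
        filters.append(F)
    # Depth-aware prediction: per-layer readouts weighted by the depth distribution.
    return float(q @ np.array(layer_preds)), q, filters

# Usage on a 4-node path graph with 3 features per node.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
params = init_params(d=3, max_depth=5, rng=rng)
y, q, filters = amp_forward(A, X, params)
```

In this sketch, `max_depth` plays the role of the truncation point: layers beyond it are never evaluated, and layers with near-zero weight in `q` contribute almost nothing to the prediction. A filter entry close to 0 suppresses the corresponding feature of a node's outgoing message, while an entry close to 1 lets it pass unchanged.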
Abstract
Long-range interactions are essential for the correct description of complex systems in many scientific fields. The price to pay for including them in the calculations, however, is a dramatic increase in the overall computational costs. Recently, deep graph networks have been employed as efficient, data-driven models for predicting properties of complex systems represented as graphs. These models rely on a message passing strategy that should, in principle, capture long-range information without explicitly modeling the corresponding interactions. In practice, most deep graph networks cannot really model long-range dependencies due to the intrinsic limitations of (synchronous) message passing, namely oversmoothing, oversquashing, and underreaching. This work proposes a general framework that learns to mitigate these limitations: within a variational inference framework, we endow message passing architectures with the ability to adapt their depth and filter messages along the way. With theoretical and empirical arguments, we show that this strategy better captures long-range interactions, by competing with the state of the art on five node and graph prediction datasets.
