Oversmoothing, Oversquashing, Heterophily, Long-Range, and more: Demystifying Common Beliefs in Graph Machine Learning

Adrian Arnaiz-Rodriguez, Federico Errica

TL;DR

It is argued that the fast pace of progress around the topics of oversmoothing and oversquashing, the homophily-heterophily dichotomy, and long-range tasks, came with the consolidation of commonly accepted beliefs and assumptions -- under the form of universal statements -- that are not always true nor easy to distinguish from each other.

Abstract

After a renaissance phase in which researchers revisited the message-passing paradigm through the lens of deep learning, the graph machine learning community shifted its attention towards a deeper and practical understanding of message-passing's benefits and limitations. In this paper, we notice how the fast pace of progress around the topics of oversmoothing and oversquashing, the homophily-heterophily dichotomy, and long-range tasks, came with the consolidation of commonly accepted beliefs and assumptions -- under the form of universal statements -- that are not always true nor easy to distinguish from each other. We argue that this has led to ambiguities around the investigated problems, preventing researchers from focusing on and addressing precise research questions while causing a good amount of misunderstandings. Our contribution is to make such common beliefs explicit and encourage critical thinking around these topics, refuting universal statements via simple yet formally sufficient counterexamples. The end goal is to clarify conceptual differences, helping researchers address more clearly defined and targeted problems.

Paper Structure

This paper contains 34 sections, 26 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: (a-b): We depict the evolution, with increasing number of layers, of the $\mathrm{DE}$, using $W$ and $2W$ feature transformations for different architectures. (c-d): Evolution of the $\mathrm{RQ}$ for $W$ and $2W$ as before. Experiments run on the Cora dataset for 50 random seeds. A larger version for better visualization is available in Fig. \ref{appx:fig:de-not-always-happen}.
  • Figure 2: Left: a fully heterophilic graph inspired by \cite{ma_homophily_2022} where a 1-layer, sum-based DGN can perfectly classify the nodes due to a difference in the node degree. Right: a highly homophilic graph where the task is to predict if a node is at a distance greater than five from a specific node. Here, the performance of a DGN will be poor unless information from nodes of another community -- from the perspective of a class-0 node -- is captured.
  • Figure 3: We intuitively visualize what happens when we repeatedly aggregate the neighborhood of the star node using the message-passing paradigm, where we define the computational bottleneck as the size of the computational tree (24 computational nodes for $k=3$) vs. the receptive field (which for $k=3$ includes 9 three-hop neighbors).
  • Figure 4: Left: in a grid graph, the computational bottleneck grows very quickly, but there is no topological bottleneck. Middle: A visualization of the computational graph rooted at node $v_3 \in \mathcal{V}_g$ for two message passing layers, highlighting how pruning messages reduces the computational bottleneck. Right: In this graph, there is a topological bottleneck and a mild computational bottleneck. As with the grid graph (Appendix \ref{sec:sensitivity-grid}), the sensitivity decreases with the number of message-passing layers.
  • Figure 5: Hubs can exhibit computational bottlenecks.
  • ...and 3 more figures

Theorems & Definitions (1)

  • Definition 4.1: Computational Bottleneck
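
The captions of Figures 3 and 4 contrast the size of the computational tree with the size of the receptive field. The sketch below illustrates that contrast on a toy graph. It is not the paper's Definition 4.1: the function names, the 4-node path graph, and the counting convention (root excluded, no self-loops) are assumptions made only for illustration.

```python
from collections import deque


def computational_tree_size(adj, root, k):
    """Count nodes in the depth-k unrolled message-passing tree of `root`.

    Assumed convention: the root itself is not counted and there are no
    self-loops, so a node reappears once per walk that reaches it.
    """
    frontier = {root: 1}  # node -> number of copies at the current depth
    total = 0
    for _ in range(k):
        nxt = {}
        for node, copies in frontier.items():
            for nb in adj[node]:
                nxt[nb] = nxt.get(nb, 0) + copies
        total += sum(nxt.values())
        frontier = nxt
    return total


def receptive_field_size(adj, root, k):
    """Count distinct nodes within k hops of `root` (root excluded)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return len(dist) - 1


# Hypothetical toy graph: an undirected path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

tree = computational_tree_size(path, root=0, k=3)
field = receptive_field_size(path, root=0, k=3)
print(tree, field)
```

Even on this small path, the unrolled tree holds more computational nodes than there are distinct nodes in the receptive field, because messages bounce back through already visited nodes. That gap is what the figure captions describe as growing quickly on grids and around hubs.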