Cross-attentive Cohesive Subgraph Embedding to Mitigate Oversquashing in GNNs

Tanvir Hossain, Muhammad Ifte Khairul Islam, Lilia Chebbah, Charles Fanning, Esra Akbas

Abstract

Graph neural networks (GNNs) have achieved strong performance across various real-world domains. Nevertheless, they suffer from oversquashing, where long-range information is distorted as it is compressed through limited message-passing pathways. This bottleneck limits their ability to capture essential global context and decreases their performance, particularly in dense and heterophilic regions of graphs. To address this issue, we propose a novel graph learning framework that enriches node embeddings via cross-attentive cohesive subgraph representations to mitigate the impact of excessive long-range dependencies. This framework enhances node representations by emphasizing cohesive structure in long-range information while removing noisy or irrelevant connections. It preserves essential global context without overloading the narrow bottlenecked channels, which further mitigates oversquashing. Extensive experiments on multiple benchmark datasets demonstrate that our model achieves consistent improvements in classification accuracy over standard baseline methods.

Paper Structure

This paper contains 30 sections, 2 theorems, 16 equations, 6 figures, 4 tables, 1 algorithm.

Key Result

theorem 1

($CaEF$: Closure-aware Edge Filtration): Let $G_{k}$ denote the $k$-core of $G = (V, E)$. For an edge $e = (u, v) \in G_{k}$, where $N(u)$ and $N(v)$ are the neighbor sets of nodes $u$ and $v$, its triadic support is defined as $S(u, v) = |N(u) \cap N(v)|$. For $k \ge \delta$, where $\delta$ denotes the edge filtering threshold, if $S(u, v) = 0$ then $(u, v)$ is removed from $G_{k}$ and reassigned to the previous core, $C(u, v) = k - 1$.
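The filtration rule above can be sketched in a few lines: compute the triadic support of each edge in the $k$-core as the number of common neighbors of its endpoints, and demote zero-support edges to the previous core. This is a minimal illustration assuming an adjacency-dict representation; the function name and graph encoding are ours, not the paper's.

```python
# Sketch of Closure-aware Edge Filtration (CaEF), per Theorem 1.
# `adj` maps each node to the set of its neighbors in the k-core G_k.

def caef_filter(adj, k, delta):
    """For k >= delta, demote edges with triadic support
    S(u, v) = |N(u) & N(v)| equal to zero; return (kept, demoted)."""
    kept, demoted = [], []
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                # common neighbors = triangles closing over (u, v)
                support = len(adj[u] & adj[v])
                if k >= delta and support == 0:
                    demoted.append((u, v))  # reassigned to core k - 1
                else:
                    kept.append((u, v))
    return kept, demoted

# Toy subgraph: a triangle (0, 1, 2) plus a triangle-free edge (1, 3)
adj = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1},
    3: {1},
}
kept, demoted = caef_filter(adj, k=2, delta=2)
# Edge (1, 3) has no common neighbor, so it is demoted;
# the three triangle edges are kept.
```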

Figures (6)

  • Figure 1: Pilot Study. Average number $(\#)$ of paths (ANP) per node for $n \in \{4,5\}$ hop distances (HD) in the Cora and Chameleon (Chm) datasets. Blue bars denote the original graph ($G$), its cores ($k$), and their (PS)-pooled subgraphs ($P$-$k$); green bars present their homophilic counterparts ($H$-$G$, $H$-$k$, and $PH$-$k$). From right to left, the dark blue and green bar pairs (the Cora and Chameleon panels) with hatch $(*)$ present the ANP of the original graph and its homophilic subgraph, respectively; in other cases, a deeper bar color denotes a denser subgraph. K = thousand, M = million, B = billion.
  • Figure 2: Architecture of $\texttt{CaCoSE}~(\delta = 3)$.
  • Figure 3: Sensitivity Analysis. Varying pooling ratios (PR) and numbers of heads (NH). $\texttt{CaCoSE}$'s settings for node classification (PR = 50% and NH = 2) and for graph classification (PR = 50% and NH = 1) are highlighted in bold. NC datasets: Cora, CiteSeer, Texas, Chameleon. GC datasets: IMDB-B, COLLAB, MUTAG, PROTEINS.
  • Figure 4: Snippets of Bridge Analysis. Edge $(1976,473)$ in the Chameleon (Chm) and edge $(4799,358)$ in the Squirrel (Sqr) datasets. $k_{1}$ denotes the subgraph $(S_1)$ and presents the Bridge Edges.
  • Figure 5: Performance gain (in %) of $\texttt{CaCoSE}$ vs. other decompositions, illustrated on six datasets (4 NC and 2 GC): Louvain $(Lv)$, Metis $(M)$, Hierarchical $(Hi)$, and Random-Walk $(Rw)$. Except for Louvain, the methods are annotated with the number of partitions; e.g., $(M\text{-}4)$ denotes Metis with $4$ partitions. Green and red shades present positive and negative gains, respectively.
  • ...and 1 more figure

Theorems & Definitions (5)

  • definition 1
  • definition 2
  • theorem 1
  • theorem 2
  • proof