Mitigating Degree Bias in Graph Representation Learning with Learnable Structural Augmentation and Structural Self-Attention

Van Thuy Hoang, Hyeon-Ju Jeon, O-Joun Lee

TL;DR

DegFairGT addresses degree bias in graph representation learning by introducing learnable structural augmentation that preferentially connects non-adjacent nodes with high structural similarity within communities. It couples this augmentation with a structural self-attention mechanism that explicitly encodes high-order proximity and degree-based similarity, and it uses self-supervised tasks to preserve global graph structure and node features during pre-training. Across six datasets, DegFairGT demonstrates improved degree fairness ($\Delta_{SP}$, $\Delta_{EO}$), competitive node classification accuracy, and better graph-structure preservation, outperforming both augmentation-based and graph-transformer baselines. The work highlights the importance of combining proximity-aware context sampling with a similarity-aware attention mechanism to mitigate degree bias while maintaining global graph topology, though it acknowledges the quadratic complexity of self-attention and points to linear-transformer approaches as future work.

Abstract

Graph Neural Networks (GNNs) update node representations through message passing, which is primarily based on the homophily principle, i.e., the assumption that adjacent nodes share similar features. However, in real-world graphs with long-tailed degree distributions, high-degree nodes dominate message passing, causing a degree bias in which low-degree nodes remain under-represented because they receive too few messages. The main challenge in addressing degree bias is how to discover non-adjacent nodes that can provide additional messages to low-degree nodes while reducing excessive messages for high-degree nodes. Exploiting non-adjacent nodes in this way is difficult, however, as it can introduce noisy information and disrupt the original graph structure. To address this, we propose a novel Degree Fairness Graph Transformer, named DegFairGT, which mitigates degree bias by discovering structural similarities between non-adjacent nodes through learnable structural augmentation and structural self-attention. Our key idea is to exploit non-adjacent nodes with similar roles in the same community to generate informative edges under our augmentation, which provides informative messages between nodes with similar roles while ensuring that the homophily principle is maintained within the community. To enable DegFairGT to learn such structural similarities, we then propose a structural self-attention mechanism that captures the similarities between node pairs. To preserve the global graph structure and prevent the augmentation from distorting it, we propose a Self-Supervised Learning task that preserves the $p$-step transition probability and regularizes the graph augmentation. Extensive experiments on six datasets showed that DegFairGT outperformed state-of-the-art baselines in degree fairness analysis, node classification, and node clustering tasks.
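As a point of reference for the $p$-step transition probability mentioned above, the following minimal sketch computes the standard random-walk quantity $(D^{-1}A)^p$ on a toy graph. It illustrates only the target quantity, not the self-supervised loss of DegFairGT, which is not specified in this overview; the function name `p_step_transition` and the toy path graph are assumptions made for illustration.

```python
import numpy as np

def p_step_transition(adj: np.ndarray, p: int) -> np.ndarray:
    """Return the p-step random-walk transition matrix (D^-1 A)^p.

    Hypothetical helper for illustration; not taken from the paper.
    """
    deg = adj.sum(axis=1, keepdims=True)
    # Row-normalize the adjacency matrix; isolated nodes keep an all-zero row.
    trans = np.divide(adj, deg, out=np.zeros_like(adj, dtype=float), where=deg > 0)
    return np.linalg.matrix_power(trans, p)

# Toy path graph: 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
P2 = p_step_transition(adj, 2)
print(P2)
```

Entry $(i, j)$ of the result is the probability that a random walk starting at $v_i$ reaches $v_j$ in exactly $p$ steps, so each row of a connected graph sums to one.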

Paper Structure

This paper contains 29 sections, 13 equations, 7 figures, 11 tables.

Figures (7)

  • Figure 1: Low-degree nodes are misclassified more often than other nodes by the GT model [dwivedi2020generalization] on the Photo and Computers datasets. The misclassification rate is higher for low-degree nodes than for high-degree nodes.
  • Figure 2: Structural graph augmentation. Given an input graph (left), our structural graph augmentation adds intra-community edges between low-degree nodes and removes edges between nodes with different degrees or communities. The augmentation enables each node to obtain more valuable messages from neighbors within its community and within $k$ hops through message passing (right).
  • Figure 3: The overall architecture of DegFairGT. DegFairGT comprises two main blocks: structural graph augmentation and structural self-attention. The augmentation module takes the original adjacency matrix $A$ and the degree-weighted score $D$ as inputs and then samples edges with probabilities defined in $\tilde{A}$ to generate a new graph $G'$. The new graph $G'$ is then fed into the graph transformer network, which learns the representations $Z$. The self-attention module at the $l$-th layer receives the feature of the target node $h^{l}_i$, the features of neighbouring nodes $\{h^{l}_j\}$, the proximity $s^{l}_{ij}$, and the structural similarity $f^{l}_d(D_{ij})$ between nodes $v_i$ and $v_j$ as inputs. Finally, the learned representation $Z$ can be used for various downstream tasks, e.g., node classification.
  • Figure 4: For a target node $v_i$, our augmentation samples context nodes in the same community and ranks them by their node degrees within $k$ hops. The two nodes $v_k$ and $v_j$ have a high correlation with the target node $v_i$ because they have similarly low degrees.
  • Figure 5: An analysis of the degree-weighted matrix $D$. When two nodes $v_i$ and $v_j$ both have low degrees, they are sampled more frequently to generate valuable edges.
  • ...and 2 more figures
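
The sampling behaviour described in the captions of Figures 2, 4, and 5 can be illustrated with a small, non-learnable sketch: candidate edges are restricted to non-adjacent node pairs in the same community within $k$ hops, and pairs in which both endpoints have low degree receive a higher sampling probability. The scoring rule `1 / sqrt(d_i * d_j + 1)`, the function name `candidate_edge_probs`, and the toy graph are all assumptions made for illustration; the paper's actual degree-weighted matrix $D$ and its learnable augmentation are not defined in this overview and are not reproduced here.

```python
import numpy as np

def candidate_edge_probs(adj: np.ndarray, community: np.ndarray, k: int = 2) -> np.ndarray:
    """Sampling probabilities for new intra-community edges (illustrative only)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # Nodes reachable within k hops: nonzero entries of (A + I)^k.
    reach = np.linalg.matrix_power(adj + np.eye(n), k) > 0
    same_comm = community[:, None] == community[None, :]
    non_adjacent = (adj == 0) & ~np.eye(n, dtype=bool)
    mask = reach & same_comm & non_adjacent
    # Hypothetical score (an assumption): larger when both endpoints have low degree.
    score = 1.0 / np.sqrt(np.outer(deg, deg) + 1.0)
    score = np.where(mask, score, 0.0)
    total = score.sum()
    return score / total if total > 0 else score

# Toy graph: hub 0 with leaves 1, 2, 3; node 4 hangs off node 3 in another community.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (0, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
community = np.array([0, 0, 0, 0, 1])

probs = candidate_edge_probs(adj, community, k=2)

# Draw two distinct candidate edges from the upper triangle of the matrix.
rng = np.random.default_rng(0)
iu = np.triu_indices_from(probs, k=1)
p = probs[iu] / probs[iu].sum()
pick = rng.choice(len(p), size=2, replace=False, p=p)
new_edges = [(int(iu[0][t]), int(iu[1][t])) for t in pick]
print(new_edges)
```

In this toy graph the only candidates are the leaf pairs (1, 2), (1, 3), and (2, 3). The pair (1, 2) gets the largest probability because both nodes have degree one, whereas the hub and the cross-community node 4 are never selected.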