TSC: A Simple Two-Sided Constraint against Over-Smoothing

Furong Peng, Kang Liu, Xuan Lu, Yuhua Qian, Hongren Yan, Chao Ma

TL;DR

This work analyzes over-smoothing in deep graph convolutional networks as a consequence of high-order neighbor overlap and excessive neighbor counts. It introduces the Two-Sided Constraint (TSC), a plug-in module that combines random masking (column-wise) with a contrastive constraint (row-wise) to simultaneously curb the convergence of node representations and preserve node discriminability in GCN and SGC. Empirical results on five real-world datasets show that TSC reduces representation convergence and yields competitive or state-of-the-art accuracy at deeper layers, with theoretical analysis and ablations validating the contributions of both components. The practical impact is a simple, effective way to build deeper graph models without sacrificing performance or interpretability, supported by visualizations and ablation evidence.
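The neighbor-overlap and neighbor-count argument can be made concrete with a small experiment. The sketch below uses a hypothetical 10-node ring graph (not one of the paper's datasets) and hypothetical helpers `k_hop_neighborhood` and `jaccard`. For two nodes on opposite sides of the ring, it tracks how many nodes lie within k hops and how much the two k-hop neighborhoods overlap (Jaccard similarity).

```python
from collections import deque


def k_hop_neighborhood(adj, source, k):
    """Return the set of nodes within k hops of `source` (including itself), via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def jaccard(a, b):
    """Overlap of two node sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


# Hypothetical toy graph: a 10-node ring; nodes 0 and 5 sit on opposite sides.
n = 10
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

for k in range(1, 6):
    na = k_hop_neighborhood(ring, 0, k)
    nb = k_hop_neighborhood(ring, 5, k)
    print(f"k={k}: |N_k(0)|={len(na)}, overlap={jaccard(na, nb):.2f}")
```

On this toy graph the two neighborhoods are disjoint up to 2 hops, start to overlap at 3 hops, and coincide at 5 hops, while each neighborhood grows until it covers the whole graph. A ring grows only linearly; the abstract describes exponential growth on real graphs.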

Abstract

Graph Convolutional Neural Network (GCN), a widely adopted method for analyzing relational data, enhances node discriminability by aggregating information from neighboring nodes. Stacking multiple layers can usually improve the performance of GCN by leveraging information from high-order neighbors. However, increasing the network depth induces the over-smoothing problem, which can be attributed to changes in both the quality and the quantity of neighbors: (a) neighbor quality: at high orders, the neighborhoods of different nodes overlap, so the aggregated information becomes indistinguishable; (b) neighbor quantity: the exponentially growing number of aggregated neighbors submerges each node's initial features through recursive aggregation. Current solutions mainly focus on one of these causes and seldom address both at once. To tackle both causes of over-smoothing in one shot, we introduce a simple Two-Sided Constraint (TSC) for GCNs, comprising two straightforward yet potent techniques: random masking and a contrastive constraint. Random masking acts on the columns of the representation matrix to regulate how much information is aggregated from neighbors, thus preventing node representations from converging. Meanwhile, the contrastive constraint, applied to the rows of the representation matrix, enhances the discriminability of the nodes. Designed as a plug-in module, TSC can be easily coupled with GCN or SGC architectures. Experiments on diverse real-world graph datasets verify that our approach markedly reduces the convergence of node representations and the performance degradation of deeper GCNs.
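The abstract does not spell out TSC's equations, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) that random masking draws a Bernoulli mask over the feature columns at each propagation step, so a selected column is aggregated from neighbors (Â H, with Â the normalized adjacency) while an unselected column keeps its current values; and (2) that the contrastive constraint is an InfoNCE-style loss that pairs each node's row with its own row in a reference matrix Z (for example the input features) and contrasts it with all other rows. The function names and the parameters `agg_prob` and `tau` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Dense matrix product for small list-of-lists matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def masked_propagation(a_hat, h, agg_prob, rng):
    """Hypothetical column-wise random mask (assumed form, not the paper's exact rule):
    each feature column is aggregated from neighbors (A_hat @ H) with probability
    agg_prob; otherwise it keeps its current values for this step."""
    agg = matmul(a_hat, h)
    mask = [1.0 if rng.random() < agg_prob else 0.0 for _ in range(len(h[0]))]
    out = [[m * a_ij + (1.0 - m) * h_ij for m, a_ij, h_ij in zip(mask, agg_row, h_row)]
           for agg_row, h_row in zip(agg, h)]
    return out, mask


def cosine(u, v, eps=1e-12):
    """Cosine similarity; eps guards against division by zero for all-zero rows."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)) + eps)


def row_contrastive_loss(h, z, tau=0.5):
    """InfoNCE-style row-wise loss (assumed form): row i of H is pulled toward
    row i of the reference Z and pushed away from every other row of Z."""
    n = len(h)
    total = 0.0
    for i in range(n):
        logits = [cosine(h[i], z[j]) / tau for j in range(n)]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum - logits[i]
    return total / n


# Toy usage: two connected nodes with self-loops, for which D^-1/2 (A + I) D^-1/2 is 0.5 everywhere.
a_hat = [[0.5, 0.5], [0.5, 0.5]]
x = [[1.0, 0.0], [0.0, 1.0]]
h, mask = masked_propagation(a_hat, x, agg_prob=0.5, rng=random.Random(0))
print("mask:", mask, "H:", h, "loss vs. X:", round(row_contrastive_loss(h, x), 4))
```

With `agg_prob=0` a step leaves H unchanged and with `agg_prob=1` it reduces to plain Â H, so under this assumption the mask interpolates between keeping each node's own signal and full neighbor aggregation. The loss equals log n when all rows are identical, the collapsed state that over-smoothing drives toward, and is lower for mutually distinct rows such as one-hot vectors.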

Paper Structure (29 sections, 26 equations, 9 figures, 13 tables)

Figures (9)

  • Figure 1: Visualization of node representations in different layers. The 1st, 2nd, and 3rd columns depict the visualizations of node representations at the 1st, 8th, and 32nd layers, respectively. Row (a) shows the node distribution for vanilla GCN, (b) for DropMessage, (c) for ContraNorm, and (d) for our proposed TSC applied to GCN. Node colors indicate the categories. All sub-figures are visualized using t-SNE.
  • Figure 2: How neighbors change across different orders. In the 3rd-order neighborhood, nodes A and B have overlapping neighbors and a large number of neighbors.
  • Figure 3: Overview of TSC applied to SGC and GCN. (a) illustrates the structure of TSC on SGC, while (b) depicts its application to GCN. Both add random masking to the columns of the representation matrix to mitigate representation convergence, and a contrastive constraint to its rows to enhance each node's individuality.
  • Figure 4: Random masking.
  • Figure 5: Ablation study. The 1st row shows the ablation results on the ACC (accuracy) metric and the 2nd row shows the results on MAD.
  • ...and 4 more figures
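Figures 1 and 5 track how node representations converge with depth, Figure 5 via MAD. Taking MAD to mean the average pairwise cosine distance between node representations (an assumption here; the paper's exact definition, for example any masking of node pairs, may differ), the sketch below shows parameter-free SGC-style propagation H ← Â H, with Â = D^{-1/2}(A + I)D^{-1/2}, driving MAD toward zero at the same depths shown in Figure 1 (layers 1, 8, and 32). The 6-node ring with alternating features and the helper names are hypothetical.

```python
import math


def normalized_adjacency(adj, n):
    """D^-1/2 (A + I) D^-1/2 with self-loops, the GCN/SGC propagation matrix."""
    nbrs = [set(adj[i]) | {i} for i in range(n)]
    deg = [len(s) for s in nbrs]
    return [[1.0 / math.sqrt(deg[i] * deg[j]) if j in nbrs[i] else 0.0
             for j in range(n)] for i in range(n)]


def propagate(a_hat, h):
    """One parameter-free SGC-style step: H <- A_hat @ H."""
    return [[sum(a_hat[i][t] * h[t][j] for t in range(len(h)))
             for j in range(len(h[0]))] for i in range(len(a_hat))]


def mad(h):
    """Simplified MAD: mean cosine distance (1 - cos) over all ordered pairs i != j."""
    def cos(u, v):
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        return sum(x * y for x, y in zip(u, v)) / (nu * nv)
    n = len(h)
    dists = [1.0 - cos(h[i], h[j]) for i in range(n) for j in range(n) if i != j]
    return sum(dists) / len(dists)


# Hypothetical toy input: a 6-node ring whose nodes alternate between two feature types.
n = 6
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
x = [[1.0, 0.0] if i % 2 == 0 else [0.0, 1.0] for i in range(n)]

a_hat = normalized_adjacency(ring, n)
h = x
mad_by_layer = {0: mad(x)}
for layer in range(1, 33):
    h = propagate(a_hat, h)
    if layer in (1, 8, 32):
        mad_by_layer[layer] = mad(h)
print(mad_by_layer)
```

On this toy graph MAD falls from 0.6 at the input to 0.12 after one step and is essentially zero by layer 32, so the two feature types become indistinguishable. This is the convergence that TSC's random masking and contrastive constraint are meant to counteract.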