On the Equivalence of Graph Convolution and Mixup

Xiaotian Han, Hanqing Zeng, Yu Chen, Shaoliang Nie, Jingzhou Liu, Kanika Narang, Zahra Shakeri, Karthik Abinav Sankararaman, Song Jiang, Madian Khabsa, Qifan Wang, Xia Hu

TL;DR

The paper addresses whether graph convolution can be understood as a form of Mixup and shows that GCN/SGC are Mixup operators under two mild modifications: Homophily Relabel and Test-Time Mixup. It provides a mathematical bridge by expressing 1- and 2-layer GCNs as Mixup (input and manifold) and demonstrates that SGC is a Mixup variant as well, contingent on the two modifications. The authors propose two practical MLP-based models, HMLP and TMLP, that replicate GNN performance: HMLP relabels targets during training and trains an MLP on features alone, while TMLP trains on features and applies neighbor aggregation only at test time; a unified version combining both achieves comparable results to GNNs. This work offers a new interpretive framework for GNNs, suggests efficient alternatives for large graphs, and points to broader applications of Mixup in graph learning.

Abstract

This paper investigates the relationship between graph convolution and Mixup techniques. Graph convolution in a graph neural network involves aggregating features from neighboring samples to learn representative features for a specific node or sample. Mixup, on the other hand, is a data augmentation technique that generates new examples by averaging features and one-hot labels from multiple samples. Both techniques use information from multiple samples to derive a feature representation. This study explores whether a connection exists between the two approaches. Our investigation reveals that, under two mild conditions, graph convolution can be viewed as a specialized form of Mixup that is applied during both the training and testing phases. The two conditions are: 1) Homophily Relabel, which assigns the target node's label to all its neighbors, and 2) Test-Time Mixup, which applies Mixup to the features at test time. We establish this equivalence mathematically by demonstrating that graph convolutional networks (GCN) and simplified graph convolution (SGC) can be expressed as a form of Mixup. We also verify the equivalence empirically by training an MLP under the two conditions and achieving comparable performance.
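The core of the argument can be checked numerically on a toy graph. The following sketch assumes row-normalized (mean) aggregation with self-loops rather than GCN's symmetric normalization, and the function names `mixup` and `mean_graph_conv` are illustrative only. It shows that once every neighbor carries the target node's label, Mixup with uniform weights over the neighborhood yields the same feature as the graph convolution and leaves the label unchanged.

```python
import numpy as np

def mixup(features, onehot_labels, weights):
    """Generic Mixup: convex combination of features and one-hot labels."""
    weights = np.asarray(weights, dtype=float)
    return weights @ features, weights @ onehot_labels

def mean_graph_conv(A, X):
    """Mean aggregation over each node and its neighbors (assumed normalization)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    return (A_hat / A_hat.sum(axis=1, keepdims=True)) @ X

# Target node 0 is connected to nodes 1 and 2.
A = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.eye(2)[[0, 1, 1]]                    # original one-hot labels

# Homophily Relabel: every node in the neighborhood gets node 0's label.
Y_relabel = np.tile(Y[0], (3, 1))

# Mixup with uniform weights over node 0 and its two neighbors.
x_mix, y_mix = mixup(X, Y_relabel, [1 / 3, 1 / 3, 1 / 3])
x_conv = mean_graph_conv(A, X)[0]

# Same mixed feature, and the mixed label is still node 0's label.
assert np.allclose(x_mix, x_conv)
assert np.allclose(y_mix, Y[0])
```

Without the relabeling step the mixed label would be a blend of the neighbors' labels, which is where standard Mixup and graph convolution differ.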

Paper Structure (36 sections, 10 equations, 12 figures, 7 tables)

Figures (12)

  • Figure 1: Graph convolution is Mixup. (a) illustrates the basic idea of Mixup: averaging the features and one-hot labels of multiple samples. (b) shows the graph convolution operation, where the feature of the target node is the weighted average of the features of all its neighbors. (b) $\rightarrow$ (c) shows that graph convolution is Mixup if we assign the label of the target node to all of its neighbors. (d) shows that Mixup is empirically equivalent to GCN.
  • Figure 2: Example graphs. $\mathbf{x}_i$ is the target node. The figure shows the loss of $\mathbf{x}_i$ when it is connected to two nodes with different labels.
  • Figure 3: The training and test curves of GCN and HMLP.
  • Figure 4: Performance comparison of GCN, MLP, and HMLP (ours). The x-axis shows the ratio of training data and the y-axis shows classification accuracy. The results show that the proposed method (HMLP) achieves performance comparable to GCN. Note that the architecture of our method is an MLP at both training and test time. More experimental results on other datasets and GNNs are presented in the appendix.
  • Figure 5: Visualization of node representations learned by MLP and TMLP. Node representations from the same class, after Test-Time Mixup, become clustered together.
  • ...and 7 more figures