
GeoMix: Towards Geometry-Aware Data Augmentation

Wentao Zhao, Qitian Wu, Chenxiao Yang, Junchi Yan

TL;DR

We address label scarcity in graph node classification by transplanting Mixup into the graph domain with geometry-aware, in-place graph edits. GeoMix explicitly connects synthetic nodes to nearby neighbors and strengthens locality through residual schemes, with an adaptive all-pair variant that learns mixing weights. Theoretical analysis shows that locality is preserved under realistic conditions, and extensive experiments across 12 datasets (homophilic and heterophilic graphs, as well as OOD and non-graph image/text tasks) demonstrate state-of-the-art performance and improved generalization. The approach is lightweight, backbone-agnostic, and extends to diverse domains, offering a practical data augmentation paradigm for GNNs with limited labeled data.
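The core operation described above can be illustrated as residual neighborhood mixing. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each node's synthetic version is `(1 - alpha)` times its own features and soft labels plus `alpha` times a weighted average over other nodes. The geometric variant uses row-normalized adjacency weights; as a stand-in for the learned all-pair weights, the adaptive variant here uses a parameter-free softmax over dot-product similarities with temperature `tau`. All names (`neighbor_weights`, `all_pair_weights`, `residual_mix`, `geo_mix`, `alpha`, `tau`) are hypothetical, and for simplicity every node is assumed to carry a soft label.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (a is n x n, b is n x d)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def residual_mix(w: Matrix, m: Matrix, alpha: float) -> Matrix:
    """Assumed residual scheme: (1 - alpha) * M + alpha * (W @ M), W row-stochastic."""
    wm = matmul(w, m)
    return [[(1 - alpha) * m[i][j] + alpha * wm[i][j] for j in range(len(m[0]))]
            for i in range(len(m))]


def neighbor_weights(adj: Matrix) -> Matrix:
    """Geometric weights: row-normalized adjacency D^{-1} A (uniform over neighbors)."""
    n = len(adj)
    w = []
    for i, row in enumerate(adj):
        deg = sum(row)
        if deg > 0:
            w.append([v / deg for v in row])
        else:
            # Isolated node: all weight on itself, so mixing leaves it unchanged.
            w.append([1.0 if j == i else 0.0 for j in range(n)])
    return w


def all_pair_weights(x: Matrix, tau: float = 1.0) -> Matrix:
    """Hypothetical stand-in for learned all-pair weights: softmax of dot-product similarity."""
    w = []
    for xi in x:
        scores = [sum(p * q for p, q in zip(xi, xj)) / tau for xj in x]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        w.append([e / z for e in exps])
    return w


def geo_mix(x: Matrix, y: Matrix, w: Matrix, alpha: float):
    """In-place Mixup: node i's features and soft labels are replaced by mixed ones.
    The graph itself is not modified, so each synthetic node keeps node i's edges."""
    return residual_mix(w, x, alpha), residual_mix(w, y, alpha)


# Toy path graph 0 - 1 - 2 with 2-d features and one-hot labels over 2 classes.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
x_geo, y_geo = geo_mix(x, y, neighbor_weights(adj), alpha=0.5)
# y_geo -> [[1.0, 0.0], [0.75, 0.25], [0.5, 0.5]]
x_ap, y_ap = geo_mix(x, y, all_pair_weights(x), alpha=0.5)
```

Because each synthetic node replaces the original node in place, it inherits that node's edges, so no new connections have to be invented; setting `alpha = 0` recovers the original features and labels, and larger `alpha` pulls each node further toward its neighborhood.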

Abstract

Mixup has shown considerable success in mitigating the challenges posed by limited labeled data in image classification. By synthesizing samples through the interpolation of features and labels, Mixup effectively addresses the issue of data scarcity. However, it has rarely been explored in graph learning tasks due to the irregularity and connectivity of graph data. Specifically, in node classification tasks, it is unclear how the synthetic nodes produced by Mixup should be connected to the rest of the graph. In this paper, we propose Geometric Mixup (GeoMix), a simple and interpretable Mixup approach leveraging in-place graph editing. It effectively utilizes geometry information to interpolate features and labels with those from the nearby neighborhood, generating synthetic nodes and establishing connections for them. We conduct theoretical analysis to elucidate the rationale behind employing geometry information for node Mixup, emphasizing the significance of locality enhancement, a critical aspect of our method's design. Extensive experiments demonstrate that our lightweight Geometric Mixup achieves state-of-the-art results on a wide variety of standard datasets with limited labeled data. Furthermore, it significantly improves the generalization capability of underlying GNNs across various challenging out-of-distribution generalization tasks. Our code is available at https://github.com/WtaoZhao/geomix.
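The abstract's starting point is standard image-domain Mixup, which builds a synthetic sample by interpolating the features and labels of two existing samples with one coefficient. A minimal sketch of that baseline follows, assuming the common choice of drawing the coefficient from Beta(alpha, alpha) (this choice and the names `sample_lambda` and `mixup` are illustrative assumptions, not taken from this page). Note that it has no notion of edges, which is exactly the gap GeoMix targets by mixing with graph neighbors.

```python
import random
from typing import List, Tuple

Vector = List[float]


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Draw the mixing coefficient lambda ~ Beta(alpha, alpha) (assumed prior)."""
    return rng.betavariate(alpha, alpha)


def mixup(x_i: Vector, y_i: Vector, x_j: Vector, y_j: Vector,
          lam: float) -> Tuple[Vector, Vector]:
    """Interpolate two samples' features and one-hot/soft labels with the same lambda."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


# Usage: mix a class-0 sample with a class-1 sample.
lam = sample_lambda(0.4, random.Random(0))
x_new, y_new = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam)
```

Using the same coefficient for features and labels keeps the synthetic label consistent with how much of each source sample went into the synthetic features.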

Paper Structure

This paper contains 28 sections, 4 theorems, 33 equations, 6 figures, and 5 tables.

Key Result

Theorem 1

Consider a graph $\mathcal{G}=\{\mathcal{V}, \mathcal{E}, \{\mathcal{D}_c, c\in C \}, p, \epsilon \}$ following Assumptions (1)-(5). For any node $i \in \mathcal{V}$, the expectation of its feature after performing one Mixup operation is [equation not shown], and for any $t>0$, the probability that the distance between the observation $\mathbf{h}_i$ and its expectation is larger than $t$ is bounded by [equation not shown], where $F$ is the …
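The exact expectation and bound in Theorem 1 are not shown on this page, so they are not reproduced here. As a generic illustration of the kind of concentration argument it relies on (Lemma 1 is Hoeffding's inequality), the sketch below assumes a hypothetical setting that is not the theorem's: a node's post-Mixup scalar feature is the average of `n` i.i.d. neighbor features drawn uniformly from [0, 1]. It compares how often that average strays from its expectation by at least `t` with the two-sided Hoeffding bound. The names `hoeffding_mean_bound` and `empirical_deviation_rate` are illustrative.

```python
import math
import random


def hoeffding_mean_bound(n: int, t: float, a: float = 0.0, b: float = 1.0) -> float:
    """Two-sided Hoeffding bound on P(|mean - E[mean]| >= t) for n i.i.d. samples in [a, b]."""
    return 2.0 * math.exp(-2.0 * n * t * t / (b - a) ** 2)


def empirical_deviation_rate(n: int, t: float, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the average of n Uniform[0, 1] 'neighbor features'
    deviates from its expectation 0.5 by at least t (hypothetical setting)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) >= t:
            hits += 1
    return hits / trials


# Usage: averaging over 50 neighbors, deviation threshold 0.15.
bound = hoeffding_mean_bound(50, 0.15)          # 2 * exp(-2.25), roughly 0.21
rate = empirical_deviation_rate(50, 0.15, 2000)  # typically far below the bound
```

The bound shrinks exponentially in `n`, which is the intuition for why averaging over more neighbors concentrates a mixed feature around its expectation; the bound is loose here because it ignores the variance of the uniform distribution.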

Figures (6)

  • Figure 1: Illustration of the training procedure with Geometric Mixup.
  • Figure 2: Mean testing accuracy and standard deviation on the generalization task in the Pileup Mitigation dataset with different PU conditions and physical processes. Expressions like PU10 $\rightarrow$ PU30 represent PU condition shifts; $gg \rightarrow qq$ and $qq \rightarrow gg$ indicate physical-process shifts.
  • Figure 3: The learned representations of the nodes in the Cora dataset by GCN and Geometric Mixup. Colors denote the ground-truth class labels.
  • Figure 4: The learned representations of the nodes in the CiteSeer dataset by GCN and Geometric Mixup.
  • Figure 5: Results with other underlying GNN architectures.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1: Hoeffding's Inequality
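For reference, the standard two-sided form of Hoeffding's inequality is stated below. The paper's Lemma 1 may be phrased differently (for example, one-sided, or for sums rather than means), so treat this as the textbook statement rather than a transcription of the lemma.

```latex
\textbf{Hoeffding's inequality.} Let $X_1,\dots,X_n$ be independent random variables
with $a_i \le X_i \le b_i$ almost surely, and let $S_n = \sum_{i=1}^{n} X_i$.
Then for every $t > 0$,
\[
  \Pr\big(\lvert S_n - \mathbb{E}[S_n] \rvert \ge t\big)
  \le 2\exp\!\left(-\frac{2t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right).
\]
For the sample mean $\bar{X} = S_n / n$ with all $X_i \in [a, b]$,
substituting $nt$ for $t$ gives
\[
  \Pr\big(\lvert \bar{X} - \mathbb{E}[\bar{X}] \rvert \ge t\big)
  \le 2\exp\!\left(-\frac{2nt^2}{(b-a)^2}\right).
\]
```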