GeoMix: Towards Geometry-Aware Data Augmentation
Wentao Zhao, Qitian Wu, Chenxiao Yang, Junchi Yan
TL;DR
We address label scarcity in graph learning tasks by transplanting Mixup into the graph domain with geometry-aware, in-place graph edits. GeoMix explicitly connects synthetic nodes to nearby neighbors and strengthens locality through residual schemes, with an adaptive all-pair variant that learns mixing weights. Theoretical analysis shows locality preservation under realistic conditions, and extensive experiments across 12 datasets (including homophilic and heterophilic graphs, as well as out-of-distribution (OOD) and non-graph image/text tasks) demonstrate state-of-the-art performance and improved generalization. The approach is lightweight, backbone-agnostic, and extends to diverse domains, offering a practical data augmentation paradigm for GNNs with limited labeled data.
Abstract
Mixup has shown considerable success in mitigating the challenges posed by limited labeled data in image classification. By synthesizing samples through the interpolation of features and labels, Mixup effectively addresses the issue of data scarcity. However, it has rarely been explored in graph learning tasks due to the irregularity and connectivity of graph data. Specifically, in node classification tasks, Mixup faces the challenge of creating connections for synthetic data. In this paper, we propose Geometric Mixup (GeoMix), a simple and interpretable Mixup approach leveraging in-place graph editing. It effectively utilizes geometry information to interpolate features and labels with those from the nearby neighborhood, generating synthetic nodes and establishing connections for them. We conduct theoretical analysis to elucidate the rationale behind employing geometry information for node Mixup, emphasizing the significance of locality enhancement, a critical aspect of our method's design. Extensive experiments demonstrate that our lightweight Geometric Mixup achieves state-of-the-art results on a wide variety of standard datasets with limited labeled data. Furthermore, it significantly improves the generalization capability of underlying GNNs across various challenging out-of-distribution generalization tasks. Our code is available at https://github.com/WtaoZhao/geomix.
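To make the interpolation idea concrete, the following is a minimal illustrative sketch, not the released implementation. It shows classic Mixup on a pair of samples, followed by a simplified neighborhood-based variant in which each node is mixed with the mean of its graph neighbors while the edges are left in place. The function names `mixup` and `neighbor_mix`, the adjacency-list input, the fixed mixing weight `lam`, and the assumption that every node already carries a soft label vector (for example one-hot labels or model predictions) are all hypothetical choices made for illustration.

```python
def mixup(x_i, y_i, x_j, y_j, lam):
    """Classic Mixup: convex combination of two feature vectors and
    their (soft) label vectors with mixing weight lam in [0, 1]."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def neighbor_mix(features, labels, adj, lam):
    """Hypothetical geometry-aware variant (an assumption, not the
    paper's exact update): mix each node with the mean of its
    neighbors. adj[v] lists the neighbor indices of node v; the graph
    structure itself is left unchanged, so the mixed nodes reuse the
    existing connections."""
    mixed_x, mixed_y = [], []
    for v, nbrs in enumerate(adj):
        if not nbrs:
            # Isolated node: nothing to mix with, keep a copy as-is.
            mixed_x.append(list(features[v]))
            mixed_y.append(list(labels[v]))
            continue
        k = len(nbrs)
        # Mean feature and mean soft label over the neighborhood.
        mean_x = [sum(features[u][d] for u in nbrs) / k
                  for d in range(len(features[v]))]
        mean_y = [sum(labels[u][c] for u in nbrs) / k
                  for c in range(len(labels[v]))]
        x, y = mixup(features[v], labels[v], mean_x, mean_y, lam)
        mixed_x.append(x)
        mixed_y.append(y)
    return mixed_x, mixed_y


# Toy path graph 0 - 1 - 2 with 1-D features and 2-class soft labels.
features = [[0.0], [2.0], [4.0]]
labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
adj = [[1], [0, 2], [1]]
new_features, new_labels = neighbor_mix(features, labels, adj, lam=0.5)
```

Because both the features and the labels are convex combinations, each mixed label vector still sums to one whenever the input label vectors do, so the outputs can be used directly as soft training targets.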
