From Moments to Models: Graphon Mixture-Aware Mixup and Contrastive Learning

Ali Azizpour, Reza Ramezanpour, Ashutosh Sabharwal, Santiago Segarra

TL;DR

This work tackles heterogeneity in graph datasets by modeling data as a mixture of graphons and recovering the mixture via motif-moment clustering. It then leverages this graphon structure to create semantically meaningful augmentations (GMAM) and a model-aware contrastive learning framework (MGCL) that restricts negatives to graphs from different graphon models. A novel bound links small graphon cut distance to similar motif densities, underpinning the moment-based clustering. Empirically, the method achieves state-of-the-art unsupervised performance and strong supervised results, showing that accounting for underlying generative graph models yields more faithful representations and more effective data augmentation. These ideas advance graph representation learning by explicitly modeling mixture structure and aligning augmentations and contrasts with underlying generative processes.
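As a rough illustration of motif-moment clustering, the sketch below computes three homomorphism densities (edge, 2-path, triangle) per graph and groups graphs with a plain k-means. The two block-model graphons, the choice of motifs, and the helper names `motif_moments`, `sample_sbm`, and `kmeans` are assumptions made for this example; they are not taken from the paper, which may use other motifs and a different clustering routine.

```python
import numpy as np

def motif_moments(adj):
    """Homomorphism densities of an edge, a 2-path, and a triangle."""
    n = adj.shape[0]
    a2 = adj @ adj
    edge = adj.sum() / n**2
    wedge = a2.sum() / n**3
    triangle = np.trace(a2 @ adj) / n**3
    return np.array([edge, wedge, triangle])

def sample_sbm(n, p_in, p_out, rng):
    """Sample a two-block stochastic block model (a step-function graphon)."""
    labels = rng.integers(0, 2, size=n)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # no self-loops
    return (upper | upper.T).astype(float)            # symmetric adjacency

def kmeans(x, k, iters=50):
    """Lloyd's algorithm with farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return assign

rng = np.random.default_rng(0)
# Two hypothetical graphons; graph sizes vary between 75 and 300 nodes.
graphs = [sample_sbm(rng.integers(75, 301), 0.6, 0.1, rng) for _ in range(10)]
graphs += [sample_sbm(rng.integers(75, 301), 0.2, 0.15, rng) for _ in range(10)]

moments = np.stack([motif_moments(g) for g in graphs])  # one row per graph
clusters = kmeans(moments, k=2)                          # estimated model labels
```

Because the densities are normalized by the number of nodes, graphs of different sizes drawn from the same graphon land close together in moment space, which is what makes a simple clustering step plausible here.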

Abstract

Real-world graph datasets often consist of mixtures of populations, where graphs are generated from multiple distinct underlying distributions. However, modern representation learning approaches, such as graph contrastive learning (GCL) and augmentation methods like Mixup, typically overlook this mixture structure. In this work, we propose a unified framework that explicitly models data as a mixture of underlying probabilistic graph generative models represented by graphons. To characterize these graphons, we leverage graph moments (motif densities) to cluster graphs arising from the same model. This enables us to disentangle the mixture components and identify their distinct generative mechanisms. This model-aware partitioning benefits two key graph learning tasks: 1) It enables a graphon-mixture-aware mixup (GMAM), a data augmentation technique that interpolates in a semantically valid space guided by the estimated graphons, instead of assuming a single graphon per class. 2) For GCL, it enables model-adaptive and principled augmentations. Additionally, by introducing a new model-aware objective, our proposed approach (termed MGCL) improves negative sampling by restricting negatives to graphs from other models. We establish a key theoretical guarantee: a novel, tighter bound showing that graphs sampled from graphons with small cut distance will have similar motif densities with high probability. Extensive experiments on benchmark datasets demonstrate strong empirical performance. In unsupervised learning, MGCL achieves state-of-the-art results, obtaining the top average rank across eight datasets. In supervised learning, GMAM consistently outperforms existing strategies, achieving new state-of-the-art accuracy in 6 out of 7 datasets.
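The idea of restricting negatives to graphs from other models can be sketched with an InfoNCE-style loss in which same-cluster pairs are masked out of the denominator. This is only an illustration under stated assumptions: the exact MGCL objective is not given in this summary, and the function name `model_aware_info_nce`, the temperature value, and the toy embeddings are hypothetical.

```python
import numpy as np

def model_aware_info_nce(z1, z2, clusters, temperature=0.5):
    """InfoNCE-style loss whose negatives come only from other clusters.

    z1, z2   : (N, d) embeddings of two augmented views of the same N graphs
    clusters : (N,) estimated graphon-model label of each graph
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = np.exp(z1 @ z2.T / temperature)   # pairwise cosine similarities
    pos = np.diag(sim)                      # view 1 of graph i vs. view 2 of graph i
    # Keep a pair (i, j) as a negative only if the graphs belong to different models.
    neg_mask = clusters[:, None] != clusters[None, :]
    neg = (sim * neg_mask).sum(axis=1)
    return float(np.mean(-np.log(pos / (pos + neg))))

rng = np.random.default_rng(0)
z1 = rng.normal(size=(6, 4))                 # toy embeddings, view 1
z2 = z1 + 0.1 * rng.normal(size=(6, 4))      # toy embeddings, view 2
clusters = np.array([0, 0, 0, 1, 1, 1])      # hypothetical model assignments
loss = model_aware_info_nce(z1, z2, clusters)
```

Masking same-model pairs means the loss no longer pushes apart graphs that plausibly share a generative mechanism. With every graph in its own cluster, the expression reduces to a standard cross-view InfoNCE loss.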

Paper Structure

This paper contains 55 sections, 41 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Overview of the proposed framework. (a) Graphon mixture estimation via motif moment vectors, (b) Graphon mixture–aware mixup for data augmentation, (c) Model-aware GCL leveraging graphon-informed augmentations and model-aware contrastive loss.
  • Figure 2: t-SNE embedding of graphs. Left: Varying size, $n \sim U[75,300]$. Right: Fixed size, $n=200$. Each color represents a different graphon.
  • Figure 3: Comparison of the total error bounds ($2\delta_s$) for the classical (solid blue) and novel (dashed orange) approaches. The novel bound is consistently tighter, with the gap widening for motifs with more vertices ($k$), confirming its superior $O(\sqrt{k})$ scaling.
  • Figure 4: Effect of clustering on TFR across different datasets.
  • Figure 5: Effect of the number of clusters on MGCL performance.
  • ...and 2 more figures

Theorems & Definitions (2)

  • proof
  • proof