Graph data augmentation with Gromow-Wasserstein Barycenters

Andrea Ponti

TL;DR

The paper addresses the challenge of augmenting non-Euclidean graph data for classification by modeling each class of graphs as samples from a graphon and estimating that graphon via a Gromov-Wasserstein (GW) barycenter. It formalizes a GW-based barycenter framework to obtain class-specific graphons and generate synthetic graphs for augmentation, demonstrating improved classification performance on benchmarks even with small augmentation fractions. The approach provides a principled, non-Euclidean mechanism to validate graphon estimators and offers a scalable alternative to diffusion-based generative models. Overall, GW-barycenter graphon estimation enhances data efficiency and generalization in graph classification settings.
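
For orientation, the objects named in this summary can be written out as follows. This is a sketch using the standard textbook definitions of a graphon, the discrete Gromov-Wasserstein discrepancy, and its barycenter; the symbols (W, C_s, p_s, \lambda_s, L, N) are assumed notation and may differ from the paper's seven equations.

```latex
% Graphon as a generative model: W is a symmetric measurable function
% W : [0,1]^2 -> [0,1]; a graph on n nodes is sampled, for i < j, as
\begin{align*}
u_i &\sim \mathrm{Uniform}(0,1), &
A_{ij} &\sim \mathrm{Bernoulli}\bigl(W(u_i, u_j)\bigr).
\end{align*}

% Discrete Gromov-Wasserstein discrepancy between two graphs given by
% structure matrices C, C' (e.g. adjacency matrices) and node weights p, q,
% with a pointwise loss L (e.g. the squared difference):
\begin{align*}
\mathrm{GW}(C, C', p, q)
  &= \min_{T \in \Pi(p, q)} \sum_{i,j,k,l} L\bigl(C_{ik}, C'_{jl}\bigr)\, T_{ij}\, T_{kl}, \\
\Pi(p, q)
  &= \bigl\{\, T \ge 0 \;:\; T\mathbf{1} = p,\; T^{\top}\mathbf{1} = q \,\bigr\}.
\end{align*}

% GW barycenter of the S graphs of one class, on a fixed support of N points
% with weights p; the resulting N x N matrix acts as a step-function
% estimate of the class graphon:
\begin{align*}
\bar{C} \in \operatorname*{arg\,min}_{C \in \mathbb{R}^{N \times N}}
  \sum_{s=1}^{S} \lambda_s\, \mathrm{GW}(C, C_s, p, p_s),
\qquad \lambda_s \ge 0,\quad \sum_{s=1}^{S} \lambda_s = 1.
\end{align*}
```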

Abstract

Graphs are ubiquitous in various fields, and deep learning methods have been successfully applied to graph classification tasks. However, building large and diverse graph datasets for training can be expensive. While augmentation techniques exist for structured data such as images or numerical data, the augmentation of graph data remains challenging, primarily because of the complex, non-Euclidean nature of graph data. This paper proposes a novel augmentation strategy for graphs that operates in a non-Euclidean space. The approach leverages graphon estimation, which models the generative mechanism of network sequences. Computational results demonstrate the effectiveness of the proposed augmentation framework in improving the performance of graph classification models. Additionally, using a non-Euclidean distance, specifically the Gromov-Wasserstein distance, results in better approximations of the graphon. This framework also provides a means to validate different graphon estimation approaches, particularly in real-world scenarios where the true graphon is unknown.
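
To make the generative role of a graphon concrete, the following minimal sketch samples a synthetic graph from a step-function graphon, which is the usual way an estimated graphon is turned into new training graphs. It is an illustration under stated assumptions, not the paper's implementation: the function name `sample_graph_from_graphon`, the 2-block matrix `W_hat`, and the node count are all hypothetical.

```python
import numpy as np


def sample_graph_from_graphon(W, n_nodes, rng=None):
    """Sample a simple undirected graph from a step-function graphon.

    W is a symmetric (K, K) array with entries in [0, 1]; entry (a, b) is the
    edge probability between nodes whose latent positions fall in bins a and b.
    Returns an (n_nodes, n_nodes) 0/1 adjacency matrix without self-loops.
    """
    rng = np.random.default_rng(rng)
    K = W.shape[0]
    # Latent positions u_i ~ Uniform(0, 1), mapped to K equal-width bins.
    u = rng.uniform(size=n_nodes)
    bins = np.minimum((u * K).astype(int), K - 1)
    # Pairwise edge probabilities P[i, j] = W(u_i, u_j).
    P = W[np.ix_(bins, bins)]
    # Draw each unordered pair once (strict upper triangle), then symmetrise.
    upper = np.triu(rng.uniform(size=(n_nodes, n_nodes)) < P, k=1)
    return (upper | upper.T).astype(np.int8)


# Hypothetical estimated graphon for one class (assumed values, 2 blocks).
W_hat = np.array([[0.8, 0.1],
                  [0.1, 0.6]])
A = sample_graph_from_graphon(W_hat, n_nodes=30, rng=0)
```

In an augmentation setting, one would presumably call such a sampler once per synthetic graph, using the graphon estimated for each class and labelling the sampled graph with that class.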

Paper Structure

This paper contains 5 sections, 7 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Visual representation of the graphons estimated using different methods on the three datasets. Lighter colors mean a higher probability of connection.