Graph Community Augmentation with GMM-based Modeling in Latent Space

Shintaro Fukushima, Kenji Yamanishi

TL;DR

The paper tackles the problem of generating graphs with a new community by embedding graphs into a latent space with a variational graph autoencoder and modeling the latent distribution with a Gaussian Mixture Model. It introduces the Graph Community Augmentation framework, which adds a new GMM component under novelty and reliability constraints guided by the Minimum Description Length principle, and decodes augmented latent representations to new graphs. Empirical results on synthetic stochastic block models and real networks show that GCA achieves high novelty while preserving the core structure of the original graphs, outperforming several baselines in generating meaningful new communities. The approach offers a principled pathway for knowledge extrapolation in graphs and has potential to enhance generalization when real graph data are scarce.

Abstract

This study addresses graph generation with generative models. In particular, we are concerned with the graph community augmentation problem: generating unseen or unfamiliar graphs that contain a new community, out of the probability distribution estimated from a given graph dataset. Graphs with a new community may reveal an unseen but important structure, for example in a social network such as a purchaser network. Graph community augmentation may also help data mining models generalize when it is difficult to collect enough real graph data. There are many ways to generate a new community in an existing graph, so it is desirable to discover a graph with a new community that goes beyond the given graphs while keeping the structure of the original graphs to some extent, so that the generated graphs remain realistic. To this end, we propose an algorithm called graph community augmentation (GCA). The key ideas of GCA are (i) to fit a Gaussian mixture model (GMM) to data points in the latent space into which the nodes of the original graph are embedded, and (ii) to add data points in a new cluster in the latent space, based on the minimum description length (MDL) principle, in order to generate a new community. We empirically demonstrate the effectiveness of GCA for generating graphs with a new community structure on synthetic and real datasets.
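To make idea (ii) concrete, the following is a minimal sketch and not the authors' implementation. It assumes that latent embeddings of the nodes are already available, that edges are decoded with the inner-product decoder commonly used with variational graph autoencoders, and that the mean and spread of the new component are supplied by hand. GCA instead chooses the new component with the MDL principle; that selection step is not reproduced here. All function names and parameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_new_component(mean, std, n_points, rng):
    """Draw n_points from an isotropic Gaussian N(mean, std^2 I).

    Assumption: the new mixture component is isotropic and its
    parameters are given by the caller, not selected via MDL.
    """
    return [[rng.gauss(m, std) for m in mean] for _ in range(n_points)]


def decode_edges(latent, threshold=0.5):
    """Assumed inner-product decoder: edge (i, j) if sigmoid(z_i . z_j) > threshold."""
    edges = []
    for i in range(len(latent)):
        for j in range(i + 1, len(latent)):
            dot = sum(a * b for a, b in zip(latent[i], latent[j]))
            if sigmoid(dot) > threshold:
                edges.append((i, j))
    return edges


def augment(latent, new_mean, new_std, n_new, seed=0):
    """Append points from a new latent cluster and decode the augmented graph."""
    rng = random.Random(seed)
    new_points = sample_new_component(new_mean, new_std, n_new, rng)
    augmented = latent + new_points
    return augmented, decode_edges(augmented)


# Two original communities (toy latent points), then one added cluster.
original = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
augmented, edges = augment(original, new_mean=[-2.0, -2.0], new_std=0.1, n_new=3)
```

In this toy setting the added nodes connect to each other but not to the original nodes, because their inner products with the original points are negative. Where the new component is placed determines how strongly the new community attaches to the existing graph, which is the choice that a principled selection criterion has to make.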

Paper Structure

This paper contains 34 sections, 25 equations, 4 figures, 4 tables, and 2 algorithms.

Figures (4)

  • Figure 1: Illustration of the graph community augmentation problem. "E" denotes an encoder that maps nodes in a graph into data points in a latent space, whereas "D" denotes a decoder that maps data points in a latent space into a new graph.
  • Figure 2: Overall flow of the proposed graph community augmentation (GCA) algorithm.
  • Figure 3: An example of the data points in the latent space and a generated graph for the SBM dataset. The blue and red points indicate points from the original graph and the generated graph, respectively.
  • Figure 4: The data points in the latent space and the generated graph for the CiteSeer dataset. The blue and red points indicate points from the original graph and the generated graph, respectively.