PolyGraph Discrepancy: a classifier-based metric for graph generation

Markus Krimmel, Philip Hartout, Karsten Borgwardt, Dexiong Chen

TL;DR

The paper argues that MMD-based metrics for graph generative models give no absolute measure of quality and are not comparable across descriptors. It proposes PolyGraph Discrepancy (PGD), a classifier-based metric that approximates a variational lower bound on the Jensen-Shannon distance between real and generated graphs and yields scores in the unit interval [0,1]. PGD can be computed for a single descriptor or aggregated over several descriptors by selecting the one that gives the tightest bound, with TabPFN serving as a fast classifier. Experiments show that PGD tracks the magnitude of perturbations and correlates with model quality. An open-source PolyGraph library accompanies the paper for standardized evaluation.

Abstract

Existing methods for evaluating graph generative models primarily rely on Maximum Mean Discrepancy (MMD) metrics based on graph descriptors. While these metrics can rank generative models, they do not provide an absolute measure of performance. Their values are also highly sensitive to extrinsic parameters, namely kernel and descriptor parametrization, making them incomparable across different graph descriptors. We introduce PolyGraph Discrepancy (PGD), a new evaluation framework that addresses these limitations. It approximates the Jensen-Shannon distance of graph distributions by fitting binary classifiers to distinguish between real and generated graphs, featurized by these descriptors. The data log-likelihood of these classifiers approximates a variational lower bound on the JS distance between the two distributions. Resulting metrics are constrained to the unit interval [0,1] and are comparable across different graph descriptors. We further derive a theoretically grounded summary metric that combines these individual metrics to provide a maximally tight lower bound on the distance for the given descriptors. Thorough experiments demonstrate that PGD provides a more robust and insightful evaluation compared to MMD metrics. The PolyGraph framework for benchmarking graph generative models is made publicly available at https://github.com/BorgwardtLab/polygraph-benchmark.
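The classifier-based bound described in the abstract can be illustrated with a minimal sketch. It assumes the standard variational bound on the Jensen-Shannon divergence in bits, JSD >= 1 + (E_real[log2 D(x)] + E_gen[log2(1 - D(x))]) / 2, where D(x) is a classifier's predicted probability that a graph is real, and it takes the square root to obtain a distance. This is not the PolyGraph library's API: the function names, the clipping constant, and the descriptor names other than "orbit" are hypothetical. Fitting the classifier is not shown (the paper uses TabPFN; any probabilistic classifier evaluated on held-out graphs would supply the inputs), and the paper's descriptor-selection procedure may differ from the plain maximum used here.

```python
import math


def js_distance_lower_bound(p_real, p_gen, eps=1e-12):
    """Estimate a lower bound on the JS distance from held-out classifier outputs.

    p_real: predicted P(real | x) for held-out reference graphs
    p_gen:  predicted P(real | x) for held-out generated graphs
    eps:    clipping constant to avoid log(0) (an assumption of this sketch)
    """
    # Mean base-2 log-likelihood of the correct label on each side.
    ll_real = sum(math.log2(min(max(p, eps), 1.0)) for p in p_real) / len(p_real)
    ll_gen = sum(math.log2(min(max(1.0 - p, eps), 1.0)) for p in p_gen) / len(p_gen)
    # Variational lower bound on the JS divergence, measured in bits.
    jsd_bound = 1.0 + 0.5 * (ll_real + ll_gen)
    # A poorly calibrated classifier can push the estimate below zero; clip to [0, 1].
    jsd_bound = min(max(jsd_bound, 0.0), 1.0)
    # The JS distance is the square root of the JS divergence.
    return math.sqrt(jsd_bound)


def tightest_bound(scores_by_descriptor):
    """Return the descriptor with the largest bound, i.e. the tightest one."""
    name = max(scores_by_descriptor, key=scores_by_descriptor.get)
    return name, scores_by_descriptor[name]


# Hypothetical held-out probabilities for three descriptors (illustrative only).
scores = {
    "orbit": js_distance_lower_bound([0.9, 0.8, 0.95], [0.1, 0.2, 0.05]),
    "degree": js_distance_lower_bound([0.6, 0.55, 0.7], [0.4, 0.45, 0.3]),
    "clustering": js_distance_lower_bound([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
}
best_descriptor, summary_score = tightest_bound(scores)
```

Because every per-descriptor score lower-bounds the same distance, taking the maximum cannot overshoot it in expectation, provided the selection and the final estimate use separate data. An uninformative classifier (all probabilities 0.5) gives a score of 0, and a perfect one gives 1.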

Paper Structure

This paper contains 44 sections, 17 equations, 27 figures, 23 tables, 1 algorithm.

Figures (27)

  • Figure 1: Computation of the PGD metric. TabPFN is trained to discriminate between generated and reference graphs based on different vectorial descriptors. The most expressive descriptor (here: orbit) is used to derive the final PGD, yielding a maximally tight lower bound on the JS distance between the generated and reference graph distributions.
  • Figure 2: Examples of MMD estimates that suffer from high bias (left) and high variance (right).
  • Figure 3: Spearman correlation of MMDs and PGD with the magnitude of perturbation.
  • Figure 4: Trajectory of validity, PGD, and MMDs when increasing the number of denoising steps in DiGress on Planar-L.
  • Figure 5: Trajectory of validity, PGD, and MMD metrics during training of DiGress on SBM-L.
  • ...and 22 more figures