Vertical Validation: Evaluating Implicit Generative Models for Graphs on Thin Support Regions

Mai Elkady, Thu Bui, Bruno Ribeiro, David I. Inouye

TL;DR

Vertical Validation (VV) is a novel evaluation method that systematically creates thin support regions during the train-test splitting procedure and then reweights generated samples so that they can be compared to the held-out test data.

Abstract

There has been growing excitement that implicit graph generative models could be used to design or discover new molecules for medicine or material design. Because these molecules have not been discovered, they naturally lie in unexplored or scarcely supported regions of the distribution of known molecules. However, prior evaluation methods for implicit graph generative models have focused on validating statistics computed from the thick support (e.g., mean and variance of a graph property). Therefore, there is a mismatch between the goal of generating novel graphs and the evaluation methods. To address this evaluation gap, we design a novel evaluation method called Vertical Validation (VV) that systematically creates thin support regions during the train-test splitting procedure and then reweights generated samples so that they can be compared to the held-out test data. This procedure can be seen as a generalization of the standard train-test procedure, except that the splits depend on sample features. We demonstrate that our method can be used to perform model selection if performance on thin support regions is the desired goal. As a side benefit, we also show that our approach can better detect overfitting, as exemplified by memorization.

Paper Structure

This paper contains 49 sections, 4 theorems, 25 equations, 21 figures, 3 tables.

Key Result

Proposition 1

For any $\epsilon < 1$ and $\psi \in \{1,2,\dots\}$, and assuming the splits are equal size in expectation, i.e., $p(S_{i,\ell})=\frac{1}{k}$, if $p(U_{i,\ell}|S_{i,\ell}) = (1-\epsilon)\,p_\mathrm{BetaMix}(U_{i,\ell}|S_{i,\ell}) + \epsilon\,p_{\mathrm{Unif}[0,1]}(U_{i,\ell})$, where [...] and where $\alpha_{j,a} \triangleq (j-1) \psi + a$ and $\beta_{j,a} \triangleq \psi k + 1 - \alpha_{j,a}$, then [...]
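The Beta parameters $\alpha_{j,a}$ and $\beta_{j,a}$ defined above can be explored numerically. The following is a minimal sketch, not the authors' code: the definition of $p_\mathrm{BetaMix}$ is not reproduced in this excerpt, so the sketch assumes an equal-weight mixture (weight $1/\psi$) over $a = 1,\dots,\psi$, and the function names `beta_mix_pdf` and `split_pdf` are hypothetical. Under that assumption, averaging the $k$ split densities with weights $1/k$ gives a flat density on $[0,1]$, which the last lines compute.

```python
import numpy as np
from scipy.stats import beta


def beta_mix_pdf(u, j, k, psi):
    """Density of the Beta mixture for split j (1-indexed) at points u.

    ASSUMPTION: components a = 1..psi are weighted equally (1/psi each);
    the excerpt does not show the mixture weights.
    """
    u = np.asarray(u, dtype=float)
    a = np.arange(1, psi + 1)
    alpha = (j - 1) * psi + a      # alpha_{j,a} = (j-1)*psi + a
    b = psi * k + 1 - alpha        # beta_{j,a}  = psi*k + 1 - alpha_{j,a}
    # Broadcast u against the psi components, then average over components.
    return beta.pdf(u[..., None], alpha, b).mean(axis=-1)


def split_pdf(u, j, k, psi, eps):
    """(1 - eps) * BetaMix + eps * Unif[0, 1], as in the proposition's premise."""
    return (1.0 - eps) * beta_mix_pdf(u, j, k, psi) + eps * 1.0


k, psi, eps = 5, 3, 0.1
u = np.linspace(0.01, 0.99, 99)

# Marginal density of U when each split has prior probability 1/k.
marginal = sum(split_pdf(u, j, k, psi, eps) for j in range(1, k + 1)) / k
```

Increasing `psi` makes each split's density more concentrated around its own region of $[0,1]$, while `eps` keeps every split's density bounded away from zero everywhere.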

Figures (21)

  • Figure 1: VV systematically thins the distribution in a certain region for training (top row) and then evaluates whether the generated samples in the thinned region, after reweighting, match the complementary held-out test dataset (bottom row). In contrast, standard evaluations seek to match the macro properties (e.g., mean) of this distribution, which emphasizes the regions of thick support. The original data (left) illustrates both thick support regions (i.e., areas with many samples) and thin support regions (i.e., areas with very few samples).
  • Figure 2: (a) The VV splitting process has 5 steps: 1) compute the relevant graph properties for each graph, 2) project samples via the CDF to a uniform distribution, 3) define the split distributions via a mixture of Beta distributions and a uniform distribution, 4) compute the split probability conditioned on $U_\ell$ using Bayes' rule, and finally 5) sample the split variable based on these conditional probabilities. This results in different splits. In the histograms above, we plot the distribution of the split property $\ell$ in both the train and held-out parts for different splits. (b) An illustration of the reweighting process performed by VV for one of the splits (for $j = 1$ and $\ell = 1$, where the total number of properties is $m = 2$).
  • Figure 3: Stacked conditional split probabilities $p(S_{\ell}|U_{\ell})$ obtained using VV with different sharpness $\psi$ and uniform mixing parameter $\epsilon$. In (a), the split distributions may overlap too much when $\psi=1$. In (b), the splits are sharper but still smooth. In (c), the uniform mixing parameter allows all splits to have some support. In (d) and (e), we demonstrate the extremes of our approach, which yield either quantile splits as $\psi \to \infty$ or uniform splits, as in standard CV, if $\epsilon = 1$.
  • Figure 4: On both Erdos-Renyi and Comm20 datasets, our proposed vertical validation approach (VV) can select the best model for generating in thin support, as shown by the test line, whereas the standard train-test splitting (CV) tends to favor memorization despite poor generalization to the thin support regions. Note that for CV, the oracle distribution seems worse than memorization because the oracle generates from the true distribution rather than the shifted training distribution; thus it appears to CV that memorizing is actually a better option. This phenomenon does not occur in our validation approach because we aim to find a model that generalizes well to the thin support. This also shows that our approach is better able to detect memorization than the standard train-test split validation.
  • Figure 5: Examples of molecules generated by DiGress. These are the top 4 (after filtering for validity and novelty) according to the weights assigned by our method when using v-test as the held-out portion.
  • ...and 16 more figures
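
The five splitting steps listed in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `vv_split` is hypothetical, the CDF projection uses a rank-based empirical CDF, the Beta mixture uses equal weights $1/\psi$ over its components (the excerpt does not show the weights), and synthetic numbers stand in for a real graph property such as average degree.

```python
import numpy as np
from scipy.stats import beta, rankdata


def vv_split(prop, k=5, psi=3, eps=0.1, seed=0):
    """Assign each sample to one of k splits based on a scalar property.

    Returns (u, post, split): uniformized property values, the conditional
    split probabilities p(S = j | U), and sampled split indices in 0..k-1.
    """
    rng = np.random.default_rng(seed)
    prop = np.asarray(prop, dtype=float)
    n = prop.size

    # Step 2: project to (0, 1) via the empirical CDF (rank-based; assumption).
    u = rankdata(prop, method="average") / (n + 1)

    # Step 3: split densities p(U | S = j) as Beta mixture plus uniform.
    a = np.arange(1, psi + 1)
    lik = np.empty((n, k))
    for j in range(1, k + 1):
        alpha = (j - 1) * psi + a
        b = psi * k + 1 - alpha
        mix = beta.pdf(u[:, None], alpha, b).mean(axis=1)  # equal weights (assumption)
        lik[:, j - 1] = (1.0 - eps) * mix + eps

    # Step 4: Bayes' rule with equal priors p(S = j) = 1/k.
    post = lik / lik.sum(axis=1, keepdims=True)

    # Step 5: sample one split index per sample from its posterior row.
    cum = post.cumsum(axis=1)
    r = rng.random(n)
    split = np.minimum((r[:, None] > cum).sum(axis=1), k - 1)
    return u, post, split


# Step 1 stand-in: synthetic values in place of a computed graph property.
prop = np.random.default_rng(1).normal(size=2000)
u, post, split = vv_split(prop, k=5, psi=3, eps=0.1)

held_out = split == 0   # one split held out (illustrative choice)
train = ~held_out       # remaining splits used for training
```

With these settings, the held-out split concentrates on low values of the property while still containing a few samples from elsewhere, because `eps > 0` gives every split some support across the whole range.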

Theorems & Definitions (6)

  • Proposition 1
  • Theorem 1: $\phi_{\text{KS}}(\Xheldw, \cG_{\textnormal{gen},W}^{(\ell,j)}; h_{\ell'})$ consistent
  • Proposition 1
  • proof
  • Theorem 1: $\phi_{\text{KS}}(\Xheldw, \cG_{\textnormal{gen},W}^{(\ell,j)}; h_{\ell'})$ consistent
  • proof