Zero-Shot Conditioning of Score-Based Diffusion Models by Neuro-Symbolic Constraints

Davide Scassola, Sebastiano Saccani, Ginevra Carbone, Luca Bortolussi

TL;DR

Score-based diffusion models support conditional generation, but this typically requires either training a dedicated conditional model or using classifier guidance. This work introduces a zero-shot conditioning method that reuses a pre-trained unconditional score-based model to sample from p(x|constraint) by modifying the score with differentiable soft constraints encoded in a neuro-symbolic framework. The key ideas are (i) constraint-based score guidance, which perturbs the base score with the gradient of a soft constraint, (ii) approximations of the constrained score, including multi-instance constraints and Langevin corrections, and (iii) an LTN-based neuro-symbolic language for soft constraints, built from atomic predicates and Boolean connectives. The method is demonstrated on tabular data, time series, and image tasks, showing improved approximation to the true conditional distribution compared to rejection sampling and universal guidance, particularly for tabular and time-series data.
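The score-guidance idea in (i) can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a one-dimensional standard normal whose score is known analytically (standing in for a pre-trained score model), a hypothetical sigmoid soft constraint for "x >= 1", and plain unadjusted Langevin dynamics at a single noise level. The function names, `threshold`, `sharpness`, and the step size are all illustrative choices.

```python
import numpy as np

def base_score(x):
    # Score of N(0, 1): d/dx log p(x) = -x.
    # Assumption: stands in for a pre-trained unconditional score model.
    return -x

def log_constraint_grad(x, threshold=1.0, sharpness=5.0):
    # Hypothetical soft constraint c(x) = sigmoid(sharpness * (x - threshold)),
    # a differentiable relaxation of "x >= threshold".
    # d/dx log c(x) = sharpness * sigmoid(-z), written with tanh for stability.
    z = sharpness * (x - threshold)
    return sharpness * 0.5 * (1.0 - np.tanh(0.5 * z))

def guided_score(x):
    # Score of the un-normalized target p(x) * c(x):
    # grad log p(x) + grad log c(x).
    return base_score(x) + log_constraint_grad(x)

def langevin_sample(score_fn, n_samples=5000, n_steps=2000, step=1e-2, seed=0):
    # Unadjusted Langevin dynamics: x <- x + step * score + sqrt(2 * step) * noise.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_samples)
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * noise
    return x

samples = langevin_sample(guided_score)
print("mean of guided samples:", samples.mean())
print("fraction with x > 0.5:", np.mean(samples > 0.5))
```

Because the constraint enters only through the gradient of its logarithm, the base model is never retrained. A diffusion sampler would additionally need a constraint score at every noise level, which is where the approximations in (ii) come in; this sketch does not cover them.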

Abstract

Score-based diffusion models have emerged as effective approaches for both conditional and unconditional generation. Still, conditional generation relies on either training a dedicated conditional model or classifier guidance, which requires training a noise-dependent classifier even when a classifier for uncorrupted data is available. We propose a method that, given a pre-trained unconditional score-based generative model, samples from the conditional distribution under arbitrary logical constraints without requiring additional training. Unlike other zero-shot techniques, which aim only at generating valid conditional samples, our method is designed to approximate the true conditional distribution. First, we show how to manipulate the learned score in order to sample from an un-normalized distribution conditional on a user-defined constraint. Then, we define a flexible and numerically stable neuro-symbolic framework for encoding soft logical constraints. Combining these two ingredients, we obtain a general, but approximate, conditional sampling algorithm. We further develop effective heuristics aimed at improving the approximation. Finally, we show the effectiveness of our approach in approximating conditional distributions for various types of constraints and data: tabular data, images, and time series.
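To make "soft logical constraints" concrete, here is a minimal sketch of one way to encode them in log space so that they stay numerically stable. The specific choices are assumptions for illustration and may differ from the paper's definitions: a sigmoid predicate for inequality, a Gaussian-shaped predicate for equality, product semantics for conjunction, and De Morgan's law for disjunction. All function names and the `sharpness` parameter are hypothetical.

```python
import numpy as np

def log_geq(a, b, sharpness=10.0):
    # log sigmoid(sharpness * (a - b)): soft truth value of "a >= b".
    return -np.logaddexp(0.0, -sharpness * (a - b))

def log_eq(a, b, sharpness=10.0):
    # log exp(-sharpness * (a - b)^2): soft truth value of "a == b".
    return -sharpness * (a - b) ** 2

def log_and(*log_truths):
    # Product semantics: log(t1 * t2 * ...) = sum of the logs.
    return np.sum(log_truths, axis=0)

def log_not(log_truth):
    # log(1 - exp(l)) for l <= 0, using the numerically safer branch
    # on each side of -log(2). l is clipped just below 0 to avoid log(0).
    l = np.minimum(np.asarray(log_truth, dtype=float), -1e-12)
    return np.where(l > -np.log(2.0),
                    np.log(-np.expm1(l)),
                    np.log1p(-np.exp(l)))

def log_or(*log_truths):
    # De Morgan: a OR b = NOT(NOT a AND NOT b).
    return log_not(log_and(*[log_not(l) for l in log_truths]))

# Example: soft truth of "(x >= 1) AND (y == 0)" at x = 2, y = 0.05.
x, y = 2.0, 0.05
log_c = log_and(log_geq(x, 1.0), log_eq(y, 0.0))
print("soft truth value:", float(np.exp(log_c)))
```

Each connective is differentiable in its inputs, so the gradient of `log_c` with respect to the data can be added to an unconditional score. An automatic differentiation framework would be the natural way to obtain that gradient in practice.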

Paper Structure (39 sections, 19 equations, 9 figures, 5 tables, 1 algorithm)

Figures (9)

  • Figure 1: Marginals of the white wine data experiment. We generated 5000 samples using our constrained sampling algorithm and as many by rejection sampling (RS). The plot compares the marginals of the dimensions directly involved in the constraint. The last two dimensions are the ones with the largest l1 histogram distance with respect to the RS marginals ($0.15$ and $0.13$), while the median distance across all dimensions is $\approx 0.1$. To evaluate the noise of the distance, we also measured the self-distance between two equally sized samples obtained by RS and observed a median across dimensions of $\approx 0.05$.
  • Figure 2: Bridging with eSIRS time series. We show here a subsample of the $5000$ time series generated with constraint guidance (orange) alongside a subsample of the $>50000$ time series generated by RS from the simulator (green). In addition to the consistency constraints, which are always met, we imposed the following equalities: $S(0)=95$, $I(0)=5$, $S(25)=30$. The constraints are generally met: the average l1 absolute difference from each of the three target values is below $0.19$. The l1 histogram distance for each time-step marginal is relatively small, considering that it also accounts for the error of the unconditional model: for $S$ and $I$, the median l1 histogram distances across time are $\approx 0.11$ and $\approx 0.13$, respectively. The median self-distance across time between two RS samples of 5000 instances was $\approx 0.04$ for both $S$ and $I$.
  • Figure 3: Imposing an inequality on eSIRS time series. We show here a subsample of the 100 time series generated with and without constraint guidance. In addition to the consistency constraints, which are always met, we imposed $\forall t \ I(t) \leq 20$, which is perfectly met in $99\%$ of the samples.
  • Figure 4: Restoration experiments with CelebA. The first image on the left is a sample image from the CelebA dataset. Each row shows the corrupted image followed by samples generated imposing the restoration constraint.
  • Figure 8: MNIST samples generated imposing a horizontal symmetry constraint. In many cases ($\approx 30\%$) the generated image did not correspond to an actual digit. This is probably due to the unconditional model, which also generated a significant number of invalid digits ($\approx 10\%$).
  • ...and 4 more figures
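
The captions for Figures 1 and 2 report an "l1 histogram distance" between marginals, but this page does not give its exact definition. The sketch below shows one plausible reading, stated as an assumption: both samples are binned on shared equal-width bins, the counts are normalized to sum to one, and half the sum of absolute differences is returned, so the value lies in [0, 1]. The bin count and the factor 0.5 are illustrative choices, not taken from the source.

```python
import numpy as np

def l1_histogram_distance(a, b, bins=20):
    # Assumed definition: 0.5 * sum_i |p_i - q_i| over shared equal-width bins,
    # where p and q are the normalized histograms of the two samples.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)
    counts_a, _ = np.histogram(a, bins=edges)
    counts_b, _ = np.histogram(b, bins=edges)
    p = counts_a / counts_a.sum()
    q = counts_b / counts_b.sum()
    return 0.5 * np.abs(p - q).sum()

rng = np.random.default_rng(0)
sample_1 = rng.standard_normal(5000)
sample_2 = rng.standard_normal(5000)          # same distribution: "self-distance"
shifted = rng.standard_normal(5000) + 1.0     # different distribution

print("self-distance:", l1_histogram_distance(sample_1, sample_2))
print("shifted distance:", l1_histogram_distance(sample_1, shifted))
```

Even two samples from the same distribution have a nonzero distance at finite sample size, which is why the captions report a self-distance between two RS samples as a noise baseline.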