Sampling Binary Data by Denoising through Score Functions

Francis Bach, Saeed Saremi

TL;DR

The paper studies sampling binary data on the Boolean hypercube. It replaces Gaussian smoothing with Bernoulli noise and extends the Tweedie-Miyasawa formula (TMF) for score-based denoising to the binary setting. It derives a TMF-like relation that ties the optimal binary denoiser to the score of the noisy distribution, $\mathbb{E}[x|y]=\frac{1}{\alpha}\nabla\log q_\alpha(y)$, and learns these score functions with logistic-denoiser objectives. It then develops one-stage and two-stage discrete Langevin samplers with contraction and stationary-distribution guarantees, and extends them to a multi-measurement setting in which $m$ measurements at a fixed noise level bring the effective noise parameter to $m\alpha$. The results are validated empirically on synthetic mixtures and binarized MNIST. The approach samples efficiently in high-noise regimes, preserves the binary nature of the data, and offers a flexible framework for binary data generation and denoising with rigorous analytic guarantees. Potential extensions include other exponential-family noise models, sharper priors, and faster sampling via Metropolis-Hastings refinements.

Abstract

Gaussian smoothing combined with a probabilistic framework for denoising via the empirical Bayes formalism, i.e., the Tweedie-Miyasawa formula (TMF), are the two key ingredients in the success of score-based generative models in Euclidean spaces. Smoothing holds the key for easing the problem of learning and sampling in high dimensions, denoising is needed for recovering the original signal, and TMF ties these together via the score function of noisy data. In this work, we extend this paradigm to the problem of learning and sampling the distribution of binary data on the Boolean hypercube by adopting Bernoulli noise, instead of Gaussian noise, as a smoothing device. We first derive a TMF-like expression for the optimal denoiser for the Hamming loss, where a score function naturally appears. Sampling noisy binary data is then achieved using a Langevin-like sampler which we theoretically analyze for different noise levels. At high Bernoulli noise levels sampling becomes easy, akin to log-concave sampling in Euclidean spaces. In addition, we extend the sequential multi-measurement sampling of Saremi et al. (2024) to the binary setting where we can bring the "effective noise" down by sampling multiple noisy measurements at a fixed noise level, without the need for continuous-time stochastic processes. We validate our formalism and theoretical findings by experiments on synthetic data and binarized images.
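The relation between denoising and the score can be checked numerically on a tiny hypercube. The sketch below is illustrative only and is not the authors' code. It assumes the Bernoulli noise is parametrized as $p(y|x) \propto \exp(\alpha\, x^\top y)$, a choice that is consistent with the TL;DR formula $\mathbb{E}[x|y]=\frac{1}{\alpha}\nabla\log q_\alpha(y)$ but is not stated on this page. All function names (`flip_probability`, `add_bernoulli_noise`, `log_q`, `posterior_mean`, `score`) are hypothetical, and the prior is a random toy distribution.

```python
import itertools
import numpy as np


def flip_probability(alpha):
    """Per-coordinate flip probability under the ASSUMED noise model
    p(y_i | x_i) = exp(alpha * x_i * y_i) / (2 * cosh(alpha))."""
    return np.exp(-alpha) / (2.0 * np.cosh(alpha))


def add_bernoulli_noise(x, alpha, rng):
    """Flip each coordinate of x in {-1, 1}^d independently."""
    flips = rng.random(x.shape) < flip_probability(alpha)
    return np.where(flips, -x, x)


def log_q(y, cube, prior, alpha):
    """log q_alpha(y), with y treated as real-valued so a gradient exists."""
    d = cube.shape[1]
    a = np.log(prior) + alpha * (cube @ y)
    m = a.max()  # log-sum-exp shift for numerical stability
    return m + np.log(np.exp(a - m).sum()) - d * np.log(2.0 * np.cosh(alpha))


def posterior_mean(y, cube, prior, alpha):
    """E[x | y] computed by enumerating all 2^d values of x."""
    w = prior * np.exp(alpha * (cube @ y))
    return (w / w.sum()) @ cube


def score(y, cube, prior, alpha, eps=1e-5):
    """Central finite-difference estimate of the gradient of log q_alpha at y."""
    grad = np.zeros(len(y))
    for i in range(len(y)):
        step = np.zeros(len(y))
        step[i] = eps
        up = log_q(y + step, cube, prior, alpha)
        down = log_q(y - step, cube, prior, alpha)
        grad[i] = (up - down) / (2.0 * eps)
    return grad


d, alpha = 3, 0.5
rng = np.random.default_rng(0)
cube = np.array(list(itertools.product([-1, 1], repeat=d)))  # all points of {-1, 1}^d
prior = rng.random(len(cube)) + 0.1                          # toy prior p(x), strictly positive
prior /= prior.sum()

x = cube[rng.choice(len(cube), p=prior)]                     # clean sample
y = add_bernoulli_noise(x, alpha, rng).astype(float)         # noisy observation
lhs = posterior_mean(y, cube, prior, alpha)
rhs = score(y, cube, prior, alpha) / alpha
print(np.allclose(lhs, rhs, atol=1e-6))
```

Under this assumed parametrization, small $\alpha$ corresponds to high noise (the flip probability approaches 1/2 as $\alpha \to 0$), which matches the page's description of $\alpha=0.25$ and $\alpha=0.5$ as high noise levels.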

Paper Structure

This paper contains 28 sections, 8 theorems, 82 equations, 6 figures.

Key Result

Lemma 2.1

Given a joint distribution on $(x,y)$, the function $f:\{-1,1\}^d \to \{-1,1\}^d$ that minimizes $\mathbb{E}[\ell(x,f(y))]$, where $\ell$ is the Hamming loss, is $f(y) = \operatorname{sign}(\mathbb{E}[x|y])$, with the sign applied componentwise.
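A brute-force check on a small hypercube makes the lemma concrete. The sketch below is an illustration and not taken from the paper: it draws a random joint distribution (a toy assumption), takes $\ell$ to be the Hamming loss counted as the number of differing coordinates, and compares the conditional risk of $\operatorname{sign}(\mathbb{E}[x|y])$ against every possible output. The function names `hamming_loss` and `check_optimal_denoiser` are hypothetical.

```python
import itertools
import numpy as np


def hamming_loss(x, z):
    """Number of coordinates where x and z differ."""
    return int(np.sum(x != z))


def check_optimal_denoiser(d=3, seed=0):
    """Return True if sign(E[x|y]) attains the minimal conditional risk for every y."""
    rng = np.random.default_rng(seed)
    cube = np.array(list(itertools.product([-1, 1], repeat=d)))
    n = len(cube)
    # joint[i, j] = P(x = cube[i], y = cube[j]); random toy distribution
    joint = rng.random((n, n))
    joint /= joint.sum()

    ok = []
    for j in range(n):
        post = joint[:, j] / joint[:, j].sum()        # P(x | y = cube[j])
        cond_mean = post @ cube                       # E[x | y]
        f_star = np.where(cond_mean >= 0, 1, -1)      # componentwise sign
        # Conditional expected Hamming loss of each candidate output z
        risks = [
            sum(post[i] * hamming_loss(cube[i], z) for i in range(n))
            for z in cube
        ]
        risk_star = sum(post[i] * hamming_loss(cube[i], f_star) for i in range(n))
        ok.append(bool(np.isclose(risk_star, min(risks))))
    return all(ok)


print(check_optimal_denoiser(d=3, seed=0))
```

The check works per value of $y$ because the expected loss decomposes over $y$, and for a fixed $y$ the conditional Hamming risk is $\sum_i \tfrac{1}{2}\,(1 - z_i\,\mathbb{E}[x_i|y])$, which is minimized coordinate by coordinate.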

Figures (6)

  • Figure 1: Optimal denoising from strong priors (large $\beta$) to weak priors (small $\beta$): comparison between Wasserstein distance and mean-square-error of denoising performance.
  • Figure 2: Optimal denoising from multiple measurements, for $m=1,3,5$ (one curve per $m$), for $d=6$, and three values of $\beta$, from strong priors (large $\beta$) to weak priors (small $\beta$).
  • Figure 3: Comparison of 1-stage and 2-stage Langevin sampling. Top: distance to desired distribution $W(y,y_{\rm stat})$, bottom: mixing time (in log scale).
  • Figure 4: Comparison of 1-stage and 2-stage Langevin sampling. Top: distance to desired distribution $W(y,y_{\rm stat})$, bottom: denoising performance of the stationary distributions, measured in Wasserstein distance.
  • Figure 5: The denoising performance on binarized MNIST at two high Bernoulli noise levels ($\alpha=0.25$, and $\alpha=0.5$).
  • ...and 1 more figure

Theorems & Definitions (14)

  • Lemma 2.1: Optimal denoiser
  • proof
  • Lemma 2.2: Denoising through score functions
  • Lemma 2.3: Denoising performance
  • proof
  • Lemma 2.4: Denoising performance, multiple measurements
  • Proposition 3.1: Contractivity
  • Proposition 3.2: Distance to stationary distribution
  • Proposition 3.3: Contractivity, two-stage sampler
  • Proposition 3.4: Distance to stationary distribution
  • ...and 4 more