
Avoiding subtraction and division of stochastic signals using normalizing flows: NFdeconvolve

Pedro Pessoa, Max Schweiger, Lance W. Q. Xu, Tristan Manha, Ayush Saurabh, Julian Antolin Camarena, Steve Pressé

TL;DR

The study tackles recovering the distribution of a hidden stochastic component $b$ from observations $x=f(a,b)$ with a known $a$-distribution, avoiding noise amplification from subtraction or division. It compares three approaches: Bayesian inference with a known $p_B$, Bayesian inference with a Gaussian-mixture prior over $p_B$, and NFdeconvolve, which uses normalizing flows to model $p_{NF}(b|\phi)$ and infer $p(b|\{x\},\theta_A)$ via the convolution likelihood. Results on synthetic sum and product data show that NFdeconvolve is robust to model misspecification and often yields closer-to-ground-truth distributions (lower KL divergence) than Gaussian mixtures, especially with limited data or lower SNR, while a correctly specified Bayesian model remains superior when available. NFdeconvolve is implemented in PyTorch and released on GitHub with tutorials, enabling practitioners to perform deconvolution of stochastic signals in applications like background subtraction and illumination correction without explicit subtraction/division. Overall, the work demonstrates that normalizing flows provide a flexible and reliable path to deconvolving stochastic signals under uncertainty about the deconvolved distribution.
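For reference, the "convolution likelihood" mentioned above can be written out for the two cases treated in the paper. This is a sketch in the TL;DR's notation, assuming $a$ and $b$ are independent and the $N$ observations are i.i.d.; the paper's own equations may use different symbols or conventions.

$$
p(x \mid \phi, \theta_A) = \int p_A(x - b \mid \theta_A)\, p_{NF}(b \mid \phi)\, \mathrm{d}b \qquad (x = a + b),
$$

$$
p(x \mid \phi, \theta_A) = \int \frac{1}{|b|}\, p_A\!\left(\frac{x}{b} \,\middle|\, \theta_A\right) p_{NF}(b \mid \phi)\, \mathrm{d}b \qquad (x = ab,\ b \neq 0),
$$

$$
p(\{x\} \mid \phi, \theta_A) = \prod_{n=1}^{N} p(x_n \mid \phi, \theta_A).
$$

The factor $1/|b|$ in the product case comes from the change of variables $a = x/b$ at fixed $b$. In both cases only the density of $a$ enters, never an individual realization of $a$, which is why no explicit subtraction or division of samples is required.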

Abstract

Across the scientific realm, we find ourselves subtracting or dividing stochastic signals. For instance, consider a stochastic realization, $x$, generated from the addition or multiplication of two stochastic signals $a$ and $b$, namely $x=a+b$ or $x = ab$. For the $x=a+b$ example, $a$ can be fluorescence background and $b$ the signal of interest whose statistics are to be learned from the measured $x$. Similarly, when writing $x=ab$, $a$ can be thought of as the illumination intensity and $b$ the density of fluorescent molecules of interest. Yet dividing or subtracting stochastic signals amplifies noise, and we ask instead whether, using the statistics of $a$ and the measurement of $x$ as input, we can recover the statistics of $b$. Here, we show how normalizing flows can generate an approximation of the probability distribution over $b$, thereby avoiding subtraction or division altogether. This method is implemented in our software package, NFdeconvolve, available on GitHub with a tutorial linked in the main text.
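The noise-amplification claim can be checked numerically. The sketch below is illustrative only and is not part of NFdeconvolve: it borrows the Gamma and Gaussian parameters quoted in the figure captions, and every variable name is an assumption made for this example. Since only the distribution of $a$ is known, the naive approach subtracts an independent draw of $a$ from each $x$, which adds the variance of $a$ a second time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Parameters borrowed from the figure captions: a ~ Normal(10, 1), b ~ Gamma(9, rate 1).
# With rate 1, NumPy's scale parameter (1 / rate) is also 1.
a = rng.normal(loc=10.0, scale=1.0, size=n)   # background; realizations are not observed
b = rng.gamma(shape=9.0, scale=1.0, size=n)   # signal of interest
x = a + b                                     # what is actually measured

# Naive "background subtraction": subtract a fresh, independent draw of a.
a_independent = rng.normal(loc=10.0, scale=1.0, size=n)
b_naive = x - a_independent

var_b = b.var()              # close to 9
var_naive = b_naive.var()    # close to 9 + 1 + 1 = 11

print(f"Var(b)       = {var_b:.2f}")
print(f"Var(x - a')  = {var_naive:.2f}")
```

The subtracted samples have variance $\mathrm{Var}(b) + 2\,\mathrm{Var}(a)$ rather than $\mathrm{Var}(b)$, so their histogram is a broadened version of the distribution of $b$ and not an estimate of it.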

Paper Structure (20 sections, 59 equations, 6 figures, 1 table)


Figures (6)

  • Figure 1: Summary of NFdeconvolve. The data formation process is assumed to involve two components: the noise, $a$, whose distribution is known, and the signal, $b$, whose ground truth distribution is unknown. The observed data, $x$, is a convolution of these two components (in the example shown, the sum $x = a + b$). NFdeconvolve receives the combined data $\{x\}$, a set of observations of $x$, together with the distribution of $a$ (but not the individual realizations of $a$), and produces an estimate of the distribution of $b$. For this example, whose detailed implementation is available in our GitHub repository, the distribution of $b$ obtained by NFdeconvolve is shown in red.
  • Figure 2: Comparison of NFdeconvolve and a Gaussian mixture model for obtaining the deconvolved distribution in two scenarios. The top row shows the true signal distribution $b$ and the observed data $x$ in two scenarios: on the left, $b$ is generated from a mixture of Gaussians; on the right, from a mixture of Gamma and inverted Gamma distributions. The bottom row presents the deconvolved distributions obtained using both a Bayesian approach with a Gaussian mixture model and NFdeconvolve. On the left, where the data matches the Gaussian mixture assumption, both methods perform well, with the Gaussian mixture model achieving a more precise match. On the right, where the data distribution does not align with the Gaussian model, NFdeconvolve significantly outperforms the Gaussian mixture approach. Further quantitative comparisons are presented in the paper's results section.
  • Figure 3: Distributions obtained by the solution methods for the sum of two random variables example. Here $b$ is sampled from a ground truth Gamma distribution, as defined in the paper's Gamma density equation, with parameters $\alpha_B = 9$ and $\beta_B = 1$, while $a$ is sampled from a Gaussian with mean $\mu_A = 10$ and variance $\sigma_A^2 = 1$. In each row, we change the number of data points used and show the distributions obtained by each method. For the two Bayesian methods, we show both the MAP and the reconstruction. As expected, the Bayesian method with the known model finds the correct distribution with fewer data points. Among the other methods, the Gaussian mixture shows some overfitting, visible as the "peaks" in the corresponding column, while the normalizing flows approach the ground truth distribution more smoothly. Later, we will quantitatively confirm this result.
  • Figure 4: Divergence between the ground truth distribution and the distributions obtained by each method in the sum of two random variables example, measured by the logarithm of the KL divergence. Each square within the figure was obtained with a synthetic dataset where $b$ is Gamma distributed with parameter $\beta_B = 1$, and the other parameter, $\alpha_B$, is varied to generate datasets with different SNR. In all cases, $a$ is sampled from a Gaussian with mean $\mu_A = 10$ and variance $\sigma_A^2 = 1$. As expected, we see smaller KL divergence values (darker colors) for larger SNR and data sizes. The Bayesian method with the known model obtains a distribution much closer to the ground truth than all others. However, when the model is unknown, the normalizing flows generally obtain distributions with smaller KL divergence. This confirms our result in Fig. 3 that normalizing flows approximate the unknown distribution better than the Gaussian mixture by avoiding overfitting.
  • Figure 5: Distributions obtained by the solution methods for the product of two random variables example. Here $b$ is sampled from a ground truth Gamma distribution, as defined in the paper's Gamma density equation, with parameters $\alpha_B = 9$ and $\beta_B = 1$, while $a$ is sampled from a Gaussian with mean $\mu_A = 10$ and variance $\sigma_A^2 = 1$. In each row, we change the number of data points used and show the distributions obtained by each method. Consistent with Fig. 3, the Bayesian method with the known model finds the correct distribution with fewer data points. Similarly, among the methods that do not require knowing the model, the normalizing flows method avoids the overfitting seen in the Gaussian mixture.
  • ...and 1 more figure
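The synthetic setup described in the captions of Figures 3 and 4 can be reproduced on a small scale. The following is a minimal sketch, not the NFdeconvolve API and not the authors' numerical procedure: it evaluates the sum-model likelihood $p(x) = \int p_A(x-b)\,p_B(b)\,\mathrm{d}b$ by trapezoidal quadrature for a fixed candidate density of $b$, and computes a KL divergence on a grid. All function names, the grid, and the mismatched candidate with $\alpha_B = 8$ are assumptions chosen for illustration.

```python
import math
import numpy as np

ALPHA_B, BETA_B = 9.0, 1.0   # Gamma shape and rate of b (figure captions)
MU_A, SIGMA_A = 10.0, 1.0    # Gaussian mean and standard deviation of a

def gamma_pdf(b, alpha, beta):
    """Gamma density with shape alpha and rate beta; zero for b <= 0."""
    b = np.asarray(b, dtype=float)
    out = np.zeros_like(b)
    pos = b > 0
    log_p = (alpha * math.log(beta) - math.lgamma(alpha)
             + (alpha - 1.0) * np.log(b[pos]) - beta * b[pos])
    out[pos] = np.exp(log_p)
    return out

def normal_pdf(a, mu, sigma):
    z = (np.asarray(a, dtype=float) - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * math.sqrt(2.0 * math.pi))

def trapezoid(y, grid):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(grid), axis=-1)

def sum_likelihood(x, p_b, b_grid):
    """p(x) = integral of p_A(x - b) p_B(b) db for each entry of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    integrand = normal_pdf(x[:, None] - b_grid[None, :], MU_A, SIGMA_A) * p_b[None, :]
    return trapezoid(integrand, b_grid)

def kl_divergence(p, q, grid):
    """KL(p || q) on a grid; points where p == 0 contribute nothing."""
    mask = p > 0
    integrand = np.zeros_like(p)
    integrand[mask] = p[mask] * (np.log(p[mask]) - np.log(np.maximum(q[mask], 1e-300)))
    return trapezoid(integrand, grid)

b_grid = np.linspace(0.0, 40.0, 2001)
p_true = gamma_pdf(b_grid, ALPHA_B, BETA_B)
p_candidate = gamma_pdf(b_grid, 8.0, BETA_B)   # hypothetical mismatched candidate

# Synthetic observations x = a + b; only x is passed to the likelihood.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(MU_A, SIGMA_A, size=n) + rng.gamma(ALPHA_B, 1.0 / BETA_B, size=n)

loglik_true = np.log(sum_likelihood(x, p_true, b_grid)).sum()
loglik_candidate = np.log(sum_likelihood(x, p_candidate, b_grid)).sum()
kl_candidate = kl_divergence(p_true, p_candidate, b_grid)

print(f"log-likelihood (true density):      {loglik_true:.1f}")
print(f"log-likelihood (candidate density): {loglik_candidate:.1f}")
print(f"KL(true || candidate):              {kl_candidate:.4f}")
```

In NFdeconvolve the fixed candidate density would be replaced by a trainable normalizing flow $p_{NF}(b|\phi)$ whose parameters are adjusted to increase this likelihood; the sketch only shows that the likelihood depends on the data through $x$ and the known density of $a$, and how a grid-based KL divergence like the one reported in Figure 4 might be computed.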