Latent Generative Models with Tunable Complexity for Compressed Sensing and other Inverse Problems

Sean Gunn, Jorio Cocola, Oliver De Candido, Vaggos Chatziafratis, Paul Hand

TL;DR

Tunable-complexity generative priors are developed for diffusion models, normalizing flows, and variational autoencoders, leveraging nested dropout and a theoretical analysis that explicitly characterizes how the optimal tuning parameter depends on noise and model structure.

Abstract

Generative models have emerged as powerful priors for solving inverse problems. These models typically represent a class of natural signals using a single fixed complexity or dimensionality. This can be limiting: depending on the problem, a fixed complexity may result in high representation error if too small, or overfitting to noise if too large. We develop tunable-complexity priors for diffusion models, normalizing flows, and variational autoencoders, leveraging nested dropout. Across tasks including compressed sensing, inpainting, denoising, and phase retrieval, we show empirically that tunable priors consistently achieve lower reconstruction errors than fixed-complexity baselines. In the linear denoising setting, we provide a theoretical analysis that explicitly characterizes how the optimal tuning parameter depends on noise and model structure. This work demonstrates the potential of tunable-complexity generative priors and motivates both the development of supporting theory and their application across a wide range of inverse problems.
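The abstract names nested dropout as the mechanism behind the tunable priors but does not spell it out here. As a rough illustration only: nested dropout draws a truncation index $k$ and zeroes every latent coordinate after the first $k$, so earlier coordinates learn to carry the most information. The sketch below assumes a geometric distribution over $k$ (truncated at $n$) and uses hypothetical helper names; the paper's actual dropout distribution and training loop may differ.

```python
import random


def sample_truncation_index(n, p, rng):
    """Draw k in {1, ..., n} from a geometric distribution truncated at n.

    The geometric choice and the parameter p are assumptions for illustration.
    """
    k = 1
    while k < n and rng.random() > p:
        k += 1
    return k


def nested_dropout(z, k):
    """Keep the first k latent coordinates and zero out the remaining ones."""
    return z[:k] + [0.0] * (len(z) - k)


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(8)]          # toy latent code
k = sample_truncation_index(len(z), p=0.3, rng=rng)  # complexity for this step
z_masked = nested_dropout(z, k)                      # what a decoder would see
```

At inference time the same masking can be applied with a fixed $k$ chosen by the user, which is what makes the complexity of the prior tunable.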

Paper Structure

This paper contains 23 sections, 4 theorems, 37 equations, 22 figures, 2 tables, and 2 algorithms.

Key Result

Theorem 5.1

Suppose we have a family $\{ \bm{G_{k}} \}_{k = 1, \ldots, n}$ of generative models as given above, let $p_{\bm{G_{k}}} = \mathcal{N}(0, \bm{G_{k}} \bm{G_{k}}^T)$, and let $\bm{G_{n}} \in \mathbb{R}^{n \times n}$ have singular values $s_{1} \geq s_{2} \geq \cdots \geq s_{n} > 0$. Let ${\bm{x}}_{0} \sim p_{
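
The theorem statement is cut off in this copy, so the following is not a reproduction of it. It is a minimal sketch of the classical trade-off that a result of this kind formalizes, under assumptions of our own: $\bm{G_{n}} = \mathrm{diag}(s_1, \ldots, s_n)$, $\bm{G_{k}}$ keeps the first $k$ columns, the observation is $\bm{y} = \bm{x}_0 + \bm{\eta}$ with i.i.d. noise of standard deviation $\sigma$, and the estimator simply projects $\bm{y}$ onto the first $k$ coordinates. Under those assumptions the expected squared error is $k\sigma^2 + \sum_{i > k} s_i^2$. The function names are hypothetical.

```python
def expected_mse(singular_values, sigma, k):
    """Expected squared error of the rank-k projection estimator.

    Assumed setting (not taken from the paper): diagonal generator with
    singular values s_1 >= ... >= s_n, additive noise of std sigma.
    """
    variance = k * sigma ** 2                          # noise kept in k coordinates
    bias = sum(s ** 2 for s in singular_values[k:])    # signal energy discarded
    return variance + bias


def best_k(singular_values, sigma):
    """Return the k in {1, ..., n} that minimizes the expected error."""
    n = len(singular_values)
    return min(range(1, n + 1),
               key=lambda k: expected_mse(singular_values, sigma, k))


s = [2.0, 1.5, 1.0, 0.5, 0.25]
low_noise_k = best_k(s, sigma=0.1)    # keeps every direction
mid_noise_k = best_k(s, sigma=0.75)   # keeps directions with s_i^2 > sigma^2
high_noise_k = best_k(s, sigma=3.0)   # keeps only the strongest direction
```

In this toy setting the best $k$ shrinks as $\sigma$ grows: a coordinate is worth keeping only when $s_i^2 > \sigma^2$, which is one concrete way the optimal tuning parameter can depend on both noise and model structure.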

Figures (22)

  • Figure 1: Medium-complexity priors can outperform both low- and high-complexity alternatives for image reconstruction. We trained three separate generative models with low, medium, and high latent dimensionality. The size of the boxes representing $z$ depicts the latent dimensionality of each model. We test the models on a random pixel inpainting problem. The medium-complexity prior yields the reconstruction with the highest Peak Signal-to-Noise Ratio (PSNR).
  • Figure 2: Intermediate latent dimensionalities yield the best reconstruction at low measurement ratios. We train separate injective flow models for each latent dimensionality $k$, ranging from $16$ to $456$, on MNIST images of size $n = 32 \times 32 = 1024$ pixels. No parameter sharing is used across models. Each panel shows reconstruction performance for a different number of measurements $m$: for $m < n = 1024$, the forward operator is an $m \times n$ random Gaussian matrix (compressed sensing), while $m = 1024$ corresponds to the identity operator (no compression). At small measurement ratios $m/n$, intermediate latent dimensions ($150 \leq k \leq 300$) yield the lowest reconstruction error, while the optimal $k$ shifts as the number of measurements increases (error bars indicate $\pm 1$ standard deviation).
  • Figure 3: Nested dropout training produces tunable latent diffusion models that maintain generation quality across latent dimensionalities. FID score is plotted as a function of latent dimensionality $k$ for models trained with different values of the dropout distribution parameter $p_k$. The vanilla LDM baseline (dotted line) operates only at full dimensionality. As $k$ increases, the tunable models approach baseline performance while retaining the ability to generate from lower-dimensional representations. Results are evaluated on 50k training images from each dataset. FID scores are computed following Parmar et al. (CVPR 2022).
  • Figure 4: Tunable priors outperform fixed-complexity baselines across multiple inverse problems. Reconstruction performance (LPIPS, lower is better) is shown as a function of latent dimensionality $k$ for compressed sensing, denoising, phase retrieval, and inpainting on the CelebA dataset. The tunable LDM prior (blue) is compared against a fixed-complexity baseline operating at full dimensionality (orange). For all four tasks, intermediate values of $k$ yield lower reconstruction error than both the low-complexity and high-complexity extremes, demonstrating the benefit of tuning model complexity to the inverse problem at hand.
  • Figure 5: Qualitative results across four inverse problems on FFHQ. Columns: ground truth, measurement ($\mathbf{A}^\top \mathbf{y}$ for compressed sensing (CS) and phase retrieval; degraded input for super-resolution (SR) and deblurring), baseline, and tunable prior. CS and phase retrieval (10% measurements) use Algorithm \ref{algo:posterior}; $4\times$ SR and Gaussian deblurring use PSLD (Rout et al., 2023). Insets show an enlarged view of the highlighted yellow boxes.
  • ...and 17 more figures
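
The captions for Figures 1 and 2 refer to a measurement model (a random Gaussian $m \times n$ matrix for $m < n$, the identity for $m = n$) and to PSNR as the quality metric. The sketch below shows both in plain Python as a reading aid. The $1/m$ variance scaling of the Gaussian entries, the `max_value` default, and all function names are assumptions, not details taken from the paper.

```python
import math
import random


def make_forward_operator(m, n, rng):
    """Build an m x n measurement matrix as nested lists.

    m == n gives the identity (no compression); m < n gives i.i.d. Gaussian
    entries with variance 1/m (this scaling is an assumption).
    """
    if m == n:
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    std = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(m)]


def measure(A, x):
    """Apply the linear forward operator: y = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


rng = random.Random(0)
n, m = 16, 4
x = [rng.random() for _ in range(n)]                      # toy flattened image
y = measure(make_forward_operator(m, n, rng), x)          # compressed sensing
y_full = measure(make_forward_operator(n, n, rng), x)     # identity operator
```

With the identity operator the measurement equals the signal, so any remaining reconstruction error in that panel comes from the prior and the noise rather than from compression.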

Theorems & Definitions (8)

  • Theorem 5.1
  • Corollary 5.2
  • Proof of Theorem 5.1
  • Proof of Corollary 5.2
  • Lemma C.1
  • Proof of Lemma C.1
  • Lemma C.2
  • Proof of Lemma C.2