Adapt and Diffuse: Sample-adaptive Reconstruction via Latent Diffusion Models

Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi

TL;DR

The paper tackles the problem of sample-dependent difficulty in inverse imaging tasks by introducing severity encoding in a latent autoencoder space to estimate both a latent reconstruction $\hat{\mathbf{z}}$ and a degradation severity $\hat{\sigma}$. It then couples this with a sample-adaptive latent-diffusion inference that selects the starting diffusion time via SNR matching to $\hat{\sigma}$, forming the Flash-Diffusion wrapper that can augment any latent-diffusion-based solver. The key contributions are a fast, trainable severity encoder with a joint latent and error-based loss, a principled starting-time selection rule, and extensive experiments showing 8–10x reductions in reverse-diffusion steps while improving reconstruction quality across Gaussian blur, nonlinear blur, and inpainting on CelebA-HQ, FFHQ, and LSUN Bedrooms. The results demonstrate substantial practical benefits for resource-aware inverse-problem solving, including notable speedups without sacrificing perceptual or distortion metrics. Overall, Flash-Diffusion provides a flexible framework to dynamically allocate compute based on sample difficulty in diffusion-based inverse problem solvers, advancing both efficiency and accuracy in high-resolution vision tasks.

Abstract

Inverse problems arise in a multitude of applications, where the goal is to recover a clean signal from noisy and possibly (non)linear observations. The difficulty of a reconstruction problem depends on multiple factors, such as the ground truth signal structure, the severity of the degradation and the complex interactions between the above. This results in natural sample-by-sample variation in the difficulty of a reconstruction problem. Our key observation is that most existing inverse problem solvers lack the ability to adapt their compute power to the difficulty of the reconstruction task, resulting in subpar performance and wasteful resource allocation. We propose a novel method, $\textit{severity encoding}$, to estimate the degradation severity of corrupted signals in the latent space of an autoencoder. We show that the estimated severity has strong correlation with the true corruption level and can provide useful hints on the difficulty of reconstruction problems on a sample-by-sample basis. Furthermore, we propose a reconstruction method based on latent diffusion models that leverages the predicted degradation severities to fine-tune the reverse diffusion sampling trajectory and thus achieve sample-adaptive inference times. Our framework, Flash-Diffusion, acts as a wrapper that can be combined with any latent diffusion-based baseline solver, imbuing it with sample-adaptivity and acceleration. We perform experiments on both linear and nonlinear inverse problems and demonstrate that our technique greatly improves the performance of the baseline solver and achieves up to $10\times$ acceleration in mean sampling speed.
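
The start-time selection described above can be illustrated with a small sketch. It assumes a variance-preserving forward process $\mathbf{z}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{z}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$, whose signal-to-noise ratio is $\bar{\alpha}_t / (1-\bar{\alpha}_t)$, and it assumes the encoder's latent estimate behaves like the clean latent plus Gaussian noise of standard deviation $\hat{\sigma}$, giving an SNR of $1/\hat{\sigma}^2$. The linear $\beta$ schedule, the function names, and the choice to match in log-SNR are illustrative assumptions, not the authors' implementation.

```python
import math


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule and its endpoints are assumed values for illustration.
    """
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def snr_matched_start(sigma_hat, alpha_bars):
    """Return the step index whose diffusion SNR is closest to 1 / sigma_hat**2.

    Matching is done on log-SNR so that large and small SNR values are
    compared on the same scale (a design choice of this sketch).
    """
    if sigma_hat <= 0:
        raise ValueError("sigma_hat must be positive")
    target_log_snr = -2.0 * math.log(sigma_hat)

    def gap(i):
        log_snr = math.log(alpha_bars[i]) - math.log(1.0 - alpha_bars[i])
        return abs(log_snr - target_log_snr)

    return min(range(len(alpha_bars)), key=gap)


if __name__ == "__main__":
    schedule = linear_alpha_bars()
    for sigma_hat in (0.05, 0.2, 1.0):
        start = snr_matched_start(sigma_hat, schedule)
        print(f"sigma_hat={sigma_hat}: start reverse diffusion at step {start}")
```

Under these assumptions a sample with a small predicted severity starts the reverse process close to the clean end of the trajectory and needs few steps, while a severely degraded sample starts later and receives more compute.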

Paper Structure (19 sections, 33 equations, 21 figures, 9 tables)

Figures (21)

  • Figure 1: Overview of our method: we estimate the degradation severity of corrupted images in the latent space of an autoencoder (Severity Encoder). We leverage the severity predictions ($\hat{\sigma}$) to find the optimal start time in a latent reverse diffusion process on a sample-by-sample basis. As a result, inference cost is automatically scaled by the difficulty of the reconstruction task at test time.
  • Figure 2: The optimal number of reverse diffusion steps varies depending on the severity of degradations. Fixing the number of steps results in over-diffusing some samples, whereas others could benefit from more iterations.
  • Figure 3: Effect of degradation on predicted severities: given a ground truth image corrupted by varying amounts of blur, $\hat{\sigma}$ is a non-decreasing function of the blur amount.
  • Figure 4: Blur amount ($t$) vs. predicted degradation severity ($\hat{\sigma}$). Outliers indicate that the predicted degradation severity is not solely determined by the amount of blur. The bottom image is surprisingly easy to reconstruct, as it is overwhelmingly smooth with features close to those seen in the training set. The top image is surprisingly hard, due to more high-frequency details and unusual features not seen during training. Points in red suggest that a given degradation severity may result from a wide range of blur levels (see Fig. 5 and the discussion under Identifying contributors to severity in the section on severity encoding).
  • Figure 5: Contributors to severity. Degraded images with approx. the same $\hat{\sigma}$ may have different factors contributing to the predicted severity. The main contributor to $\hat{\sigma}$ in the top image is the image degradation (blur), whereas the bottom image is inherently more difficult to reconstruct.
  • ...and 16 more figures

Theorems & Definitions (1)

  • Definition 1: Ordering accuracy