Adapt and Diffuse: Sample-adaptive Reconstruction via Latent Diffusion Models
Zalan Fabian, Berk Tinaz, Mahdi Soltanolkotabi
TL;DR
The paper tackles the problem of sample-dependent difficulty in inverse imaging tasks by introducing severity encoding in a latent autoencoder space to estimate both a latent reconstruction $\hat{\mathbf{z}}$ and a degradation severity $\hat{\sigma}$. It then couples this with a sample-adaptive latent-diffusion inference that selects the starting diffusion time via SNR matching to $\hat{\sigma}$, forming the Flash-Diffusion wrapper that can augment any latent-diffusion-based solver. The key contributions are a fast, trainable severity encoder with a joint latent and error-based loss, a principled starting-time selection rule, and extensive experiments showing 8–10x reductions in reverse-diffusion steps while improving reconstruction quality across Gaussian blur, nonlinear blur, and inpainting on CelebA-HQ, FFHQ, and LSUN Bedrooms. The results demonstrate substantial practical benefits for resource-aware inverse-problem solving, including notable speedups without sacrificing perceptual or distortion metrics. Overall, Flash-Diffusion provides a flexible framework to dynamically allocate compute based on sample difficulty in diffusion-based inverse problem solvers, advancing both efficiency and accuracy in high-resolution vision tasks.
Abstract
Inverse problems arise in a multitude of applications, where the goal is to recover a clean signal from noisy and possibly (non)linear observations. The difficulty of a reconstruction problem depends on multiple factors, such as the ground truth signal structure, the severity of the degradation and the complex interactions between the above. This results in natural sample-by-sample variation in the difficulty of a reconstruction problem. Our key observation is that most existing inverse problem solvers lack the ability to adapt their compute power to the difficulty of the reconstruction task, resulting in subpar performance and wasteful resource allocation. We propose a novel method, $\textit{severity encoding}$, to estimate the degradation severity of corrupted signals in the latent space of an autoencoder. We show that the estimated severity has strong correlation with the true corruption level and can provide useful hints on the difficulty of reconstruction problems on a sample-by-sample basis. Furthermore, we propose a reconstruction method based on latent diffusion models that leverages the predicted degradation severities to fine-tune the reverse diffusion sampling trajectory and thus achieve sample-adaptive inference times. Our framework, Flash-Diffusion, acts as a wrapper that can be combined with any latent diffusion-based baseline solver, imbuing it with sample-adaptivity and acceleration. We perform experiments on both linear and nonlinear inverse problems and demonstrate that our technique greatly improves the performance of the baseline solver and achieves up to $10\times$ acceleration in mean sampling speed.
