Invertible Diffusion Models for Compressed Sensing

Bin Chen; Zhenyu Zhang; Weiqi Li; Chen Zhao; Jiwen Yu; Shijie Zhao; Jie Chen; Jian Zhang

TL;DR

CS aims to recover an image $x \in \mathbb{R}^N$ from measurements $y = A x$, where $A \in \mathbb{R}^{M \times N}$ is the sampling matrix and $\gamma = M/N$ is the CS ratio. IDM is an end-to-end diffusion-based CS framework that fine-tunes a pre-trained diffusion sampling process to learn the mapping from $y$ to $x$, aided by a two-level invertible design and by injectors that fuse the measurement physics $(y, A)$ into feature space. It reports PSNR gains of up to 2.64dB over CS networks and up to 10.09dB over the diffusion-based solver DDNM, while cutting training GPU memory by up to 93.8% and running inference up to 14.54 times faster than DDNM. The approach performs strongly across natural image CS, inpainting, accelerated MRI, and sparse-view CT, and generalizes to unseen CS ratios, which makes it relevant for resource-constrained deployments.
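The measurement model above can be made concrete with a small sketch. It uses a random Gaussian sampling matrix and a back-projection $A^\top y$ as a crude initial estimate; both choices are assumptions made for illustration (the names `make_sensing_matrix`, `measure`, and `back_project` are hypothetical), not necessarily what IDM uses.

```python
import random

def make_sensing_matrix(m, n, seed=0):
    """Random Gaussian M x N matrix (an assumed choice of A, for illustration)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def measure(A, x):
    """Compute the CS measurements y = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def back_project(A, y):
    """Compute A^T y, a crude initial estimate (assumed, not the paper's)."""
    n = len(A[0])
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n)]

N = 16                  # signal length (a flattened image block)
gamma = 0.25            # CS ratio gamma = M / N
M = int(gamma * N)      # number of measurements

x = [float(i % 4) for i in range(N)]   # toy signal
A = make_sensing_matrix(M, N)
y = measure(A, x)                      # M measurements
x0 = back_project(A, y)                # length-N starting point for a solver
```

With $\gamma = 25\%$, only 4 numbers are observed for a 16-dimensional signal, so recovering $x$ from $y$ is underdetermined and a learned prior is needed.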

Abstract

While deep neural networks (NNs) have significantly advanced image compressed sensing (CS) by improving reconstruction quality, the need to train current CS NNs from scratch constrains their effectiveness and hampers rapid deployment. Although recent methods use pre-trained diffusion models for image reconstruction, they suffer from slow inference and limited adaptability to CS. To tackle these challenges, this paper proposes Invertible Diffusion Models (IDM), a novel, efficient, end-to-end diffusion-based CS method. IDM repurposes a large-scale diffusion sampling process as a reconstruction model and fine-tunes it end-to-end to recover original images directly from CS measurements, moving beyond the traditional paradigm of one-step noise estimation learning. To enable such memory-intensive end-to-end fine-tuning, we propose a novel two-level invertible design that transforms both (1) the multi-step sampling process and (2) the noise estimation U-Net in each step into invertible networks. As a result, most intermediate features are cleared during training, reducing GPU memory by up to 93.8%. In addition, we develop a set of lightweight modules that inject measurements into the noise estimator to further facilitate reconstruction. Experiments demonstrate that IDM outperforms existing state-of-the-art CS networks by up to 2.64dB in PSNR. Compared to the recent diffusion-based approach DDNM, our IDM achieves up to 10.09dB PSNR gain and 14.54 times faster inference. Code is available at https://github.com/Guaishou74851/IDM.

Paper Structure (20 sections, 3 equations, 11 figures, 7 tables)

Introduction
Related Work
Deep End-to-End Learned Image CS Networks
Diffusion-Based Image Reconstruction
Invertible Neural Networks for Vision Tasks
Method
Preliminary
Learn Diffusion Sampling End-to-End for CS
Two-Level Invertible Design for Memory Efficiency
Inject Measurement Physics into Noise Estimator
Discussion
Relationship with Previous Diffusion-Based Methods
Relationship with Deep Algorithm Unrolling
Experiment
Implementation Details
...and 5 more sections

Figures (11)

Figure 1: Proposed IDM compared to previous methods. (a) Conventional NN-based works [chen2023deep] develop and train new CS NN architectures from scratch, limiting their ability to achieve higher performance within a short timeframe for rapid deployment. (b) Traditional diffusion-based image reconstruction methods [saharia2023image] train a one-step noise estimation U-Net and use it as an off-the-shelf NN module for iterative sampling. This estimator lacks awareness of the entire recovery process from measurement to image, reducing its adaptability to CS. (c) Our invertible diffusion models (IDM) fine-tune a large-scale, pre-trained diffusion sampling process to directly predict original images from CS measurements end-to-end, significantly improving performance while reducing the required sampling steps (Contribution 1). We further make the sampling process and noise estimation U-Net invertible, adding measurement injectors into our pruned noise estimation U-Net, resulting in a substantial performance boost while greatly reducing training GPU memory and runtime (Contributions 2 and 3). Here, (a), (b), and (c) correspond to (12), (1), and (9) in Tab. \ref{tab:abla}, respectively. Please refer to Sec. \ref{sec:ablation_and_analysis} for more details.
Figure 2: Illustration of our proposed IDM framework. It receives an initial image estimate ${\hat{\mathbf{x}}}_T$ and learns $T$ diffusion sampling steps for end-to-end recovery. Auxiliary connections (shown as red arrows) enable invertibility and facilitate the reuse of powerful large-scale pre-trained SD models.
Figure 3: Illustration of our wiring technique, exemplified with three diffusion sampling steps and a three-scale noise estimation U-Net. Here, light-colored rectangles with diagonal dashed lines represent images/features that are cleared, while dark-colored, empty rectangles indicate images/features that must be preserved. (a) The original non-invertible forward pass caches all inputs, features, and outputs, causing memory usage to increase linearly with the step number. (b) We add connections to construct invertible layers, reducing memory usage to a constant level. (c) During back-propagation, the necessary intermediate inputs/features are sequentially recomputed and cleared to obtain gradients from the last to the first step. Our wired sampling framework is equivalent to the original setup in (a) when $u_t = 1$ and $v_t = 0$.
Figure 4: Illustration of our modified noise estimation U-Net ${\boldsymbol{\epsilon}}_{\mathbf{\Theta}}$ for image CS reconstruction tasks, based on the SD v1.5 models [rombach2022highstabilityai]. (a) Injectors are added behind each residual and attention block and grouped within downsampling, upsampling, and middle blocks (marked as DB, UB, and MB) for invertibility. Our method is orthogonal to and compatible with network pruning [kim2023bk] for enhanced efficiency. (b) Each injector learns to fuse measurement physics $\{{\mathbf{y}},{\mathbf{A}}\}$ into deep features using convolutions and residual connections.
Figure 5: Comparison of CS recovery results among various end-to-end learned methods on "test_03" image from CBSD68 at $\gamma =10\%$.
...and 6 more figures
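
The wiring idea in the Figure 3 caption can be illustrated with a toy sketch. The caption only states that the wired framework reduces to the original one when $u_t = 1$ and $v_t = 0$; the specific form used here, next = $u \cdot F(\text{current}) + v \cdot \text{previous}$, is an assumption chosen to satisfy that property, and `step_fn` is a hypothetical scalar stand-in for one sampling step rather than the paper's U-Net.

```python
def step_fn(x, t):
    """Hypothetical stand-in for one sampling step (not the paper's U-Net)."""
    return 0.9 * x + 0.1 * t

def wired_forward(x_prev, x_cur, t, u, v):
    """Assumed wiring: next = u * step_fn(x_cur, t) + v * x_prev."""
    return u * step_fn(x_cur, t) + v * x_prev

def wired_inverse(x_cur, x_next, t, u, v):
    """Recover x_prev from (x_cur, x_next); requires v != 0."""
    return (x_next - u * step_fn(x_cur, t)) / v

def run_forward(x_init, steps, u, v):
    """Run all steps while keeping only the two most recent states."""
    prev, cur = x_init, x_init
    for t in range(steps, 0, -1):
        prev, cur = cur, wired_forward(prev, cur, t, u, v)
    return prev, cur

def recompute_states(prev, cur, steps, u, v):
    """Walk backwards from the last two states, rebuilding earlier ones."""
    states = [cur, prev]
    for t in range(1, steps + 1):
        prev, cur = wired_inverse(prev, cur, t, u, v), prev
        states.append(prev)
    return states[::-1]  # ordered from first state to last

# Forward pass stores two states; earlier ones are rebuilt on demand.
last_two = run_forward(1.0, steps=3, u=0.5, v=0.5)
rebuilt = recompute_states(*last_two, steps=3, u=0.5, v=0.5)
```

In this sketch the forward pass holds a constant number of states regardless of the step count, and the backward walk recomputes each earlier state when it is needed, which mirrors panels (b) and (c) of the caption. Setting `u=1.0, v=0.0` makes `wired_forward` return exactly `step_fn(x_cur, t)`, but that setting is not invertible under this assumed form.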
