Uformer-ICS: A U-Shaped Transformer for Image Compressive Sensing Service

Kuiyuan Zhang, Zhongyun Hua, Yuanman Li, Yushu Zhang, Yicong Zhou

TL;DR

This work addresses real-time image compressive sensing by integrating CS priors into a U-shaped transformer. It introduces an adaptive sampling mechanism that estimates block sparsity from initial measurements to allocate per-block sampling resources, and a multi-channel projection (MCP) module that injects CS projection knowledge into projection-based transformer blocks. The reconstruction network combines a four-level Uformer with MCP and residual convolutions to capture both local features and long-range dependencies. It achieves state-of-the-art PSNR/SSIM across five datasets and supports arbitrary sampling ratios with a single model. The results demonstrate significant improvements over existing DL-based CS methods and highlight the practical potential for bandwidth- and storage-efficient image sensing in service-oriented applications.

Abstract

Many service computing applications require real-time dataset collection from multiple devices, necessitating efficient sampling techniques to reduce bandwidth and storage pressure. Compressive sensing (CS) has found wide-ranging applications in image acquisition and reconstruction. Recently, numerous deep-learning methods have been introduced for CS tasks. However, the accurate reconstruction of images from measurements remains a significant challenge, especially at low sampling rates. In this paper, we propose Uformer-ICS, a novel U-shaped transformer for image CS tasks that introduces the inner characteristics of CS into the transformer architecture. To exploit the uneven sparsity distribution of image blocks, we design an adaptive sampling architecture that allocates measurement resources based on the estimated block sparsity, allowing the compressed results to retain maximum information from the original image. Additionally, we introduce a multi-channel projection (MCP) module inspired by traditional CS optimization methods. By integrating the MCP module into the transformer blocks, we construct projection-based transformer blocks, and then form a symmetrical reconstruction model using these blocks and residual convolutional blocks. As a result, our reconstruction model can simultaneously utilize the local features and long-range dependencies of images, as well as the prior projection knowledge of CS theory. Experimental results demonstrate that it achieves significantly better reconstruction performance than state-of-the-art deep learning-based CS methods.
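
The projection knowledge mentioned in the abstract can be illustrated with the classical gradient-descent projection used by traditional CS optimization methods, $\mathbf{x} \leftarrow \mathbf{x} - \rho\,\mathbf{\Phi}^{\top}(\mathbf{\Phi}\mathbf{x} - \mathbf{y})$. The sketch below is a pure-Python toy and not the paper's MCP module: the function names, the 2x4 sampling matrix `phi`, and the step size `rho` are all assumptions chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def projection_step(x, y, phi, rho):
    """One gradient step on 0.5 * ||phi x - y||^2 (hypothetical helper)."""
    residual = [p - q for p, q in zip(matvec(phi, x), y)]
    gradient = matvec(transpose(phi), residual)
    return [xi - rho * gi for xi, gi in zip(x, gradient)]


def measurement_error(x, y, phi):
    """Squared distance between the measurements of x and the target y."""
    return sum((p - q) ** 2 for p, q in zip(matvec(phi, x), y))


# Toy setup (assumed values): 4 unknowns, 2 measurements.
phi = [[1.0, 0.0, 1.0, 0.0],
       [0.0, 1.0, 0.0, 1.0]]
x_true = [1.0, 2.0, 3.0, 4.0]
y = matvec(phi, x_true)

x = [0.0, 0.0, 0.0, 0.0]  # start from an all-zero estimate
errors = []
for _ in range(20):
    errors.append(measurement_error(x, y, phi))
    x = projection_step(x, y, phi, rho=0.25)
```

Each step pulls the estimate toward the set of signals consistent with the measurements, so `errors` shrinks at every iteration. The step alone does not recover `x_true`, because two measurements cannot pin down four unknowns; this is the gap that a learned reconstruction network is meant to fill.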

Paper Structure (37 sections, 17 equations, 8 figures, 4 tables, 1 algorithm)

Figures (8)

  • Figure 1: Average reconstruction performance of the proposed and existing state-of-the-art deep learning-based CS methods. The peak signal-to-noise ratio (PSNR) scores shown are averaged over all images in the five test datasets: Set5, Set11, Set14, BSD100, and Urban100. The proposed method achieves significantly better PSNR scores than state-of-the-art deep learning-based CS methods.
  • Figure 2: Adaptive sampling model of the proposed Uformer-ICS. The image $\mathbf{X}$ is first sampled to obtain initial measurements, which are used to estimate the block sparsity; the estimated sparsity then determines how sampling resources are allocated to each block. Next, the image is adaptively sampled block by block. The final measurements are obtained by concatenating the initial and adaptive measurements.
  • Figure 3: Measurement allocations using three sparsity estimation methods for the "Parrots" image at the sampling ratio of 0.1.
  • Figure 4: Overview of the reconstruction model of the proposed Uformer-ICS. Given the adaptive sampling result $\mathbf{Y}$, the reconstruction model first applies a linear mapping to it and uses a pixel shuffle operation to transform the combined mapping results into an image-like initialization $\mathbf{X}_0$. The initialization $\mathbf{X}_0$ is then fed into the Head module to extract shallow features $\mathbf{X}_h$. Taking $\mathbf{X}_h$ as input, the Uformer module captures long-range dependencies to enhance the feature representation and outputs $\mathbf{X}_u$. Finally, the Tail module generates the final reconstruction $\hat{\mathbf{X}}$ by adding the initialization $\mathbf{X}_0$ to the aggregated features of $\mathbf{X}_u$.
  • Figure 5: Illustrations of the feature down-sampling and up-sampling operations.
  • ...and 3 more figures
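
The allocation step described in the Figure 2 caption can be sketched as a proportional split of a fixed measurement budget. This is a minimal illustration and not the paper's allocation rule, which this excerpt does not specify: the function `allocate_measurements`, its parameters, and the example scores are hypothetical, and it assumes each block carries a nonnegative score where a larger value means the block needs more measurements.

```python
def allocate_measurements(scores, total_budget, n_initial):
    """Split `total_budget` measurements across blocks (hypothetical helper).

    Every block first receives `n_initial` measurements (the initial
    sampling); the remaining budget is shared in proportion to `scores`.
    """
    n_blocks = len(scores)
    remaining = total_budget - n_initial * n_blocks
    if remaining < 0:
        raise ValueError("budget is smaller than the initial sampling cost")

    total_score = sum(scores)
    if total_score <= 0:
        # No usable estimate: fall back to a uniform split.
        shares = [remaining / n_blocks] * n_blocks
    else:
        shares = [remaining * s / total_score for s in scores]

    # Largest-remainder rounding keeps the total exactly on budget.
    extra = [int(share) for share in shares]
    leftover = remaining - sum(extra)
    by_remainder = sorted(range(n_blocks),
                          key=lambda i: shares[i] - extra[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        extra[i] += 1

    return [n_initial + e for e in extra]


# Assumed example: four blocks, 38 measurements in total, 2 initial each.
allocation = allocate_measurements([1.0, 2.0, 3.0, 4.0],
                                   total_budget=38, n_initial=2)
print(allocation)  # [5, 8, 11, 14]
```

A practical version would also cap each block at its pixel count and redistribute any excess; that step is omitted here to keep the sketch short.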