UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation

Runzhao Yang, Yinda Chen, Zhihong Zhang, Xiaoyu Liu, Zongren Li, Kunlun He, Zhiwei Xiong, Jinli Suo, Qionghai Dai

TL;DR

UniCompress tackles the challenge of efficient, large-scale medical image compression by extending INRs to represent multiple data blocks through a frequency-domain prior and a learnable codebook. The method integrates wavelet-based priors, multimodal feature fusion, and a two-stage knowledge distillation scheme to train a compact student model that preserves quality while dramatically increasing encoding speed. Theoretical and empirical results show PSNR gains on CT and EM datasets and compression-time reductions of about 4–5×, with cross-domain distillation offering additional benefits. Overall, UniCompress advances practical medical image compression by merging INR flexibility with principled prior conditioning and distillation-driven efficiency gains.

Abstract

In the field of medical image compression, Implicit Neural Representation (INR) networks have shown remarkable versatility due to their flexible compression ratios, yet they are constrained by a one-to-one fitting approach that results in lengthy encoding times. Our method, "UniCompress", extends the compression capabilities of INR by being the first to compress multiple medical data blocks using a single INR network. By employing wavelet transforms and quantization, we introduce a codebook containing frequency-domain information as a prior input to the INR network. This enhances the representational power of INR and provides distinctive conditioning for different image blocks. Furthermore, our research introduces a new technique for the knowledge distillation of implicit representations, simplifying complex model knowledge into more manageable formats to improve compression ratios. Extensive testing on CT and electron microscopy (EM) datasets has demonstrated that UniCompress outperforms traditional INR methods and commercial compression solutions like HEVC, especially in complex and high-compression scenarios. Notably, compared to existing INR techniques, UniCompress achieves a 4–5 times increase in compression speed, marking a significant advancement in the field of medical image compression. Code will be made publicly available.
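
The idea of a quantized frequency-domain prior per block can be illustrated with a small sketch. The following is not the authors' implementation: it assumes a one-level 1D Haar wavelet, a fixed scalar codebook, and nearest-neighbour quantization, whereas the paper describes a learnable codebook on volumetric data. The names `haar_step`, `quantize`, `block_prior`, and the toy values are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal 1D Haar transform.

    Returns (approx, detail): the low- and high-frequency halves.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def quantize(coeffs, codebook):
    """Map each coefficient to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: abs(codebook[k] - c))
        for c in coeffs
    ]


def block_prior(block, codebook):
    """Quantized low/high-frequency indices that could condition an INR."""
    approx, detail = haar_step(block)
    return {"low": quantize(approx, codebook), "high": quantize(detail, codebook)}


# Hypothetical toy codebook and two 4-sample "blocks" (assumed values).
codebook = [-4.0, -1.0, 0.0, 1.0, 4.0, 8.0]
block_a = [4.0, 4.0, 6.0, 6.0]  # piecewise constant: no high-frequency energy
block_b = [4.0, 0.0, 6.0, 0.0]  # oscillating: strong high-frequency energy

prior_a = block_prior(block_a, codebook)
prior_b = block_prior(block_b, codebook)
```

Because the two blocks map to different index sets, a single shared network receiving these indices as an extra input could, in principle, tell the blocks apart, which is the role the abstract assigns to the codebook prior.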

Paper Structure (25 sections, 1 theorem, 26 equations, 5 figures, 7 tables, 2 algorithms)

This paper contains 25 sections, 1 theorem, 26 equations, 5 figures, 7 tables, 2 algorithms.

Key Result

Theorem A.2

Under the paper's assumption on images (labeled assump:image), suppose INR and VAE have the same compression rate, i.e., $d_{\text{INR}} = d_{\text{VAE}} = d$. Denote the optimal parameters of INR and VAE as $(\theta^*, \phi^*)$ and $(\psi^*, \omega^*)$, respectively. Then, under the same compression rate, the reconstruction error of INR at $(\theta^*, \phi^*)$ is no greater than that of VAE at $(\psi^*, \omega^*)$.

Figures (5)

  • Figure 1: Schematic representation of a wavelet-based prior knowledge infusion into an INR compression network. The process involves the extraction of high, global, and low-frequency components using wavelet transform. These components are individually refined through self-attention mechanisms and collectively integrated via cross-attention with the INR network. The architecture further processes the information through transformer blocks and an MLP for effective compression of neural representations.
  • Figure 2: Knowledge distillation pipeline: student model mimics teacher using cross-attention for feature transfer, followed by parallel transformer processing, enhancing speed and compression while maintaining accuracy.
  • Figure 3: Distortion curves of Within-Domain Compression. The INR-based approach enables precise control over compression ratios, while other methods approximate using similar rates. 'ours/t' denotes our teacher model, and 'ours/s' represents our student model.
  • Figure 4: Visualization of CT and EM Image Compression. The figure displays the results of image compression for CT images at around 512$\times$ compression ratio and for EM images at around 12$\times$ compression ratio.
  • Figure 5: In this framework, $X$ and $T$ denote the medical images and their corresponding textual descriptions, respectively, while $G$ represents the text generator. Our approach leverages large-scale models to generate textual descriptions, subsequently integrating image pretraining with vision-language pretraining methodologies.
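
The teacher-student relationship in the Figure 2 caption can be sketched with a generic output-matching distillation loss. This is a simplified stand-in, not the paper's scheme, which is described as two-stage and uses cross-attention for feature transfer. The function names, the weighting parameter `alpha`, and the toy values are all assumptions made for illustration.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Weighted sum of two terms (alpha is a hypothetical hyperparameter):

    - reconstruction: student output vs. ground-truth voxel values
    - mimicry: student output vs. teacher output
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    reconstruction = mse(student_out, target)
    mimicry = mse(student_out, teacher_out)
    return alpha * reconstruction + (1.0 - alpha) * mimicry


# Toy values (assumed): the teacher is closer to the target than the student.
target = [0.0, 1.0, 2.0, 3.0]
teacher = [0.0, 1.0, 2.0, 3.5]
student = [0.0, 1.0, 2.0, 4.0]

loss = distillation_loss(student, teacher, target, alpha=0.5)
```

Setting `alpha=1.0` reduces this to plain reconstruction training, while lower values pull the student toward the teacher's outputs; how the paper balances the two is not stated in this summary.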

Theorems & Definitions (2)

  • Theorem A.2
  • proof