
Learning Lossless Compression for High Bit-Depth Volumetric Medical Image

Kai Wang, Yuanchao Bai, Daxin Li, Deming Zhai, Junjun Jiang, Xianming Liu

TL;DR

The Bit-Division based Lossless Volumetric Image Compression (BD-LVIC) framework is presented. It is tailored for high bit-depth medical volume compression, and it not only sets new performance benchmarks across various datasets but also maintains a competitive coding speed.

Abstract

Recent advances in learning-based methods have markedly enhanced image compression. However, these methods struggle with high bit-depth volumetric medical images, suffering from degraded performance, increased memory demand, and reduced processing speed. To address these challenges, this paper presents the Bit-Division based Lossless Volumetric Image Compression (BD-LVIC) framework, which is tailored for high bit-depth medical volume compression. BD-LVIC divides the high bit-depth volume into two lower bit-depth segments: the Most Significant Bit-Volume (MSBV) and the Least Significant Bit-Volume (LSBV). The MSBV holds the most significant bits of the volumetric medical image and captures vital structural details in a compact form; its reduced complexity greatly improves the compression efficiency of traditional codecs. The LSBV, in contrast, holds the least significant bits, which encapsulate intricate texture details. To compress this information effectively, we introduce a learning-based compression model equipped with a Transformer-Based Feature Alignment Module, which exploits both intra-slice and inter-slice redundancies to align features accurately. A Parallel Autoregressive Coding Module then merges these features to precisely estimate the probability distribution of the least significant bit-planes. Extensive experiments demonstrate that BD-LVIC not only sets new performance benchmarks across various datasets but also maintains a competitive coding speed, highlighting its significant potential and practical utility for volumetric medical image compression.
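
The bit-division step can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' code. It assumes voxels are stored as unsigned integers; for CT, that means Hounsfield units shifted by a constant offset. The number of bits assigned to the LSBV (`lsb_bits`) is a free parameter, because the abstract does not state the split. The helper names `bit_divide` and `bit_merge` are invented for this example.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating a bit-division split; the split width
# (lsb_bits) is an assumption, not the value used by BD-LVIC.


def bit_divide(voxels: List[int], bit_depth: int = 16,
               lsb_bits: int = 8) -> Tuple[List[int], List[int]]:
    """Split each unsigned voxel into an MSB part and an LSB part.

    The MSB part keeps the upper (bit_depth - lsb_bits) bits and the
    LSB part keeps the lower lsb_bits bits.
    """
    if not 0 < lsb_bits < bit_depth:
        raise ValueError("lsb_bits must lie strictly between 0 and bit_depth")
    mask = (1 << lsb_bits) - 1
    msb, lsb = [], []
    for v in voxels:
        if not 0 <= v < (1 << bit_depth):
            raise ValueError(f"voxel {v} does not fit in {bit_depth} unsigned bits")
        msb.append(v >> lsb_bits)  # upper bits: coarse structure
        lsb.append(v & mask)       # lower bits: fine texture and noise
    return msb, lsb


def bit_merge(msb: List[int], lsb: List[int], lsb_bits: int = 8) -> List[int]:
    """Invert bit_divide exactly, so the split itself discards nothing."""
    return [(hi << lsb_bits) | lo for hi, lo in zip(msb, lsb)]


if __name__ == "__main__":
    slice_values = [0, 1234, 40000, 65535]
    msb, lsb = bit_divide(slice_values)
    print(msb)  # [0, 4, 156, 255]
    print(lsb)  # [0, 210, 64, 255]
    assert bit_merge(msb, lsb) == slice_values
```

Because the split is exactly invertible, the overall scheme stays lossless as long as both the MSBV and the LSBV are themselves coded losslessly.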

Paper Structure

This paper contains 35 sections, 11 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Comparative analysis of the bitrate saving ratios on the Covid-CT morozov2020mosmeddatacovid and Trabit-MRI mader2019trabit2019 datasets, using the JPEG-LS weinberger2000loco codec as the baseline for calculation. Among the compared codecs, JPEG-XL alakuijala2019jpeg stands as the most advanced traditional image codec, in contrast to aiWave xue2022aiwave and BCM-Net liu2024bilateral, which represent the cutting-edge in learned lossless compression methods for volumetric medical images.
  • Figure 2: (a) Impact of bit depth on probability mass function (PMF) construction time; (b) effect of bit depth on memory usage; (c) impact of patch size and bit depth on PMF construction time.
  • Figure 3: (a-c) A high bit-depth abdominal CT slice $X_t$, together with its most significant bit-slice (MSBS) $X_t^M$ and least significant bit-slice (LSBS) $X_t^L$, where $t$ is the index of the current slice. (d-f) Adjacent slices $X_{t+1}$ and $X_{t+2}$, and their difference $X_{t+2}-X_{t+1}$.
  • Figure 4: Overview of our BD-LVIC framework. Each slice of the medical volume is decomposed into an MSBS and an LSBS. We first compress all MSBSs with traditional codecs. Each LSBS is then encoded slice by slice. When encoding the current LSBS $X_t^L$, TFAM generates the aligned feature $C_t^a$, and PACM extracts the local context $C_t^l$ and fuses it with $C_t^a$ to estimate the distribution of $X_t^L$. TFAM includes Feature Extraction (FE), an Embedding Layer (EL), a Self-attention Block (SAB), Conditional Position Embedding (CPE), and a Cross-attention Block (CAB). PACM contains Masked Convolution (MConv), a Parameter Predictor Network (PPN), an Arithmetic Encoder (AE), and an Arithmetic Decoder (AD).
  • Figure 5: Details of the proposed TFAM, which integrates Feature Extraction (FE), Embedding Layers (EL), Cross-Attention Blocks (CAB), and Self-Attention Blocks (SAB). $\mathrm{Conv\,k3s1}$ denotes a convolution layer with a $3 \times 3$ kernel and stride 1, $1\times1$ denotes $\mathrm{Conv\,k1s1}$, and $\mathrm{DWConv}$ denotes a depth-wise convolution layer. CPE stands for Conditional Position Embedding.
  • ...and 5 more figures
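
Figure 2 relates bit depth to PMF construction time and memory. A rough calculation shows why the effect can be so strong under one common design, in which a model evaluates a probability for every possible symbol value at every voxel before arithmetic coding. The following assumptions are for illustration only and are not the configuration measured in the paper: the dense per-voxel table, the 512 x 512 slice size, and float32 storage. The helper `dense_pmf_bytes` is hypothetical.

```python
# Back-of-envelope estimate (hypothetical helper, not from the paper):
# memory for a dense per-voxel probability table covering all
# 2**bit_depth symbol values, stored as float32 (4 bytes) by default.


def dense_pmf_bytes(height: int, width: int, bit_depth: int,
                    bytes_per_prob: int = 4) -> int:
    """Bytes needed to store one probability per symbol per voxel."""
    return height * width * (1 << bit_depth) * bytes_per_prob


if __name__ == "__main__":
    gib = 1 << 30
    for b in (8, 12, 16):
        size = dense_pmf_bytes(512, 512, b)
        print(f"{b:2d}-bit, 512x512 slice: {size / gib:g} GiB")
    # prints:
    #  8-bit, 512x512 slice: 0.25 GiB
    # 12-bit, 512x512 slice: 4 GiB
    # 16-bit, 512x512 slice: 64 GiB
```

Under this assumption, the table grows as $2^b$. Modeling a segment with $b$ bits instead of the full 16-bit value therefore shrinks it by a factor of $2^{16-b}$. The actual timings and memory figures in Figure 2 depend on the paper's own implementation.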