Learned Image Compression with Dictionary-based Entropy Model

Jingbo Lu, Leheng Zhang, Xingyu Zhou, Mu Li, Wen Li, Shuhang Gu

TL;DR

A novel entropy model, the Dictionary-based Cross Attention Entropy model, is proposed. It introduces a learnable dictionary that summarizes the typical structures occurring in the training dataset in order to enhance the entropy model.

Abstract

Learned image compression methods have attracted great research interest and exhibit rate-distortion performance superior to that of the best current classical image compression standards. The entropy model, which estimates the probability distribution of the latent representation for subsequent entropy coding, plays a key role in learned image compression. Most existing methods employ hyper-prior and auto-regressive architectures to form their entropy models. However, these methods only explore the internal dependencies of the latent representation and neglect the importance of extracting priors from the training data. In this work, we propose a novel entropy model, the Dictionary-based Cross Attention Entropy model, which introduces a learnable dictionary to summarize the typical structures occurring in the training dataset and thereby enhance the entropy model. Extensive experiments demonstrate that the proposed model strikes a better balance between performance and latency and achieves state-of-the-art results on various benchmark datasets.
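The abstract describes the entropy model as estimating the probability distribution of the latent representation for entropy coding. As a minimal sketch of how such an estimate becomes a bit cost, assume that the distribution parameters for each latent element are a Gaussian mean and scale. This is a common choice in learned codecs, but the summary here does not state which distribution this paper uses. The probability of an integer-quantized value is then the Gaussian mass over its quantization bin, and the ideal code length is the negative base-2 logarithm of that probability. The function names below are hypothetical.

```python
import math
from typing import Sequence

# Sketch only: assumes a per-element Gaussian (mean, scale) parametrization,
# a common choice in learned codecs but an assumption for this paper.


def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def symbol_probability(y_hat: float, mu: float, sigma: float) -> float:
    """Gaussian mass over the quantization bin [y_hat - 0.5, y_hat + 0.5]."""
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    return max(p, 1e-9)  # floor keeps log2 finite for very unlikely symbols


def estimated_bits(y_hat: Sequence[float], mus: Sequence[float],
                   sigmas: Sequence[float]) -> float:
    """Ideal code length -sum(log2 p), which an arithmetic coder approaches."""
    return sum(-math.log2(symbol_probability(v, m, s))
               for v, m, s in zip(y_hat, mus, sigmas))


# Usage: the same quantized latents cost fewer bits when the predicted
# distribution is sharp and centered near the actual values.
y_hat = [0.0, 1.0, -2.0]
sharp = estimated_bits(y_hat, mus=[0.0, 0.9, -1.8], sigmas=[0.5, 0.5, 0.5])
vague = estimated_bits(y_hat, mus=[0.0, 0.0, 0.0], sigmas=[2.0, 2.0, 2.0])
print(f"sharp: {sharp:.2f} bits, vague: {vague:.2f} bits")
```

This is why a better entropy model lowers the rate: any extra prior (hyper-prior, auto-regressive context, or, in this paper, a learned dictionary) that lets the model predict means and scales more accurately directly shrinks the estimated code length.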

Paper Structure

This paper contains 31 sections, 8 equations, 16 figures, and 6 tables.

Figures (16)

  • Figure 1: Rate-speed comparison on Kodak. Points toward the upper left are better.
  • Figure 2: The overall framework of the proposed network. Given an input image $\bm{x}$, the encoder $g_a$ transforms it into the latent representation $\bm{y}$; the proposed dictionary-based cross-attention entropy model then encodes or decodes the quantized latent $\hat{\bm{y}}$. Finally, the decoder $g_s$ reconstructs the image $\hat{\bm{x}}$ from the latent representation $\overline{\bm{y}}$. In the dictionary-based cross-attention entropy model, we introduce a learnable dictionary that captures typical structures and textures in natural images to improve the distribution estimation of the latent representation $\bm{y}$.
  • Figure 3: The proposed Dictionary-based Cross Attention Entropy model. The Dictionary-based Slice Network $e_i$ is used to encode or decode the latent representation $\hat{\bm{y}}_i$. In $e_i$, the hyper-prior feature $\bm{\mathcal{F}}_{z}$ and channel-wise auto-regressive feature $\overline{\bm{y}}_{<i}$ are first fed into our Multi-Scale Features Aggregation module to obtain multi-scale features $\bm{X}_{ms_i}$. Then, the multi-scale features $\bm{X}_{ms_i}$ are used to query the dictionary to extract the dictionary feature $\bm{\mathcal{F}}_{\textit{dict}_i}$. Finally, the dictionary feature $\bm{\mathcal{F}}_{\textit{dict}_i}$ is taken as input to the entropy module $f_E$ to estimate the distribution parameters $\Phi_i$ of $\hat{\bm{y}}_i$ for entropy coding, and to the latent residual prediction net $f_{LRP}$ to predict the quantization error $\bm{r}_i$.
  • Figure 4: Performance evaluation (PSNR) on the Kodak dataset.
  • Figure 5: Performance evaluation (PSNR) on the CLIC dataset.
  • ...and 11 more figures
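The Figure 3 caption describes the multi-scale features $\bm{X}_{ms_i}$ querying a learnable dictionary to produce the dictionary feature $\bm{\mathcal{F}}_{\textit{dict}_i}$. The following is a minimal sketch of that step, assuming standard single-head scaled dot-product cross-attention. The exact attention design, number of heads, normalization layers, and dictionary size are not given here. The function name, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are hypothetical, and the dictionary is a random array standing in for a trained parameter.

```python
import numpy as np

# Sketch only: single-head scaled dot-product cross-attention in which
# per-position latent features (queries) attend over a learnable dictionary
# (keys/values). Names, shapes, and projections are hypothetical.


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dictionary_cross_attention(x_ms, dictionary, w_q, w_k, w_v):
    """Query a dictionary with spatial features.

    x_ms:          (L, C) features, one row per spatial position (L = H * W),
                   standing in for X_ms_i.
    dictionary:    (N, C) dictionary with N entries, learned during training.
    w_q, w_k, w_v: (C, d) projection matrices.
    Returns the (L, d) dictionary features (standing in for F_dict_i)
    and the (L, N) attention weights.
    """
    q = x_ms @ w_q                           # (L, d) queries from the image context
    k = dictionary @ w_k                     # (N, d) keys from dictionary entries
    v = dictionary @ w_v                     # (N, d) values from dictionary entries
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (L, N) scaled similarities
    attn = softmax(scores, axis=-1)          # each row sums to 1 over entries
    return attn @ v, attn


# Usage with random stand-ins for trained weights.
rng = np.random.default_rng(0)
L, C, N, d = 16, 32, 8, 32
x_ms = rng.standard_normal((L, C))
dictionary = rng.standard_normal((N, C))
w_q, w_k, w_v = (rng.standard_normal((C, d)) / np.sqrt(C) for _ in range(3))
f_dict, attn = dictionary_cross_attention(x_ms, dictionary, w_q, w_k, w_v)
print(f_dict.shape, attn.shape)  # (16, 32) (16, 8)
```

In this sketch, each output row is a convex combination of projected dictionary entries. Because the dictionary depends only on the training data and not on the image being coded, the encoder and decoder hold identical copies and nothing about it needs to be transmitted. Per Figure 3, the resulting $\bm{\mathcal{F}}_{\textit{dict}_i}$ is then passed to $f_E$ and $f_{LRP}$.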