
Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression

Junhui Li, Jutao Li, Xingsong Hou, Huake Wang

TL;DR

The paper addresses remote sensing (RS) image compression by introducing LDM-RSIC, which uses a latent diffusion model (LDM) to generate a distortion prior that enhances decoded image quality. The approach has two stages. Stage I learns a compact prior $\mathbf{F}$ from the original and decoded images via a self-encoder and a latent representation module; the prior feeds a Transformer-based multi-scale enhancement network (MEN) equipped with a dynamic feature attention module (DFAM). Stage II runs a latent diffusion process conditioned on the decoded image to produce the prior $\mathbf{\hat{F}}$, which replaces $\mathbf{F}$ when enhancing the reconstruction. The key contributions are the latent-space prior generation framework, the DFAM-equipped MEN for fusing the prior information, and performance gains over state-of-the-art traditional and learned codecs, including 32.00% bit savings when the scheme is applied to JPEG2000 on the DOTA testing set. The diffusion-driven distortion prior improves both rate-distortion performance and perceptual quality, which matters for efficient storage and transmission of RS data.

Abstract

Deep learning-based image compression algorithms typically focus on designing encoding and decoding networks and on improving the accuracy of entropy model estimation to enhance rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method, which aims to enhance the final decoding quality of RS images by utilizing the distortion prior generated by an LDM. Our approach consists of two stages. In the first stage, a self-encoder learns a prior from the high-quality input image. In the second stage, the prior is generated through an LDM, conditioned on the decoded image of an existing learning-based image compression algorithm, and is used as auxiliary information for generating the texture-rich enhanced image. To better utilize the prior, a channel attention and gate-based dynamic feature attention module (DFAM) is embedded into a Transformer-based multi-scale enhancement network (MEN) for image enhancement. Extensive experiments demonstrate that the proposed LDM-RSIC significantly outperforms existing state-of-the-art traditional and learning-based image compression algorithms in terms of both subjective perception and objective metrics. Additionally, we use the LDM-based scheme to improve the traditional image compression algorithm JPEG2000 and obtain 32.00% bit savings on the DOTA testing set. The code will be available at https://github.com/mlkk518/LDM-RSIC.
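
To make the idea of fusing a prior through channel attention and gating concrete, here is a minimal sketch in plain Python. It is not the paper's DFAM: the function names (`channel_weights`, `fuse`), the parameter-free sigmoid weighting, and the residual fusion rule are all assumptions chosen for illustration, whereas the actual module has learned layers and sits inside the Transformer-based MEN.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_weights(prior: list[list[float]]) -> list[float]:
    """Toy channel attention: global-average-pool each prior channel,
    then squash to (0, 1). A learned module would place small dense
    layers between the pooling and the sigmoid."""
    return [sigmoid(sum(ch) / len(ch)) for ch in prior]


def fuse(features: list[list[float]], prior: list[list[float]]) -> list[list[float]]:
    """Inject prior features into decoded-image features.

    Both inputs are lists of channels; each channel is a flat list of
    values. Hypothetical fusion rule (an assumption, not the paper's):
        out = f + gate(p) * weight_c * p
    where gate(p) is an element-wise sigmoid gate and weight_c is the
    per-channel attention weight.
    """
    weights = channel_weights(prior)
    fused = []
    for feat_ch, prior_ch, w in zip(features, prior, weights):
        fused.append([f + sigmoid(p) * w * p for f, p in zip(feat_ch, prior_ch)])
    return fused


# Made-up values: 2 channels with 2 spatial positions each.
decoded_features = [[1.0, 2.0], [0.5, -0.5]]
prior_features = [[0.2, -0.1], [0.0, 0.4]]
enhanced = fuse(decoded_features, prior_features)
print(enhanced)
```

In this toy rule an all-zero prior leaves the decoded features unchanged, so the prior can only add a correction on top of what the compressor already reconstructed.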

Paper Structure (29 sections, 14 equations, 10 figures, 2 tables, 2 algorithms)

Figures (10)

  • Figure 1: Overview of the proposed LDM-RSIC, which comprises the compressor, MEN, and LDM. The compressor utilizes the competitive image compression algorithm ELIC (He et al., 2022). "2TB" indicates two stacked Transformer blocks, and "5RB" denotes five serially connected residual blocks. "AE" and "AD" refer to the arithmetic encoder and decoder, respectively. Stage I aims to learn the prior information $\mathbf{F}$, and Stage II focuses on employing LDM to generate the prior features $\mathbf{\hat{F}}$ to replace $\mathbf{F}$.
  • Figure 2: Performance evaluation on the testing sets of DOTA and UC-M in terms of PSNR, MS-SSIM, and LPIPS. LDM-RSIC and JPEG2000* refer to algorithms designed using ELIC and JPEG2000 as the compressor in Fig. 1, respectively.
  • Figure 3: Compressed images by several compression algorithms on the testing images "P0253" and "P0006" of the DOTA testing set.
  • Figure 4: Compressed images by several compression algorithms on the images "tenniscourt91" and "intersection97" of the UC-M testing set.
  • Figure 5: Visualization of the decoded images "tenniscourt84" and "intersection94" using state-of-the-art compression methods and the proposed LDM-RSIC.
  • ...and 5 more figures