Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression
Junhui Li, Jutao Li, Xingsong Hou, Huake Wang
TL;DR
The paper addresses remote sensing (RS) image compression by introducing LDM-RSIC, which uses a latent diffusion model (LDM) to generate a distortion prior that enhances decoded image quality. The approach has two stages. Stage I learns a compact prior F from the original and decoded images via a self-encoder and a latent representation module, and passes it to a Transformer-based multi-scale enhancement network (MEN) with a dynamic feature attention module (DFAM). Stage II uses a latent diffusion process conditioned on the decoded image to produce a refined prior F̂ that further improves reconstruction. The key contributions are the latent-space prior generation framework, the DFAM-equipped MEN for effective fusion of prior information, and demonstrated performance gains over state-of-the-art traditional and learned codecs, including 32.00% bit savings when the scheme is applied to JPEG2000 on the DOTA testing set. The approach advances RS image compression with a diffusion-driven distortion prior that improves rate-distortion performance and perceptual quality, with practical implications for efficient RS data storage and transmission.
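The two-stage data flow described above can be illustrated with a toy sketch. It is not the authors' implementation: `stage1_prior`, `make_oracle`, and `ddim_sample` are hypothetical names, the "encoder" is a hand-written averaging of the residual rather than a learned network, and the sampler uses a deterministic DDIM-style update chosen here for simplicity (the summary does not say which sampler the paper uses). The oracle stands in for a perfectly trained, decoded-image-conditioned denoiser.

```python
import math
import random

def stage1_prior(original, decoded):
    """Stage I stand-in: summarize the compression residual as a short vector F."""
    residual = [o - d for o, d in zip(original, decoded)]
    half = len(residual) // 2
    # Toy "encoder": average each half of the residual (a real encoder is learned).
    return [sum(residual[:half]) / half,
            sum(residual[half:]) / (len(residual) - half)]

def make_oracle(target):
    """Stand-in for a trained denoiser; it cheats by knowing the target prior."""
    def predict_noise(x, t, a_t, cond):
        # A real network would infer this from `cond` (the decoded image).
        return [(xi - math.sqrt(a_t) * ti) / math.sqrt(1 - a_t)
                for xi, ti in zip(x, target)]
    return predict_noise

def ddim_sample(predict_noise, cond, dim, steps=10, seed=0):
    """Stage II stand-in: deterministic DDIM-style reverse process in latent space."""
    rng = random.Random(seed)
    # Cumulative signal level: near 1 at step 0, near 0 at the last step.
    alpha_bar = [1.0 - 0.999 * (i + 1) / steps for i in range(steps)]
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from Gaussian noise
    for t in reversed(range(steps)):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0
        eps = predict_noise(x, t, a_t, cond)
        # Estimate the clean latent, then step to the previous noise level.
        x0 = [(xi - math.sqrt(1 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1 - a_prev) * e
             for x0i, e in zip(x0, eps)]
    return x

# Hypothetical pixel values, for illustration only.
original = [0.9, 0.4, 0.7, 0.2]
decoded = [0.8, 0.5, 0.5, 0.1]
F = stage1_prior(original, decoded)                           # needs the original image
F_hat = ddim_sample(make_oracle(F), decoded, dim=len(F))      # needs only the decoded image
```

The point of the sketch is the asymmetry between the stages: `stage1_prior` needs the original image, which is unavailable at the decoder, while `ddim_sample` takes only the decoded image as its condition. With the oracle denoiser, `F_hat` matches `F` up to floating-point error; a learned denoiser would only approximate it.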
Abstract
Deep learning-based image compression algorithms typically focus on designing encoding and decoding networks and improving the accuracy of entropy model estimation to enhance rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method, which aims to enhance the final decoding quality of remote sensing (RS) images by using a distortion prior generated by a latent diffusion model (LDM). Our approach consists of two stages. In the first stage, a self-encoder learns a prior from the high-quality input image. In the second stage, the prior is generated by an LDM, conditioned on the decoded image of an existing learning-based image compression algorithm, and is used as auxiliary information for generating the texture-rich enhanced image. To better utilize the prior, a channel attention and gate-based dynamic feature attention module (DFAM) is embedded into a Transformer-based multi-scale enhancement network (MEN) for image enhancement. Extensive experiments demonstrate that the proposed LDM-RSIC significantly outperforms existing state-of-the-art traditional and learning-based image compression algorithms in terms of both subjective perception and objective metrics. Additionally, we use the LDM-based scheme to improve the traditional image compression algorithm JPEG2000 and obtain 32.00% bit savings on the DOTA testing set. The code will be available at https://github.com/mlkk518/LDM-RSIC.
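To make the phrase "channel attention and gate-based" concrete, the following is a minimal sketch of how those two ingredients can modulate a feature map using a prior vector. It does not reproduce the paper's DFAM: the function names, the absence of learned layers in the attention step, and the `gate_weights` projection are all assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(features):
    """Scale each channel by a sigmoid of its global average.

    Squeeze-and-excitation style, but with no learned layers (an assumption).
    """
    weights = [sigmoid(sum(ch) / len(ch)) for ch in features]
    return [[w * v for v in ch] for w, ch in zip(weights, features)]

def prior_gate(features, prior, gate_weights):
    """Gate each channel with a sigmoid of a linear projection of the prior."""
    gated = []
    for ch, w_row in zip(features, gate_weights):
        g = sigmoid(sum(w * p for w, p in zip(w_row, prior)))
        gated.append([g * v for v in ch])
    return gated

def dfam_like(features, prior, gate_weights):
    """Hypothetical DFAM-like block: channel attention followed by a prior-driven gate."""
    return prior_gate(channel_attention(features), prior, gate_weights)

# Two channels of two values each; all numbers are illustrative.
features = [[1.0, -1.0], [2.0, 2.0]]
prior = [0.0, 0.0]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection, one row per channel
out = dfam_like(features, prior, gate_weights)
```

With a zero prior every gate equals 0.5, so the prior has no selective effect; a non-zero prior opens or closes individual channels, which is the sense in which the prior steers the enhancement features.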
