Map-Assisted Remote-Sensing Image Compression at Extremely Low Bitrates

Yixuan Ye, Ce Wang, Wanjie Sun, Zhenzhong Chen

TL;DR

The paper addresses remote-sensing image compression at extremely low bitrates, where standard codecs fail to preserve semantic structure. It introduces Map-Assisted Generative Compression (MAGC), a two-stage framework built around a pre-trained diffusion model with strong natural-image priors. MAGC works with a latent representation $z_0$, its compressed counterpart $\tilde{z}$, and vector maps $m$ processed by a semantic adapter. In stage one, a VAE-based latent compressor with hyperpriors and a SPADE-conditioned transform produces $\tilde{z}$, which provides implicit guidance. In stage two, a conditional diffusion model combines implicit guidance from $\tilde{z}$ with explicit guidance from $m$, and a pre-trained SD decoder reconstructs a semantically accurate image. Experiments on remote-sensing data show that MAGC achieves better perceptual quality (LPIPS, DISTS, FID, MUSIQ) and higher semantic segmentation accuracy (mIoU) at ultra-low bitrates than standard codecs and prior learning-based methods. The dataset and code are publicly available.
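
The hyperprior-based latent compressor follows a common learned-compression pattern: a hyperprior predicts a mean and a scale for every quantized latent element, and the coding cost of that element is the negative log2 of its probability mass under a discretized Gaussian. The stdlib Python sketch below assumes this mean-scale Gaussian entropy model; the summary does not state MAGC's exact entropy model, and the names `gaussian_bits` and `bits_per_pixel` are hypothetical.

```python
import math


def _normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_bits(y_hat: float, mu: float, sigma: float, eps: float = 1e-9) -> float:
    """Estimated bits to code a quantized value y_hat under N(mu, sigma^2).

    Assumption: a mean-scale Gaussian entropy model, where the probability of
    y_hat is the Gaussian mass on the unit-width bin [y_hat - 0.5, y_hat + 0.5].
    """
    upper = _normal_cdf((y_hat + 0.5 - mu) / sigma)
    lower = _normal_cdf((y_hat - 0.5 - mu) / sigma)
    prob = max(upper - lower, eps)  # clamp so log2 never sees 0
    return -math.log2(prob)


def bits_per_pixel(y_hats, mus, sigmas, height: int, width: int) -> float:
    """Sum the estimated bits over all latent elements, divided by image pixels."""
    total = sum(gaussian_bits(y, m, s) for y, m, s in zip(y_hats, mus, sigmas))
    return total / (height * width)


# A value sitting exactly on its predicted mean with unit scale costs about 1.38 bits;
# shrinking the predicted scale makes the same value cheaper.
print(gaussian_bits(0, 0.0, 1.0))
print(gaussian_bits(0, 0.0, 0.1))
```

The design point this illustrates: the better the hyperprior predicts each latent (accurate mean, small scale), the fewer bits the stream needs, which is what makes extremely low bitrates reachable.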

Abstract

Remote-sensing (RS) image compression at extremely low bitrates has always been a challenging task in practical scenarios such as edge-device storage and narrow-bandwidth transmission. Generative models, including VAEs and GANs, have been explored to compress RS images into extremely low-bitrate streams. However, these generative models struggle to reconstruct visually plausible images due to the highly ill-posed nature of extremely low-bitrate image compression. To this end, we propose an image compression framework that utilizes a pre-trained diffusion model with powerful natural image priors to achieve high-realism reconstructions. However, diffusion models tend to hallucinate small structures and textures due to the significant information loss at limited bitrates. Thus, we introduce vector maps as semantic and structural guidance and propose a novel image compression approach named Map-Assisted Generative Compression (MAGC). MAGC employs a two-stage pipeline to compress and decompress RS images at extremely low bitrates. The first stage maps an image into a latent representation, which is then further compressed in a VAE architecture to reduce the bitrate and serves as implicit guidance in the subsequent diffusion process. The second stage employs a conditional diffusion model to generate a visually pleasing and semantically accurate result using both the implicit guidance and explicit semantic guidance. Quantitative and qualitative comparisons show that our method outperforms standard codecs and other learning-based methods in terms of perceptual quality and semantic accuracy. The dataset and code will be publicly available at https://github.com/WHUyyx/MAGC.
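
The second stage conditions a pre-trained diffusion model on the compressed latent (implicit guidance) and the vector map (explicit guidance). Stable Diffusion-style models are usually trained by noising a clean latent $z_0$ in closed form, $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\, \epsilon$, and regressing the added noise $\epsilon$. The sketch below implements only that forward process and loss, assuming Stable Diffusion v1's scaled-linear schedule ($\beta$ from 0.00085 to 0.012 over 1000 steps) and epsilon prediction. Whether MAGC keeps these choices is an assumption, the denoiser that would consume $\tilde{z}$ and $m$ is not modeled, and all function names are hypothetical.

```python
import math


def scaled_linear_alpha_bars(num_steps: int = 1000,
                             beta_start: float = 0.00085,
                             beta_end: float = 0.012):
    """Cumulative products alpha_bar_t for a 'scaled linear' beta schedule.

    Assumption: SD v1-style schedule, betas spaced linearly in sqrt-space, then squared.
    """
    s0, s1 = math.sqrt(beta_start), math.sqrt(beta_end)
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        sqrt_beta = s0 + (s1 - s0) * t / (num_steps - 1)
        running *= 1.0 - sqrt_beta ** 2  # alpha_bar_t = prod_{s<=t} (1 - beta_s)
        alpha_bars.append(running)
    return alpha_bars


def q_sample(z0, noise, alpha_bar: float):
    """Closed-form forward process: z_t = sqrt(a_bar) * z0 + sqrt(1 - a_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * z + b * e for z, e in zip(z0, noise)]


def eps_mse(pred_noise, noise) -> float:
    """Mean squared error between predicted and true noise (epsilon-prediction loss)."""
    return sum((p - e) ** 2 for p, e in zip(pred_noise, noise)) / len(noise)


alpha_bars = scaled_linear_alpha_bars()
z_t = q_sample([0.5, -1.0, 2.0], [0.1, 0.3, -0.2], alpha_bars[500])
```

During training, a hypothetical denoiser would predict the noise from $z_t$, the timestep, $\tilde{z}$, and $m$, and `eps_mse` would compare that prediction with the true noise.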

Paper Structure (29 sections, 11 equations, 13 figures, 3 tables)

Figures (13)

  • Figure 1: Operational diagrams of learned image compression frameworks. Q denotes quantization, EM the entropy model, and D the discriminator. $\mathcal{E}$ and $\mathcal{D}$ denote the pre-trained SD VAE encoder and decoder, which transform data between pixel space and latent space.
  • Figure 2: The two-stage pipeline of the proposed MAGC. In the first stage, the latent compression module (LCM) is designed to compress the latent representation and provide implicit guidance for the conditional diffusion model. In the second stage, the semantic adapter module (SAM) is utilized to produce multi-scale features, serving as explicit guidance for the conditional diffusion process.
  • Figure 3: Network architecture of the proposed LCM, which consists of latent transform networks, hyperprior networks, and a channel-wise context model. Q represents quantization; AE and AD represent the arithmetic encoder and arithmetic decoder. $\downarrow$ and $\uparrow$ indicate downsampling and upsampling. We set N = 128 and M = 64 in our experiments.
  • Figure 4: Structures of ResBlock, SPADE ResBlock, SPADE block and basic block in our work.
  • Figure 5: Network architecture of semantic encoder and semantic adapter module (SAM).
  • ...and 8 more figures
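
Figure 4 and the TL;DR refer to SPADE blocks conditioned on map information. In SPADE (spatially-adaptive denormalization, from Park et al.'s semantic image synthesis work), features are normalized with parameter-free per-channel statistics and then modulated by a per-pixel scale $\gamma$ and shift $\beta$ predicted from a semantic map; the authors' public implementation computes `x_norm * (1 + gamma) + beta`. The sketch below is a minimal stdlib version assuming a single sample (so per-channel statistics are taken over H×W, which is what batch normalization computes for a batch of one) and $\gamma$, $\beta$ already predicted. In a real block they would come from small convolutions on the rasterized vector map, which are omitted here. How MAGC wires these blocks is not specified in this summary, and `spade_modulate` is a hypothetical name.

```python
import math


def spade_modulate(x, gamma, beta, eps: float = 1e-5):
    """Apply SPADE-style modulation to a single feature map.

    x, gamma, beta: nested lists of shape [C][H][W]. gamma and beta are the
    spatially varying scale and shift, assumed precomputed from a semantic map.
    Each channel of x is normalized with its own mean and (biased) variance
    over H x W, then modulated per pixel as x_norm * (1 + gamma) + beta.
    """
    out = []
    for xc, gc, bc in zip(x, gamma, beta):
        values = [v for row in xc for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([
            [(v - mean) * inv_std * (1.0 + g) + b
             for v, g, b in zip(xrow, grow, brow)]
            for xrow, grow, brow in zip(xc, gc, bc)
        ])
    return out


features = [[[1.0, 3.0], [5.0, 7.0]]]          # one channel, 2x2
zeros = [[[0.0, 0.0], [0.0, 0.0]]]
shift = [[[2.0, 2.0], [0.0, 0.0]]]             # e.g. top row labeled differently
modulated = spade_modulate(features, zeros, shift)
```

With $\gamma = \beta = 0$ the output is simply the standardized features; a nonzero $\beta$ on pixels of one map class shifts only those responses, which is how the map injects spatial layout into the transform.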
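
Figure 3 specifies M = 64 latent channels and a channel-wise context model. In channel-wise autoregressive entropy models (Minnen and Singh, 2020), the latent channels are split into slices that are coded in order, and the entropy parameters for each slice are predicted from the hyperprior plus all previously decoded slices. The sketch below only computes the slice boundaries and decode-order dependencies; the slice count of 4 is an assumption (the captions do not give it), and `channel_slices` and `context_plan` are hypothetical names.

```python
def channel_slices(num_channels: int, num_slices: int):
    """Split channel indices [0, num_channels) into contiguous, near-equal slices."""
    base, extra = divmod(num_channels, num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first slices
        slices.append(range(start, start + size))
        start += size
    return slices


def context_plan(num_channels: int = 64, num_slices: int = 4):
    """For each slice, list which earlier slices condition its entropy parameters.

    Slice i may use only the hyperprior and slices 0..i-1, because later slices
    have not been decoded yet when slice i is decoded.
    Assumption: 4 slices; the real slice count in MAGC is not given here.
    """
    slices = channel_slices(num_channels, num_slices)
    return [
        {"slice": i, "channels": (s.start, s.stop), "context": list(range(i))}
        for i, s in enumerate(slices)
    ]


for entry in context_plan():
    print(entry)
```

With the defaults, `context_plan()` returns four 16-channel slices; the last entry is `{'slice': 3, 'channels': (48, 64), 'context': [0, 1, 2]}`. Coding slices sequentially trades some decoding parallelism for better probability estimates than the hyperprior alone provides.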