COSMIC: Compress Satellite Images Efficiently via Diffusion Compensation

Ziyuan Zhang, Han Qiu, Maosen Zhang, Jun Liu, Bin Chen, Tianwei Zhang, Hewu Li

TL;DR

This paper designs a lightweight on-board encoder and proposes COSMIC, a simple yet effective learned compression solution for transmitting satellite images, and shows that COSMIC outperforms state-of-the-art baselines on both perceptual and distortion metrics.

Abstract

With the rapidly increasing number of satellites in space and their enhanced capabilities, the volume of Earth observation images collected by satellites is exceeding the transmission capacity of satellite-to-ground links. Although existing learned image compression solutions achieve remarkable performance by using a sophisticated encoder to extract rich features for compression and a decoder to reconstruct the image, it remains hard to deploy those complex encoders directly on current satellites' embedded GPUs, whose computing capability and power supply are limited, to compress images in orbit. In this paper, we propose COSMIC, a simple yet effective learned compression solution for transmitting satellite images. We first design a lightweight on-board encoder (reducing FLOPs by 2.6 to 5×) that achieves a high compression ratio and thus saves satellite-to-ground link bandwidth. Then, for reconstruction on the ground, to compensate for the weaker feature extraction of the simplified encoder, we propose a diffusion-based model that restores image details during decoding. Our insight is that satellites' Earth observation photos are not just images but multi-modal data that naturally form text-to-image pairs, since they are collected together with rich sensor data (e.g., coordinates and timestamps) that can serve as the condition for diffusion generation. Extensive experiments show that COSMIC outperforms state-of-the-art baselines on both perceptual and distortion metrics.
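
The abstract quotes a 2.6 to 5× FLOPs reduction for the on-board encoder but this summary does not say how it is obtained. As an illustration only, the sketch below counts multiply-accumulates (MACs; FLOPs are roughly 2× MACs) for a standard convolution versus a depthwise-separable one, a widely used lightweight substitute. Whether COSMIC's lightweight convolution block (Figure 2(d)) uses this factorization is an assumption, and the layer sizes are hypothetical.

```python
def conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """MACs of a standard k x k convolution (stride 1, 'same' padding)."""
    return h * w * k * k * c_in * c_out


def separable_conv_macs(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """MACs of a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = h * w * k * k * c_in   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise


# Hypothetical layer: 64x64 feature map, 3x3 kernel, 128 -> 128 channels.
standard = conv_macs(64, 64, 3, 128, 128)
separable = separable_conv_macs(64, 64, 3, 128, 128)
print(standard, separable, round(standard / separable, 2))
# 603979776 71827456 8.41
```

A single layer's saving need not match the whole-encoder reduction the paper reports, since an encoder also contains layers that are not simplified this way.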
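
The key insight above is that each image arrives with sensor metadata that can act as a text condition. A minimal sketch of turning such metadata into a caption-like prompt, in the spirit of Figure 1, might look as follows. The `SensorMetadata` fields, the `metadata_to_prompt` helper, and the prompt template are all hypothetical; COSMIC's Metadata Encoder (Figure 2(b)) may embed these values differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SensorMetadata:
    # Hypothetical fields; the abstract only names "coordinates, timestamp, etc."
    latitude: float
    longitude: float
    timestamp: datetime  # must be timezone-aware


def metadata_to_prompt(meta: SensorMetadata) -> str:
    """Render sensor metadata as a text description for a text-conditioned model."""
    if meta.timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    lat_dir = "N" if meta.latitude >= 0 else "S"
    lon_dir = "E" if meta.longitude >= 0 else "W"
    ts = meta.timestamp.astimezone(timezone.utc)
    return (
        "A satellite Earth observation image centered at "
        f"{abs(meta.latitude):.4f}°{lat_dir}, {abs(meta.longitude):.4f}°{lon_dir}, "
        f"captured on {ts:%Y-%m-%d} at {ts:%H:%M} UTC."
    )


example = SensorMetadata(30.2593, -97.7431, datetime(2017, 3, 4, 16, 5, tzinfo=timezone.utc))
print(metadata_to_prompt(example))
# A satellite Earth observation image centered at 30.2593°N, 97.7431°W, captured on 2017-03-04 at 16:05 UTC.
```

Because the metadata is recorded by the satellite anyway, a condition of this kind costs little or no extra downlink bandwidth compared with the image bitstream itself.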

Paper Structure (26 sections, 5 equations, 25 figures, 3 tables, 2 algorithms)

Figures (25)

  • Figure 1: An example of a satellite Earth observation image and its corresponding sensor data, presented as a description.
  • Figure 2: COSMIC framework. (a) Compression module for satellite images: a lightweight encoder and a compensation-based decoder. (b) In the noise prediction network, each Cross-Attention (CA) block receives the embedding produced by the Metadata Encoder (ME), and the Vanilla Convolution (VC) blocks use the discrete latent image encoding to guide noise prediction at each diffusion step. (c) & (d) Convolution attention module and lightweight convolution block.
  • Figure 3: Trade-off between bitrate and different metrics for COSMIC and the baselines. ↑ (↓) means higher (lower) is better. The first row is for the fMoW test set (image size 256×256). The second row is for the tile test set, comparing the stitched images with their originals.
  • Figure 4: Decompressed fMoW images (full images in the supplementary material). First row: comparison at low bitrates, where COSMIC shows better visual quality; compared with CDC, COSMIC still achieves slightly better visual reconstruction at a lower bitrate. Second row: comparison at high bitrates.
  • Figure 5: (a) Illustration of the tile test set. A high-resolution image is divided into many small sub-images (or patches), each of which is compressed individually. The reconstructed sub-images are then placed back in their original positions and stitched together to form a high-resolution reconstructed image. (b) On the tile test set, we provide two detailed views of a stitching area (outlined in orange and red). The visual comparison between COSMIC and the baseline shows that COSMIC achieves the best visual effects in terms of texture alignment and consistency in color brightness.
  • ...and 20 more figures
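
The tile test set procedure in Figure 5(a) (split, compress each patch independently, stitch back) can be sketched as below. The helper names are hypothetical, images are simplified to single-channel nested lists, and `codec` is a placeholder standing in for COSMIC's encode/decode round trip. The sketch assumes the image size is a multiple of the tile size; a real pipeline might pad the border instead.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]              # row-major, single-channel image
Tiles = Dict[Tuple[int, int], Image]  # (top, left) offset -> patch


def split_into_tiles(img: Image, tile: int) -> Tiles:
    """Cut an image into non-overlapping tile x tile patches keyed by offset."""
    h, w = len(img), len(img[0])
    if h % tile or w % tile:
        # Assumption: no padding in this sketch.
        raise ValueError("image height and width must be multiples of the tile size")
    return {
        (top, left): [row[left:left + tile] for row in img[top:top + tile]]
        for top in range(0, h, tile)
        for left in range(0, w, tile)
    }


def stitch_tiles(tiles: Tiles, h: int, w: int) -> Image:
    """Place each reconstructed patch back at its original offset."""
    out = [[0] * w for _ in range(h)]
    for (top, left), patch in tiles.items():
        for dy, row in enumerate(patch):
            out[top + dy][left:left + len(row)] = row  # copies the row's values
    return out


def tile_round_trip(img: Image, tile: int, codec: Callable[[Image], Image]) -> Image:
    """Compress/decompress every patch independently, then stitch the results."""
    tiles = split_into_tiles(img, tile)
    recon = {pos: codec(patch) for pos, patch in tiles.items()}
    return stitch_tiles(recon, len(img), len(img[0]))


# Usage: with a lossless stand-in codec the stitched image equals the input.
img = [[r * 8 + c for c in range(8)] for r in range(8)]
identity = lambda patch: patch  # hypothetical stand-in for encode + decode
print(tile_round_trip(img, 4, identity) == img)  # True
```

Because each patch is decoded on its own, any mismatch at patch borders appears as a seam in the stitched image, which is what the zoomed-in stitching areas in Figure 5(b) inspect.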
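
Figure 3 plots metrics against bitrate. Assuming the bitrate is reported in bits per pixel (bpp), as is common in learned image compression, and taking PSNR as a representative distortion metric, the two quantities can be computed as sketched below. The helper names are hypothetical, and the exact metric set used in the paper is not listed in this summary.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Bitrate of a compressed bitstream in bits per pixel (bpp)."""
    return num_bytes * 8 / (height * width)


def psnr(original: Matrix, reconstructed: Matrix, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized single-channel images."""
    if len(original) != len(reconstructed) or any(
        len(a) != len(b) for a, b in zip(original, reconstructed)
    ):
        raise ValueError("images must have the same shape")
    squared_error = sum(
        (x - y) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for x, y in zip(row_o, row_r)
    )
    mse = squared_error / sum(len(row) for row in original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# A 256x256 (fMoW-sized) image compressed to 2,048 bytes:
print(bits_per_pixel(2048, 256, 256))  # 0.25
```

For the tile test set, one natural choice (an assumption here) is to sum the compressed bytes of all tiles and divide by the pixel count of the stitched image.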