COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails
Miguel Espinosa, Valerio Marsocci, Yuru Jia, Elliot J. Crowley, Mikolaj Czerkawski
TL;DR
COP-GEN-Beta learns a unified generative prior across multiple Copernicus Earth observation (EO) modalities. It is a transformer-based diffusion model that processes four modalities (DEM, Sentinel-1 RTC, Sentinel-2 L1C, and Sentinel-2 L2A) as a shared latent sequence, with a separate timestep for each modality, which enables zero-shot translation from any subset of modalities to any other. The approach reports quantitative gains over a diffusion-based baseline and demonstrates qualitative capabilities such as atmospheric correction and elevation estimation, while supporting flexible sampling modes and extension to new data sources. The work is a step toward generalist pre-trained models for Earth observation, with potential applications in sensor fusion and data augmentation.
Abstract
In remote sensing, multi-modal data from various sensors capturing the same scene offers rich opportunities, but learning a unified representation across these modalities remains a significant challenge. Traditional methods have often been limited to single or dual-modality approaches. In this paper, we introduce COP-GEN-Beta, a generative diffusion model trained on optical, radar, and elevation data from the Major TOM dataset. What sets COP-GEN-Beta apart is its ability to map any subset of modalities to any other, enabling zero-shot modality translation after training. This is achieved through a sequence-based diffusion transformer, where each modality is controlled by its own timestep embedding. We extensively evaluate COP-GEN-Beta on thumbnail images from the Major TOM dataset, demonstrating its effectiveness in generating high-quality samples. Qualitative and quantitative evaluations validate the model's performance, highlighting its potential as a powerful pre-trained model for future remote sensing tasks.
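As a rough illustration of how per-modality timesteps can turn a single model into an any-to-any translator, the sketch below assigns each modality its own timestep and noises its latent tokens accordingly. It is a minimal toy under stated assumptions, not the authors' implementation: the convention that a conditioning modality is held at timestep 0 (clean), the linear noise schedule, the value of `T_MAX`, and all function names are hypothetical choices made for the example.

```python
import math
import random

MODALITIES = ("DEM", "S1RTC", "S2L1C", "S2L2A")
T_MAX = 1000  # hypothetical number of diffusion steps


def per_modality_timesteps(condition_on, t):
    """Give each modality its own timestep: 0 (clean) if it is a
    conditioning input, t if it is a generation target (assumed convention)."""
    unknown = set(condition_on) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    return {m: (0 if m in condition_on else t) for m in MODALITIES}


def noise_latents(latents, timesteps, rng):
    """Noise each modality's latent values according to its own timestep."""
    noised = {}
    for m, z in latents.items():
        signal = 1.0 - timesteps[m] / T_MAX  # toy linear schedule (assumption)
        noised[m] = [
            math.sqrt(signal) * v + math.sqrt(1.0 - signal) * rng.gauss(0.0, 1.0)
            for v in z
        ]
    return noised


def build_sequence(noised):
    """Concatenate all modalities into one shared sequence of tagged tokens."""
    return [(m, v) for m in MODALITIES for v in noised[m]]


# Example: condition on S2L1C, generate the other three modalities.
rng = random.Random(0)
latents = {m: [rng.gauss(0.0, 1.0) for _ in range(4)] for m in MODALITIES}
timesteps = per_modality_timesteps({"S2L1C"}, t=750)
sequence = build_sequence(noise_latents(latents, timesteps, rng))
```

Under this convention, changing the `condition_on` set is all it takes to switch between tasks: an empty set corresponds to unconditional joint generation, while passing three modalities leaves a single target to be generated from the rest.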
