Zooming Out on Zooming In: Advancing Super-Resolution for Remote Sensing

Piper Wolters; Favyen Bastani; Aniruddha Kembhavi

TL;DR

The paper tackles three obstacles to deploying super-resolution (SR) for remote sensing: the lack of trustworthy metrics, the lack of scalable training data, and unclear evidence that SR outputs are useful for machine consumption. It introduces CLIPScore, a CLIP-based perceptual metric, and builds S2-NAIP, a large public-domain SR dataset, to enable robust, cross-dataset evaluation. A comparative study of L2-loss CNNs, GANs, and diffusion models finds that GANs (notably ESRGAN) outperform the others on CLIPScore and LPIPS, and that using CLIPScore as an auxiliary training loss speeds up GAN training by up to 18x while improving outputs. The paper also assesses how useful SR outputs are for downstream tasks and deploys the SR pipeline globally via Satlas, providing freely accessible high-resolution imagery for planet monitoring and a foundation for future research on machine-usable SR.

Abstract

Super-Resolution for remote sensing has the potential for huge impact on planet monitoring by producing accurate and realistic high resolution imagery on a frequent basis and a global scale. Despite a lot of attention, several inconsistencies and challenges have prevented it from being deployed in practice. These include the lack of effective metrics, fragmented and relatively small-scale datasets for training, insufficient comparisons across a suite of methods, and unclear evidence for the use of super-resolution outputs for machine consumption. This work presents a new metric for super-resolution, CLIPScore, that corresponds far better with human judgments than previous metrics on an extensive study. We use CLIPScore to evaluate four standard methods on a new large-scale dataset, S2-NAIP, and three existing benchmark datasets, and find that generative adversarial networks easily outperform more traditional L2 loss-based models and are more semantically accurate than modern diffusion models. We also find that using CLIPScore as an auxiliary loss can speed up the training of GANs by 18x and lead to improved outputs, resulting in an effective model in diverse geographies across the world which we will release publicly. The dataset, pre-trained model weights, and code are available at https://github.com/allenai/satlas-super-resolution/.
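
The abstract does not spell out how CLIPScore is computed or how it enters the training loss. The sketch below is one plausible reading, not the authors' implementation: it assumes the score is the cosine similarity between the image embeddings of the super-resolved output and the target, and that the auxiliary loss adds a weighted `1 - score` term to an existing loss. The function names, the `weight` parameter, and the use of plain Python lists in place of real CLIP embeddings are all illustrative assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_style_score(sr_embedding: Sequence[float],
                     target_embedding: Sequence[float]) -> float:
    """Assumed form of the metric: similarity of two image embeddings.

    In practice the embeddings would come from a CLIP image encoder
    (not shown here); higher means the output is closer to the target.
    """
    return cosine_similarity(sr_embedding, target_embedding)


def loss_with_auxiliary_term(base_loss: float,
                             sr_embedding: Sequence[float],
                             target_embedding: Sequence[float],
                             weight: float = 1.0) -> float:
    """Hypothetical auxiliary loss: base loss plus weight * (1 - score)."""
    score = clip_style_score(sr_embedding, target_embedding)
    return base_loss + weight * (1.0 - score)
```

Identical embeddings give a score of 1 and add nothing to the loss; orthogonal embeddings give a score of 0 and add the full `weight`.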

Paper Structure (16 sections, 8 figures, 4 tables)

Introduction
Related Work
Metrics
Super-Resolution Human Judgement Dataset
CLIP as an Image Similarity Metric
Data
S2-NAIP Dataset
Method Study
Improving ESRGAN
Training with CLIPScore
Super-Resolution for Downstream Tasks
Image Usage
Feature Usage
Deploying Super-Resolution Globally
Conclusion
...and 1 more sections

Figures (8)

Figure 1: Display of the incredible power of super-resolution for remote sensing imagery. High-resolution satellite imagery is not available for free worldwide, and a public source such as NAIP is restricted to every 2-3 years in the US. On the other hand, Sentinel-2 imagery is global, free, and has a revisit rate of 5-10 days, so with super-resolution methods, we can generate high-resolution imagery globally and frequently, especially in places that have disproportionately less public imagery.
Figure 2: Example of a target image (GT), an ESRGAN output at full resolution as well as downsampled 16x, and a HighResNet output, with corresponding metrics. Note that the four images are ordered from best to worst based on human preference, while PSNR and SSIM follow the opposite trend, increasing as human preference decreases. Our proposed CLIPScore more closely matches human judgement.
Figure 3: The level of accuracy between human preferences and those generated by the various metrics. The x-axis is ordered from worst to best average accuracy between the two datasets. The y-axis is adjusted to a range of 40% to 90% to better show the difference in accuracy across metrics.
Figure 4: Examples of target images (GT), one of the corresponding low-resolution images (Low-Res), and Super-Res outputs from SRCNN, HighResNet, Medium ESRGAN, and SR3 on samples from the S2-NAIP dataset. We recommend zooming in for the full effect.
Figure 5: CLIPScore results on the S2-NAIP dataset, with three sizes of the ESRGAN model and five data splits. Performance increases with more data and larger models.
...and 3 more figures
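
The Figure 3 caption reports how accurately each metric reproduces human preferences. The sketch below shows one simple way such an agreement rate could be computed over pairs of candidate images. It is an assumption about the general procedure, not the paper's evaluation code; the function name, the tuple layout, and the `higher_is_better` flag (set to `False` for distance-style metrics such as LPIPS) are illustrative.

```python
from typing import Sequence, Tuple

# Each entry: (metric score of image A, metric score of image B,
#              True if the human annotator preferred image A).
Pair = Tuple[float, float, bool]


def preference_accuracy(pairs: Sequence[Pair],
                        higher_is_better: bool = True) -> float:
    """Fraction of pairs where the metric picks the image humans preferred.

    Ties in the metric score are treated as a preference for image B.
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    agreements = 0
    for score_a, score_b, human_prefers_a in pairs:
        if higher_is_better:
            metric_prefers_a = score_a > score_b
        else:
            metric_prefers_a = score_a < score_b
        if metric_prefers_a == human_prefers_a:
            agreements += 1
    return agreements / len(pairs)


# Made-up scores purely for illustration.
example_pairs = [
    (0.9, 0.5, True),
    (0.2, 0.7, True),
    (0.4, 0.8, False),
    (0.6, 0.3, False),
]
accuracy = preference_accuracy(example_pairs)
```

A metric that ranks images at random would land near 50% under this measure, which gives a baseline for reading a plot whose y-axis starts at 40%.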
