MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation

Zhiping Yu, Chenyang Liu, Liqin Liu, Zhenwei Shi, Zhengxia Zou

TL;DR

This work proposes a resolution-guided self-cascading generative framework that can generate images of any region across a wide range of geographical resolutions. It also designs a novel noise sampling strategy for denoising diffusion models, derived from an analysis of the generation conditions and the initial noise.

Abstract

Recent advances in generative foundation models have ushered in a new era of image generation for natural images, revolutionizing art design, entertainment, environment simulation, and beyond. Although they produce high-quality samples, existing methods are limited to generating images of scenes at a restricted scale. In this paper, we present MetaEarth, a generative foundation model that breaks this barrier by scaling image generation to a global level, exploring the creation of worldwide, multi-resolution, unbounded, and virtually limitless remote sensing images. In MetaEarth, we propose a resolution-guided self-cascading generative framework that can generate images of any region across a wide range of geographical resolutions. To achieve unbounded, arbitrary-sized image generation, we design a novel noise sampling strategy for denoising diffusion models by analyzing the generation conditions and the initial noise. To train MetaEarth, we construct a large dataset of multi-resolution optical remote sensing images with geographical information. Experiments demonstrate the capabilities of our method in generating global-scale images. Additionally, MetaEarth serves as a data engine that can provide high-quality, rich training data for downstream tasks. Our model opens up new possibilities for constructing generative world models by simulating Earth visuals from an overhead perspective.
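The coarse-to-fine cascade described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `cascade_schedule`, `toy_stage`, and `self_cascade`, the 4x step between stages, and the 64 m to 4 m range are hypothetical, and `toy_stage` is a nearest-neighbour upsampler standing in for the diffusion model, not the authors' network.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]              # toy single-channel image
Stage = Tuple[Optional[float], float]  # (condition res, output res), metres per pixel


def cascade_schedule(start_res_m: float, target_res_m: float, factor: int) -> List[Stage]:
    """List the stages from the coarsest to the finest resolution."""
    if factor < 2 or target_res_m > start_res_m:
        raise ValueError("need factor >= 2 and target_res_m <= start_res_m")
    stages: List[Stage] = [(None, start_res_m)]  # first stage has no image condition
    res = start_res_m
    while res / factor >= target_res_m:
        stages.append((res, res / factor))
        res /= factor
    return stages


def toy_stage(condition: Optional[Image], cond_res_m: Optional[float], factor: int) -> Image:
    """Hypothetical stand-in for one generation stage.

    A real stage would run a denoising diffusion model conditioned on the
    previous image and its resolution; here we only upsample by `factor`.
    """
    if condition is None:
        return [[0.0, 1.0], [1.0, 0.0]]  # arbitrary coarse seed image
    return [[v for v in row for _ in range(factor)]
            for row in condition for _ in range(factor)]


def self_cascade(stage_fn: Callable[[Optional[Image], Optional[float], int], Image],
                 start_res_m: float, target_res_m: float, factor: int) -> Image:
    """Apply the same stage function repeatedly, feeding each output forward."""
    image: Optional[Image] = None
    for cond_res_m, _out_res_m in cascade_schedule(start_res_m, target_res_m, factor):
        image = stage_fn(image, cond_res_m, factor)
    assert image is not None  # the schedule always has at least one stage
    return image
```

Calling `self_cascade(toy_stage, 64.0, 4.0, 4)` runs three stages (64 m, 16 m, 4 m in this hypothetical setting). The point of the sketch is the control flow: one stage function is reused at every level, and each call receives the previous output together with its resolution.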

Paper Structure (32 sections, 28 equations, 16 figures, 6 tables)

Figures (16)

  • Figure 1: We propose MetaEarth, a generative foundation model that simulates Earth’s visuals from an overhead perspective. MetaEarth shows powerful capabilities in generating worldwide, multi-resolution, unbounded, and virtually limitless remote sensing images. For more visualizations and animated results, please refer to our project page: https://jiupinjia.github.io/metaearth/.
  • Figure 2: The overall structure of MetaEarth. We propose a resolution-guided, self-cascading framework capable of generating scenes of any global region at multiple resolutions. Generation unfolds in multiple stages, starting with low-resolution images and advancing to high-resolution images. Each stage is conditioned on the low-resolution images generated in the preceding stage and on their spatial resolution.
  • Figure 3: To generate unbounded images, the image generated in the previous stage is first cropped into overlapping blocks that serve as conditions. Then, with the proposed noise sampling strategy, the regions shared by adjacent blocks generate similar content. Finally, the generated blocks are tiled and reassembled.
  • Figure 4: Necessary conditions on initial noise sampling for generating continuous unbounded scenes.
  • Figure 5: Images of various land features across the globe generated by MetaEarth, including water bodies, mountains, deserts, farmland, cities, and countryside areas.
  • ...and 11 more figures
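
The block-wise procedure in the Figure 3 caption (crop overlapping blocks, sample initial noise so that shared regions agree, tile the results) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: drawing one global noise field and cropping each block's initial noise from it is one simple way to make overlapping regions start from identical noise, and the source does not confirm that it is the paper's exact strategy. All function names are hypothetical, and the condition and noise share one pixel grid here for simplicity.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]
Block = Tuple[Tuple[int, int], Grid, Grid]  # (origin, condition crop, noise crop)


def block_origins(length: int, block: int, overlap: int) -> List[int]:
    """Start offsets of overlapping blocks that together cover [0, length)."""
    if not 0 <= overlap < block <= length:
        raise ValueError("need 0 <= overlap < block <= length")
    stride = block - overlap
    origins = list(range(0, length - block + 1, stride))
    if origins[-1] != length - block:
        origins.append(length - block)  # last block sits flush with the border
    return origins


def crop(grid: Grid, top: int, left: int, size: int) -> Grid:
    return [row[left:left + size] for row in grid[top:top + size]]


def global_noise(height: int, width: int, seed: int) -> Grid:
    """One Gaussian noise field covering the whole scene (assumed strategy)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


def blocks_with_shared_noise(condition: Grid, block: int, overlap: int,
                             seed: int = 0) -> List[Block]:
    """Pair every condition block with noise cropped from the same global field."""
    height, width = len(condition), len(condition[0])
    noise = global_noise(height, width, seed)
    return [((top, left), crop(condition, top, left, block), crop(noise, top, left, block))
            for top in block_origins(height, block, overlap)
            for left in block_origins(width, block, overlap)]


def stitch(tiles: List[Tuple[Tuple[int, int], Grid]], height: int, width: int) -> Grid:
    """Tile generated blocks back onto one canvas; later blocks overwrite overlaps."""
    canvas = [[0.0] * width for _ in range(height)]
    for (top, left), tile in tiles:
        for dy, row in enumerate(tile):
            canvas[top + dy][left:left + len(row)] = row
    return canvas
```

In this sketch, two horizontally adjacent blocks receive exactly the same initial noise values in the columns they share, which is the property the caption relies on for consistent content across block borders. A real pipeline would pass each (condition, noise) pair through the diffusion sampler before calling `stitch`.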