Geospecific View Generation -- Geometry-Context Aware High-resolution Ground View Inference from Satellite Views

Ningli Xu, Rongjun Qin

TL;DR

This pipeline is the first to generate close-to-real, geospecific ground views from satellite images alone. It introduces a geospecific prior that prompts the distribution learning of diffusion models to respect image samples closer to the geolocation of the predicted images.

Abstract

Predicting realistic ground views from satellite imagery in urban scenes is a challenging task due to the significant view gaps between satellite and ground-view images. We propose a novel pipeline to tackle this challenge by generating geospecific views that maximally respect the weak geometry and texture from multi-view satellite images. Unlike existing approaches that hallucinate images from cues such as partial semantics or geometry from overhead satellite images, our method directly predicts ground-view images at a given geolocation by using a comprehensive set of information from the satellite image, resulting in ground-level images with a resolution boost by a factor of ten or more. We leverage a novel building refinement method to reduce geometric distortions in satellite data at ground level, which ensures the creation of accurate conditions for view synthesis using diffusion networks. Moreover, we propose a novel geospecific prior, which prompts the distribution learning of diffusion models to respect image samples that are closer to the geolocation of the predicted images. We demonstrate that our pipeline is the first to generate close-to-real and geospecific ground views based merely on satellite images.

Paper Structure

This paper contains 13 sections, 5 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Example of our synthesized geospecific views. Instead of conditioning on semantics [regmi2018cross, ren2021cascaded, lu2020geometry], our method uses ground-view satellite texture, which provides high-frequency structural and color information. The predicted result not only shows photorealistic quality but also accurately reflects the number of stories of the garage (marked with orange rectangles).
  • Figure 2: Overview of our pipeline. Top-down View Stage and Projection Stage: the satellite textures are projected onto the refined 3D geometry and then projected back to ground-view 2D space (\ref{sec:sec31}). Ground-view Stage: the ground-view satellite texture and the corresponding high-frequency layout information serve as the conditions (\ref{sec:sec32}). Texture-guided Generation Stage: we use a recent diffusion model [rombach2022high] conditioned on ground-view satellite textures and high-frequency information, together with the geospecific prior (\ref{sec:sec33}).
  • Figure 3: Texture-friendly geometry refinement process. The process takes the original height map as input and estimates the building footprint, followed by boundary regularization to produce the refined height map.
  • Figure 4: Examples of the satellite textures before and after our transformation-friendly geometry refinement.
  • Figure 5: Illustration of three conditions for cross-view synthesis. Semantics are widely used by existing works [wu2022cross, castaldo2015semantic, lu2020geometry]. Our satellite textures can provide additional high-frequency and color information that details the building facade layouts, such as the shapes and locations of windows and doors.
  • ...and 6 more figures
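
The refinement flow described in the Figure 3 caption (height map, then building footprint, then boundary regularization, then refined height map) can be illustrated with a toy sketch. This is not the authors' method: the function name `refine_height_map`, the `min_building_height` threshold, the median-based ground estimate, and the use of an axis-aligned bounding box as the "regularized" boundary are all assumptions chosen for brevity, and the sketch treats the tile as containing a single building.

```python
import numpy as np

def refine_height_map(height, min_building_height=3.0):
    """Toy refinement of a single-building height map (hypothetical helper).

    Steps, loosely mirroring the Figure 3 caption:
      1. estimate the building footprint by thresholding height above ground,
      2. regularize the boundary (here: an axis-aligned bounding box, an
         assumed stand-in for a real regularization step),
      3. write a flat roof height inside the regularized footprint.
    """
    height = np.asarray(height, dtype=float)

    # Assumption: ground pixels dominate the tile, so the median is ground level.
    ground = float(np.median(height))

    # Step 1: raw footprint from a simple height threshold.
    footprint = (height - ground) > min_building_height

    refined = np.full_like(height, ground)
    if not footprint.any():
        return refined, footprint

    # Step 2: replace the ragged footprint with its bounding rectangle.
    rows = np.flatnonzero(footprint.any(axis=1))
    cols = np.flatnonzero(footprint.any(axis=0))
    regular = np.zeros_like(footprint)
    regular[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1] = True

    # Step 3: flat roof at the median height of the raw footprint pixels.
    roof = float(np.median(height[footprint]))
    refined[regular] = roof
    return refined, regular


# Usage: a noisy 3x4 building with one missing corner pixel and one spike.
tile = np.zeros((8, 8))
tile[2:5, 2:6] = 10.0
tile[2, 2] = 0.0    # ragged corner caused by a reconstruction error
tile[3, 3] = 12.0   # height spike
refined, mask = refine_height_map(tile)
```

In this toy case the missing corner is filled and the spike is flattened, so walls become vertical planes with straight edges. Straight, planar walls are what make projecting satellite texture onto facades less distorted, which is the motivation the caption gives for the refinement.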