Seeing through Satellite Images at Street Views

Ming Qian, Bin Tan, Qiuyu Wang, Xianwei Zheng, Hanjiang Xiong, Gui-Song Xia, Yujun Shen, Nan Xue

TL;DR

Sat2Density++ tackles SatStreet-view synthesis by learning an illumination-adaptive neural radiance field conditioned on a satellite image, using a tri-plane 3D representation and a dedicated sky-illumination pathway to render photorealistic street-view panoramas and videos. The method jointly learns geometry and appearance, with a sky branch, histogram-based illumination features, and adversarial and reconstruction losses to ensure multi-view consistency and fidelity to the satellite input. It demonstrates state-of-the-art performance on the suburban CVUSA and CVACT datasets and the urban VIGOR dataset, showing improved video quality, depth-like cues, and illumination controllability, while generalizing to unseen locations (Seattle) without 3D annotations. The approach supports practical applications in navigation, urban planning, and virtual environment generation through illumination-controlled, satellite-ground aligned street-view synthesis from a single satellite image and camera trajectory.

Abstract

This paper studies the task of SatStreet-view synthesis, which aims to render photorealistic street-view panorama images and videos given any satellite image and specified camera positions or trajectories. We formulate the task as learning a neural radiance field from paired images captured from satellite and street viewpoints, which is a challenging learning problem due to the sparse-view nature of the data and the extremely large viewpoint changes between satellite and street-view images. We tackle these challenges based on a task-specific observation that street-view-specific elements, including the sky and illumination effects, are visible only in street-view panoramas, and present a novel approach, Sat2Density++, that renders photorealistic street-view panoramas by modeling these street-view-specific elements in neural networks. In the experiments, our method is tested on both urban and suburban scene datasets, demonstrating that Sat2Density++ is capable of rendering photorealistic street-view panoramas that are consistent across multiple views and faithful to the satellite image.
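
For readers unfamiliar with radiance fields, the following is a minimal sketch of the generic volume-rendering quadrature used by NeRF-style methods. It is not taken from the paper, and the exact renderer in Sat2Density++ may differ; the function name `render_ray` and its arguments are hypothetical, chosen for illustration only. It shows how per-sample densities along one camera ray become a pixel color and an accumulated opacity.

```python
import math


def render_ray(densities, colors, deltas):
    """Accumulate color and opacity along one ray (generic NeRF-style quadrature).

    densities: non-negative density value at each sample (hypothetical input)
    colors:    (r, g, b) tuple at each sample
    deltas:    distance between consecutive samples
    """
    transmittance = 1.0          # fraction of light that has not been absorbed yet
    rgb = [0.0, 0.0, 0.0]
    opacity = 0.0
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this ray segment
        weight = transmittance * alpha           # contribution of this sample
        rgb = [acc + weight * c for acc, c in zip(rgb, color)]
        opacity += weight
        transmittance *= 1.0 - alpha
    return tuple(rgb), opacity
```

A ray that passes only through empty space (all densities zero) returns an opacity of 0, which is the situation in which a separately generated sky would be expected to show through.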

Paper Structure

This paper contains 31 sections, 20 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Comparison of model designs and results between Sat2Density, proposed in our conference version, and Sat2Density++ (Ours), proposed in this paper. (a) Overview of the model designs and the video results. At the bottom of Fig. 1(a), the colored camera trajectories are shown on the satellite image on the left, and the corresponding street-view images generated by the models are shown on the right. (b) A comparison of the generated geometry, where surfaces are extracted from the density field using Marching Cubes. The satellite image is oriented with North at the top and West to the left. For the generated 360° panorama images, North aligns with the central axis, South spans the edge boundaries, while West and East occupy the left and right quarter sections, respectively.
  • Figure 2: Diagram of the proposed Sat2Density++ framework. The system begins with a satellite image input, from which it generates tri-plane features via the Tri-plane Net. Given specific camera poses, these features are then processed by an Illumination Adaptive Tri-plane Decoder within a Neural Renderer to render both the satellite image and the ground part of the street-view image. Additionally, a 2D sky generation module is responsible for creating the sky region in the street-view image. The final street-view images are obtained by first alpha-blending the ground and sky components, followed by super-resolution enhancement. The illumination input facilitates the rendering process by harmonizing both the Tri-plane Decoder and the 2D sky generation module. For clarity, we have omitted the steps involving the use of camera poses to generate image features from the radiance field and the super-resolution module, as well as the details of rendering from satellite viewpoints from the tri-plane.
  • Figure 3: Three video results generated by our method and Sat2Density on the VIGOR dataset. The full videos can be seen at https://qianmingduowan.github.io/sat2density-pp/.
  • Figure 4: Three video results generated by our method and Sat2Density on the CVACT dataset. The full videos can be seen at https://qianmingduowan.github.io/sat2density-pp/.
  • Figure 5: Comparison of Ours and Sat2Density in User Studies on Video Results. Users evaluated the Quality, Consistency, and Faithfulness of the generated videos by observing the input satellite images paired with the corresponding camera trajectory videos, as well as the videos produced by Sat2Density and Sat2Density++. They compared the methods based on these three criteria and selected the results they found to be superior. Finally, we aggregated the average preferences across multiple video sets to determine the overall user preference levels.
  • ...and 6 more figures
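
Two conventions from the captions above can be sketched in a few lines of Python: the panorama layout from Figure 1 (North at the center, South at the edges, West and East in the left and right quarters) and the alpha-blending of ground and sky from Figure 2. This is an illustrative sketch, not the authors' code. The function names are hypothetical, the azimuth is assumed to be measured in degrees clockwise from North, and the blend formula `alpha * ground + (1 - alpha) * sky` is the standard compositing rule assumed here rather than one confirmed by the source.

```python
def azimuth_to_column(azimuth_deg: float, width: int) -> int:
    """Map a compass azimuth (0 = North, 90 = East, 180 = South, 270 = West)
    to a column index of a 360-degree panorama that is `width` pixels wide."""
    # Shift by 180 degrees so North lands on the central column and
    # South wraps around to the left/right edges.
    fraction = ((azimuth_deg + 180.0) % 360.0) / 360.0
    return int(fraction * width) % width


def alpha_blend(ground_rgb, sky_rgb, alpha):
    """Composite one ground pixel over one sky pixel.

    alpha is the assumed per-pixel opacity of the rendered ground layer:
    1.0 shows only the ground, 0.0 shows only the sky.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * g + (1.0 - alpha) * s
                 for g, s in zip(ground_rgb, sky_rgb))
```

For a 512-pixel-wide panorama under these assumptions, North maps to column 256, West to column 128, East to column 384, and South to column 0, where the image wraps around.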