SyntStereo2Real: Edge-Aware GAN for Remote Sensing Image-to-Image Translation while Maintaining Stereo Constraint

Vasudha Venkatesan, Daniel Panangian, Mario Fuentes Reyes, Ksenia Bittner

TL;DR

SyntStereo2Real tackles domain generalization in synthetic-to-real stereo translation for remote sensing by integrating edge-aware image translation with stereo geometry constraints. The approach uses Sobel edge maps as additional input to a lightweight autoencoder, producing semantically consistent translations while enforcing epipolar consistency via a warping loss in a single network framework. Quantitative gains over StereoGAN are demonstrated across remote-sensing and autonomous driving datasets, with improvements in disparity accuracy (MAD, 3px, 1px) and substantially fewer parameters. The method enables efficient, geometry-preserving synthetic-to-real translation that generalizes across domains and supports improved stereo-based tasks like disparity estimation in challenging remote-sensing contexts.

Abstract

In remote sensing, the scarcity of stereo-matched data, and in particular the lack of accurate ground truth, often hinders the training of deep neural networks. Synthetically generated images alleviate this problem but suffer from poor domain generalization. Unifying image-to-image translation and stereo matching offers an effective way to address this issue. Current methods combine two networks, an unpaired image-to-image translation network and a stereo-matching network, and optimize them jointly. We propose an edge-aware GAN-based network that tackles both tasks simultaneously. We obtain edge maps of the input images with the Sobel operator and use them as an additional input to the encoder in the generator to enforce geometric consistency during translation. We additionally include a warping loss calculated from the translated images to maintain stereo consistency. We demonstrate that our model produces qualitatively and quantitatively better results than existing models and that its applicability extends to other domains, including autonomous driving.
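The two geometric ingredients named in the abstract, a Sobel edge map and a stereo warping loss, can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. The function names, the replicated-border handling, the L1 penalty, and the convention that a left-view pixel at column x corresponds to column x - d in the right view (rectified stereo, positive disparity d) are all assumptions made for illustration. A real pipeline would use tensors and would typically mask occluded or out-of-view pixels.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_map(img):
    """Gradient magnitude of a grayscale image given as a list of rows.

    Border pixels are replicated (an assumption; other paddings are possible).
    """
    h, w = len(img), len(img[0])

    def px(r, c):
        # Clamp coordinates so that out-of-range reads reuse the border pixel.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    edges = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    v = px(r + i - 1, c + j - 1)
                    gx += SOBEL_X[i][j] * v
                    gy += SOBEL_Y[i][j] * v
            edges[r][c] = math.hypot(gx, gy)
    return edges


def warp_right_to_left(right, disparity):
    """Resample the right view at column x - d(y, x), linearly interpolated."""
    h, w = len(right), len(right[0])
    warped = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Clamp the sampling position to the image (no occlusion mask here).
            x = min(max(c - disparity[r][c], 0.0), w - 1.0)
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, w - 1)
            t = x - x0
            warped[r][c] = (1 - t) * right[r][x0] + t * right[r][x1]
    return warped


def warping_loss(left, right, disparity):
    """Mean absolute difference between the left view and the warped right view."""
    warped = warp_right_to_left(right, disparity)
    h, w = len(left), len(left[0])
    total = sum(abs(left[r][c] - warped[r][c]) for r in range(h) for c in range(w))
    return total / (h * w)


# Hypothetical toy data: a vertical step edge and a one-pixel horizontal shift.
step = [[0, 0, 1, 1] for _ in range(4)]
edge_row = sobel_edge_map(step)[0]   # strong response on the two columns at the step

right_view = [[0, 1, 2, 3, 4]]
left_view = [[0, 0, 1, 2, 3]]        # right view shifted by one pixel
loss_correct = warping_loss(left_view, right_view, [[1.0] * 5])
loss_wrong = warping_loss(left_view, right_view, [[0.0] * 5])
```

In this toy case the loss vanishes when the disparity matches the true shift and grows when it does not, which is the property a warping loss relies on to penalize translations that break stereo consistency between the left and right views.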

Paper Structure (15 sections, 5 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Examples of aerial scene translated by SyntStereo2Real. Our model can produce semantic-consistent realistic translations.
  • Figure 2: Aerial images translated using CUT. The model tends to hallucinate when translating images with diverse scenes, where the target distribution is more likely to be unbalanced.
  • Figure 3: Illustration of the generator architecture, an autoencoder with edge map integration. The image and its corresponding edge map are encoded and added together as the content-edge code, which is then passed to the decoder. The decoder merges the content-edge code with the style code of each domain to generate contextually fitting content. $xc_a$, $xc_b$ represent the input images from both domains (content); $xe_a$, $xe_b$ represent the corresponding edge maps. $c_a$, $c_b$, $e_a$, $e_b$ represent the content and edge codes from the encoder for both domains. $s_a$, $s_b$ are the style codes, randomly initialized before training. $x_{aa}$, $x_{ab}$, $x_{ba}$, $x_{bb}$ represent the respective output images from the decoder.
  • Figure 4: Pairs of translated images. For each translated left-view image, the corresponding right-view image is also displayed. As can be seen from the images, a semantically consistent translation is applied to both the left- and right-view images.
  • Figure 5: Illustration of the GAN-based model architecture featuring multiple loss functions. The design incorporates a combination of adversarial, reconstruction, cycle and warping losses. Adversarial loss promotes realistic image generation, while reconstruction loss ensures faithful reproduction of input data, cycle loss enforces the correct mapping between domains and warping loss enforces geometrical stereo constraints.
  • ...and 3 more figures
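
Read together, the captions of Figures 3 and 5 suggest the following compact notation. This is a hedged reading of the captions rather than the paper's own equations: the encoder and decoder symbols $E$ and $D$, the index convention (content from the first subscript, style from the second), and the weights $\lambda$ are assumptions introduced here for illustration.

$$c_a = E(xc_a), \qquad e_a = E(xe_a), \qquad x_{ab} = D(c_a + e_a,\; s_b)$$

$$\mathcal{L}_{total} = \lambda_{adv}\,\mathcal{L}_{adv} + \lambda_{rec}\,\mathcal{L}_{rec} + \lambda_{cyc}\,\mathcal{L}_{cyc} + \lambda_{warp}\,\mathcal{L}_{warp}$$

Under this reading, $x_{aa}$ and $x_{bb}$ are within-domain reconstructions, while $x_{ab}$ and $x_{ba}$ are cross-domain translations.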