Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization

Sanghyeob Song, Jaihyun Lew, Hyemi Jang, Sungroh Yoon

TL;DR

This work proposes AltO, an unsupervised learning framework for estimating homography in multimodal image pairs that not only outperforms other unsupervised methods but is also compatible with various architectures of homography estimators.

Abstract

Estimating the homography between two images is crucial for mid- and high-level vision tasks such as image stitching and fusion. However, supervised learning methods are often impractical or costly because ground-truth data are difficult to collect. In response, unsupervised learning approaches have emerged. Most early methods, though, assume that the image pairs come from the same camera or differ only slightly in lighting. Consequently, while these methods perform well under such conditions, they generally fail when the input images come from different domains, i.e., multimodal image pairs. To address this limitation, we propose AltO, an unsupervised learning framework for estimating homography in multimodal image pairs. Our method employs a two-phase alternating optimization framework, similar to Expectation-Maximization (EM), in which one phase reduces the geometry gap and the other addresses the modality gap. To handle these gaps, we use the Barlow Twins loss for the modality gap and propose an extended version, Geometry Barlow Twins, for the geometry gap. We demonstrate that AltO can be trained on multimodal datasets without any ground-truth data. It not only outperforms other unsupervised methods but is also compatible with various homography estimator architectures. The source code is available at https://github.com/songsang7/AltO.
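
The abstract names the Barlow Twins loss as the objective for the modality gap. As a reference point, here is a minimal NumPy sketch of the standard Barlow Twins loss: each feature of the two embeddings is standardized over the batch, the cross-correlation matrix C between the two views is computed, diagonal entries are pushed toward 1 (invariance) and off-diagonal entries toward 0 (redundancy reduction). This is the generic loss, not AltO's implementation. The function name, the default weight `lambd=5e-3` (the value used in the original Barlow Twins formulation), and the `eps` constant are assumptions here; how AltO builds the two views, and its Geometry Barlow Twins extension, are not described in this abstract and are not shown.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambd=5e-3, eps=1e-5):
    """Generic Barlow Twins loss for two (N, D) batches of embeddings.

    Row i of z_a and row i of z_b are embeddings of two views of the same sample.
    lambd weights the off-diagonal (redundancy-reduction) term; eps avoids division by zero.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    n = z_a.shape[0]
    # Standardize every feature dimension along the batch axis.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix C of shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    on_diag = np.sum((1.0 - diag) ** 2)          # invariance: C_ii -> 1
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # redundancy: C_ij -> 0 for i != j
    return on_diag + lambd * off_diag

# Hypothetical usage: identical views give a near-zero loss, unrelated views a large one.
rng = np.random.default_rng(0)
z = rng.standard_normal((256, 8))
print(barlow_twins_loss(z, z))
print(barlow_twins_loss(z, rng.standard_normal((256, 8))))
```

Because the loss only compares per-feature correlations across the batch, it does not require the two inputs to share pixel statistics, which is why it is a natural fit for features extracted from different modalities.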

Paper Structure

This paper contains 29 sections, 6 equations, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Examples of the two types of gaps. This paper addresses both the geometry and modality gaps simultaneously. The image pairs were introduced by DLKFM dlkfm.
  • Figure 2: Conceptual diagram of the Barlow Twins method barlowtwins.
  • Figure 3: Overview of the architecture. The upper diagram shows the static view, and the lower diagrams illustrate phase switching between the Geometry Learning (GL) phase and the Modality-Agnostic Representation Learning (MARL) phase.
  • Figure 4: Examples of image pairs from each dataset. Google Map and Google Earth were introduced by DLKFM dlkfm; Deep NIR was proposed in deepnir.
  • Figure 5: Visualization of homography estimation using a center box. The first row shows the state before the homography is applied (green rectangles). Subsequent rows compare the results of applying the ground-truth (green) and predicted (red) homography matrices. Our method, AltO, closely matches the supervised learning-based methods, while other unsupervised approaches underperform.
  • ...and 2 more figures
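
Figure 5 visualizes each homography by warping a box centered in the image. The sketch below shows the underlying arithmetic under assumed conventions: a 3×3 matrix H maps a point (x, y) by multiplying the homogeneous vector (x, y, 1) and dividing by the resulting third coordinate. The helper names `apply_homography` and `center_box`, the 128×128 image, the 64×64 box, and both example matrices are hypothetical; the paper's exact box placement and matrix convention are not given here.

```python
import numpy as np

def apply_homography(h, points):
    """Map (N, 2) points through a 3x3 homography h using homogeneous coordinates."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])  # rows are (x, y, 1)
    mapped = homog @ np.asarray(h, dtype=float).T          # apply H to every row
    return mapped[:, :2] / mapped[:, 2:3]                  # perspective division

def center_box(width, height, size):
    """Corners of a size x size box centered in a width x height image.

    Order: top-left, top-right, bottom-right, bottom-left (image y axis points down).
    """
    cx, cy, half = width / 2.0, height / 2.0, size / 2.0
    return np.array([
        [cx - half, cy - half],
        [cx + half, cy - half],
        [cx + half, cy + half],
        [cx - half, cy + half],
    ])

# Hypothetical example: a ground-truth and a slightly different predicted homography.
box = center_box(128, 128, 64)
h_true = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]])  # pure translation
h_pred = np.array([[1.0, 0.0, 4.0], [0.0, 1.0, -3.5], [0.0, 0.0, 1.0]])
print(apply_homography(h_true, box))  # corners of the "green" (ground-truth) box
print(apply_homography(h_pred, box))  # corners of the "red" (predicted) box
```

Drawing the two warped quadrilaterals over the target image reproduces the kind of comparison shown in Figure 5: the closer the red corners are to the green ones, the better the estimate.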