Robust Image Stitching with Optimal Plane

Lang Nie, Yuan Mei, Kang Liao, Yunqiu Xu, Chunyu Lin, Bin Xiao

TL;DR

Robust Image Stitching with Optimal Plane tackles two challenges in image stitching: cross-scene generalization and content naturalness. It introduces RopStitch, an unsupervised framework that combines a dual-branch backbone, which injects universal content priors, with a virtual optimal stitching plane that mitigates distortion. The method fuses features at the correlation level and uses an iterative coefficient predictor, guided by a minimal semantic distortion constraint, to warp both views bidirectionally onto the optimal plane, achieving strong robustness and visual naturalness. Extensive experiments on the UDIS-D dataset and classical benchmarks show superior cross-domain performance, competitive accuracy, and notable zero-shot generalization, with efficient runtime. The approach advances practical image stitching for diverse real-world scenes, with applications in VR, autonomous driving, and surveillance.
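The correlation-level fusion of the two branches can be illustrated with a small sketch. It assumes each branch yields a feature map of shape (C, H, W) per view, that each branch builds a global cosine-similarity correlation volume, and that the "controllable factor" blends the two volumes as a convex combination with weight `alpha`. The function names `global_correlation` and `fused_correlation` and the convex-combination form are hypothetical illustrations, not RopStitch's actual implementation.

```python
import numpy as np


def global_correlation(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Cosine-similarity correlation between every pixel pair of two views.

    feat_a: (C, Ha, Wa), feat_b: (C, Hb, Wb), same channel count C.
    Returns shape (Ha*Wa, Hb*Wb); entry [i, j] compares pixel i of view A
    with pixel j of view B.
    """
    c = feat_a.shape[0]
    a = feat_a.reshape(c, -1)
    b = feat_b.reshape(c, -1)
    # L2-normalise each pixel's feature vector so the dot product is a cosine.
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + 1e-8)
    return a.T @ b


def fused_correlation(pre_a, pre_b, learn_a, learn_b, alpha: float) -> np.ndarray:
    """Hypothetical fusion of the pretrained and learnable correlation volumes.

    alpha = 1 keeps only the (frozen) pretrained branch, alpha = 0 only the
    learnable branch. The convex-combination form is an assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    corr_pre = global_correlation(pre_a, pre_b)
    corr_learn = global_correlation(learn_a, learn_b)
    return alpha * corr_pre + (1.0 - alpha) * corr_learn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Branches may use different channel counts; spatial sizes match here.
    pre_a, pre_b = rng.standard_normal((2, 64, 8, 8))
    learn_a, learn_b = rng.standard_normal((2, 32, 8, 8))
    corr = fused_correlation(pre_a, pre_b, learn_a, learn_b, alpha=0.5)
    print(corr.shape)  # (64, 64)
```

Fusing at the correlation level, rather than concatenating raw features, lets the two branches keep different channel widths while still contributing to a single matching volume.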

Abstract

We present RopStitch, an unsupervised deep image stitching framework that is both robust and natural. To ensure the robustness of RopStitch, we incorporate a universal prior of content perception into the stitching model through a dual-branch architecture, which separately captures coarse and fine features and integrates them to achieve highly generalizable performance across diverse, unseen real-world scenes. Concretely, the dual-branch model consists of a pretrained branch that captures semantically invariant representations and a learnable branch that extracts fine-grained discriminative features; the two are merged at the correlation level by a controllable factor. In addition, because content alignment and structural preservation often conflict with each other, we introduce the concept of virtual optimal planes to relieve this conflict. To this end, we model the problem as estimating homography decomposition coefficients, and design an iterative coefficient predictor and a minimal semantic distortion constraint to identify the optimal plane. This scheme is incorporated into RopStitch by warping both views bidirectionally onto the optimal plane. Extensive experiments across various datasets demonstrate that RopStitch significantly outperforms existing methods, particularly in scene robustness and content naturalness. The code is available at https://github.com/MmelodYy/RopStitch.
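The idea of decomposing one homography into two warps onto an intermediate plane can be made concrete with a sketch. The abstract does not spell out the decomposition, so this assumes a single scalar coefficient `t` and uses principal fractional matrix powers: if H maps view A to view B, then W_a = H^t and W_b = H^(t-1) satisfy inv(W_b) @ W_a = H, so both views land consistently on one virtual plane. Here t = 0 keeps A's plane, t = 1 keeps B's plane, and values in between give an intermediate plane. The functions `homography_power`, `decompose_onto_plane`, and `apply_h` are hypothetical; RopStitch's own coefficients and predictor may be parameterized quite differently.

```python
import numpy as np


def homography_power(h: np.ndarray, t: float) -> np.ndarray:
    """Principal fractional power H^t of a 3x3 homography (assumption-laden).

    Assumes H is diagonalisable with no eigenvalue on the negative real
    axis. H is rescaled to det(H) = 1, which is allowed because a
    homography is only defined up to scale.
    """
    det = np.linalg.det(h)
    if det == 0:
        raise ValueError("homography must be invertible")
    if det < 0:  # -H is the same homography and has positive determinant
        h, det = -h, -det
    h = h / np.cbrt(det)
    lam, vecs = np.linalg.eig(h)
    powered = vecs @ np.diag(lam.astype(complex) ** t) @ np.linalg.inv(vecs)
    if np.abs(powered.imag).max() > 1e-8 * max(1.0, np.abs(powered.real).max()):
        raise ValueError("no real principal power for this H")
    return powered.real


def decompose_onto_plane(h: np.ndarray, t: float):
    """Split H (view A -> view B) into warps W_a, W_b onto a virtual plane."""
    return homography_power(h, t), homography_power(h, t - 1.0)


def apply_h(h: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Project (N, 2) points with a 3x3 homography."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]


if __name__ == "__main__":
    H = np.array([[1.1, -0.05, 20.0],
                  [0.05, 0.95, 5.0],
                  [1e-4, 0.0, 1.0]])
    W_a, W_b = decompose_onto_plane(H, t=0.5)
    corners = np.array([[0.0, 0.0], [511.0, 0.0], [511.0, 383.0], [0.0, 383.0]])
    # A point of view A and its match in view B land on the same plane point.
    print(np.allclose(apply_h(W_a, corners), apply_h(W_b, apply_h(H, corners))))
```

In this sketch, choosing `t` is where an optimization would go; the abstract describes an iterative coefficient predictor with a minimal semantic distortion constraint for that step, which is not reproduced here.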

Paper Structure

This paper contains 26 sections, 18 equations, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Difference from existing solutions. (a) UDIS++ nie2023parallax uses a single-branch architecture and a single reference plane, resulting in limited cross-scene generalization and content stretching. (b) Our method incorporates perceptual priors through a dual-branch architecture and aligns images on a virtual optimal plane, thereby enhancing cross-scene generalization and natural appearance.
  • Figure 2: The framework of RopStitch. It uses a dual-branch architecture to construct a robust correlation volume, which ensures a robust global transformation. The single-view projection is then decomposed into two bidirectional warps onto the optimal plane, followed by bidirectional local deformation.
  • Figure 3: Distortion categories. We measure the degree of distortion beyond a similarity transformation.
  • Figure 4: Performance comparison on classical datasets. Arrows indicate regions with noticeable stretching, while rectangular boxes highlight areas with significant content misalignment.
  • Figure 5: Zero-shot comparative results of learning-based stitching algorithms.
  • ...and 2 more figures
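Figure 3's caption says distortion is measured beyond the similarity transformation. One way to quantify that, sketched here under stated assumptions, is to take the local 2x2 Jacobian of a homography at grid points, find the closest similarity (scaled rotation) [[p, -q], [q, p]] in the Frobenius sense, and report the relative size of the residual, which captures shear and anisotropic scaling. The optional `weights` grid stands in for a semantic importance map; this weighting and all function names (`homography_jacobian`, `non_similarity_distortion`, `mean_distortion`) are hypothetical and are not the paper's minimal semantic distortion constraint.

```python
import numpy as np


def homography_jacobian(h: np.ndarray, x: float, y: float) -> np.ndarray:
    """2x2 Jacobian of the map (x, y) -> H(x, y) at a single point."""
    w = h[2, 0] * x + h[2, 1] * y + h[2, 2]
    u = (h[0, 0] * x + h[0, 1] * y + h[0, 2]) / w
    v = (h[1, 0] * x + h[1, 1] * y + h[1, 2]) / w
    # Quotient rule: d(N/w)/dx = (dN/dx - (N/w) * dw/dx) / w.
    return np.array([
        [(h[0, 0] - u * h[2, 0]) / w, (h[0, 1] - u * h[2, 1]) / w],
        [(h[1, 0] - v * h[2, 0]) / w, (h[1, 1] - v * h[2, 1]) / w],
    ])


def non_similarity_distortion(j: np.ndarray) -> float:
    """Relative size of the part of J that no similarity can explain.

    Minimising ||J - [[p, -q], [q, p]]||_F gives p = (a + d) / 2 and
    q = (c - b) / 2; the residual is the non-similarity component.
    """
    a, b = j[0]
    c, d = j[1]
    p, q = (a + d) / 2.0, (c - b) / 2.0
    sim = np.array([[p, -q], [q, p]])
    return float(np.linalg.norm(j - sim) / (np.linalg.norm(sim) + 1e-12))


def mean_distortion(h, width, height, weights=None, n=8) -> float:
    """(Weighted) mean distortion over an n x n grid spanning the image."""
    xs = np.linspace(0, width - 1, n)
    ys = np.linspace(0, height - 1, n)
    vals = np.array([[non_similarity_distortion(homography_jacobian(h, x, y))
                      for x in xs] for y in ys])
    if weights is None:
        weights = np.ones_like(vals)
    return float((vals * weights).sum() / weights.sum())


if __name__ == "__main__":
    s, th = 1.2, 0.3
    similarity = np.array([[s * np.cos(th), -s * np.sin(th), 10.0],
                           [s * np.sin(th), s * np.cos(th), -4.0],
                           [0.0, 0.0, 1.0]])
    shear = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    print(mean_distortion(similarity, 512, 384))  # close to 0
    print(mean_distortion(shear, 512, 384))       # about 0.24
```

A similarity warp scores essentially zero everywhere, while a pure shear of 0.5 scores sqrt(0.125) / sqrt(2.125), roughly 0.243, at every point, so the measure isolates exactly the deformation a similarity cannot produce.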