Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting

Ziqi Xie, Xiao Lai, Weidong Zhao, Siqi Jiang, Xianhui Liu, Wenlong Hou

TL;DR

The paper tackles the challenge of visible seams in image stitching under uneven hue and large parallax by reframing fusion and rectangling as a reference-driven inpainting problem (RDIStitcher). It introduces a self-supervised training pipeline that fine-tunes a large T2I diffusion model via LoRA using pseudo-stitching signals derived from unlabeled data, and designs a high-capacity framework that uses a larger fusion region with stronger modification intensity than prior methods. To evaluate stitched image quality without ground truth, the authors propose Multimodal Large Language Models (MLLMs)-based metrics (SIQS and MICQS) and validate them against human judgments on a dedicated dataset, while also assessing content consistency and zero-shot generalization on multiple benchmarks. Extensive experiments on UDIS-D and cross-dataset zero-shot tests demonstrate improved content coherence and seam reduction, with notable generalization in challenging scenarios, suggesting practical applicability in real-world stitching tasks. The work also provides a public codebase and a set of evaluation protocols that could influence future assessment of stitched imagery.

Abstract

Current image stitching methods often produce noticeable seams in challenging scenarios such as uneven hue and large parallax. To tackle this problem, we propose the Reference-Driven Inpainting Stitcher (RDIStitcher), which reformulates image fusion and rectangling as a reference-based inpainting model, using a larger fusion modification area and a stronger modification intensity than previous methods. Furthermore, we introduce a self-supervised training method that fine-tunes a Text-to-Image (T2I) diffusion model, so that RDIStitcher can be trained without labeled data. Recognizing the difficulty of assessing the quality of stitched images, we present metrics based on Multimodal Large Language Models (MLLMs), offering a new perspective on evaluating stitched image quality. Extensive experiments demonstrate that, compared to the state-of-the-art (SOTA) method, our method significantly improves content coherence and seamless transitions in the stitched images. In the zero-shot experiments in particular, our method exhibits strong generalization capabilities. Code: https://github.com/yayoyo66/RDIStitcher
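
To make the phrase "fusion and rectangling as inpainting" concrete, the sketch below builds one boolean mask of pixels that an inpainting model would be asked to repaint: the overlap of the two warped images (fusion) plus the pixels covered by neither image (rectangling). This is only an illustration, not the authors' mask construction. The function name `inpainting_mask`, its arguments, and the `grow` parameter (a stand-in for a "larger fusion modification area") are all hypothetical.

```python
import numpy as np


def inpainting_mask(valid_ref, valid_tgt, grow=0):
    """Return a boolean (H, W) mask of pixels to repaint (illustrative only).

    valid_ref, valid_tgt: boolean arrays, True where each warped image has content.
    grow: hypothetical knob that widens the fusion region by `grow` pixels
          using a simple 4-neighbour dilation.
    """
    overlap = valid_ref & valid_tgt      # fusion region: both images have content
    empty = ~(valid_ref | valid_tgt)     # rectangling region: neither image has content

    fusion = overlap.copy()
    for _ in range(grow):
        # Pad with False so the dilation does not wrap around the borders.
        p = np.pad(fusion, 1, mode="constant", constant_values=False)
        fusion = (
            p[1:-1, 1:-1]   # the pixel itself
            | p[:-2, 1:-1]  # neighbour above
            | p[2:, 1:-1]   # neighbour below
            | p[1:-1, :-2]  # neighbour to the left
            | p[1:-1, 2:]   # neighbour to the right
        )
    return fusion | empty


if __name__ == "__main__":
    # Toy one-row canvas: the reference covers columns 0-3, the target columns 2-4.
    ref = np.array([[1, 1, 1, 1, 0, 0]], dtype=bool)
    tgt = np.array([[0, 0, 1, 1, 1, 0]], dtype=bool)
    print(inpainting_mask(ref, tgt).astype(int))
    print(inpainting_mask(ref, tgt, grow=1).astype(int))
```

With `grow=0` only the strict overlap and the empty border are marked. Raising `grow` lets the mask reach further into each source image, which loosely mirrors the trade-off the abstract describes: a larger modified area gives the model more room to hide seams, but leaves it more content to regenerate faithfully.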

Paper Structure

This paper contains 39 sections, 12 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Different solutions to image fusion in image stitching. Our method reformulates the fusion and rectangling tasks as a reference-driven inpainting model. By boldly using a larger fusion modification area than UDIS++ [nie2023parallax] and SRStitcher [xie2024reconstructing], and applying a stronger modification intensity than UDIS [nie2021unsupervised] and UDIS++, we achieve a significant advancement in seamless image stitching, particularly in challenging scenarios involving uneven hue and large parallax. Note that UDIS and UDIS++ are three-stage methods that require an additional rectangling model to complete the stitching process, so the rectangling areas for these methods are left blank in the figure.
  • Figure 2: A user experience survey of the reconstruction-based method UDIS [nie2021unsupervised], the seam-based method UDIS++ [nie2023parallax], and the inpainting-based method SRStitcher [xie2024reconstructing] on uneven hue and large parallax scenes. Please see the Supplementary Material for more details.
  • Figure 3: The framework of RDIStitcher. (a) Training. For clarity of presentation, the input images and masks are simplified. (b) Inference. Details of the specific input structure. (c) Data processing in self-supervised training. Details of the self-supervised training method.
  • Figure 4: Multi-image comparative evaluation results. We do not give the MLLM-based evaluators the "both" option, because we found that they consistently choose "both good".
  • Figure 5: Qualitative evaluation results. The part above the dotted line shows results on UDIS-D, and the part below shows results on traditional datasets. We highlight areas with significant seams and errors using local magnification and arrows. The error regions of the LeftRefill method are given special emphasis to show its performance limitations. Note that the last example is a hybrid challenge combining uneven hue, large parallax, and zero-shot conditions. In this highly complex scene, our method markedly surpasses previous methods. More results can be found in the Supplementary Material.
  • ...and 5 more figures