Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting
Ziqi Xie, Xiao Lai, Weidong Zhao, Siqi Jiang, Xianhui Liu, Wenlong Hou
TL;DR
The paper addresses visible seams in image stitching under uneven hue and large parallax by reframing fusion and rectangling as a single reference-driven inpainting problem, solved by the proposed RDIStitcher. It introduces a self-supervised training pipeline that fine-tunes a large T2I diffusion model via LoRA, using pseudo-stitching signals derived from unlabeled data, and it uses a larger fusion region with stronger modification intensity than prior methods. To evaluate stitched image quality without ground truth, the authors propose two metrics based on Multimodal Large Language Models (MLLMs), SIQS and MICQS, and validate them against human judgments on a dedicated dataset; they also assess content consistency and zero-shot generalization on multiple benchmarks. Experiments on UDIS-D and cross-dataset zero-shot tests show improved content coherence and reduced seams, with notable generalization in challenging scenarios, which suggests the method is applicable to real-world stitching tasks. The work also provides a public codebase and a set of evaluation protocols that could inform future assessment of stitched imagery.
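For readers unfamiliar with LoRA, the following is a minimal sketch of the general low-rank adapter idea, not the authors' training code. The names `matmul`, `lora_forward`, and `lora_param_count`, the toy dimensions, and the scaling value `alpha` are all assumptions chosen for illustration. The frozen weight `w` is left untouched and only the two small matrices `a` and `b` would be trained.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def lora_forward(x, w, a, b, alpha):
    """Apply a linear layer whose weight is w + (alpha / r) * (b @ a).

    x: batch of row vectors, shape (n, d_in)
    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    w_eff = [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
    return matmul(x, transpose(w_eff))


def lora_param_count(d_in, d_out, r):
    """Number of trainable values in the adapter (a and b only)."""
    return r * (d_in + d_out)


# Toy dimensions (hypothetical, for illustration only).
rng = random.Random(0)
d_in, d_out, r = 4, 3, 2
w = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]
b = [[0.0] * r for _ in range(d_out)]  # zero init: adapter starts as a no-op
x = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(2)]

base_out = matmul(x, transpose(w))
lora_out = lora_forward(x, w, a, b, alpha=4.0)
```

Because `b` starts at zero, `lora_out` matches `base_out` before any training step, so fine-tuning begins from the pretrained model's behavior. As an illustrative figure that is not taken from the paper, `lora_param_count(1024, 1024, 8)` gives 16,384 trainable values, against 1,048,576 for the full 1024 x 1024 weight.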
Abstract
Current image stitching methods often produce noticeable seams in challenging scenarios such as uneven hue and large parallax. To tackle this problem, we propose the Reference-Driven Inpainting Stitcher (RDIStitcher), which reformulates image fusion and rectangling as a reference-based inpainting problem and uses a larger fusion area and stronger modification intensity than previous methods. Furthermore, we introduce a self-supervised training method that fine-tunes a Text-to-Image (T2I) diffusion model, so RDIStitcher can be trained without labeled data. Recognizing the difficulty of assessing the quality of stitched images, we present metrics based on Multimodal Large Language Models (MLLMs), offering a new perspective on evaluating stitched image quality. Extensive experiments demonstrate that, compared to the state-of-the-art (SOTA) method, our method significantly enhances content coherence and seamless transitions in the stitched images. It exhibits strong generalization capabilities, especially in the zero-shot experiments. Code: https://github.com/yayoyo66/RDIStitcher
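To make the idea of treating fusion and rectangling as one inpainting region concrete, here is a minimal sketch of how such a region could be marked on a shared canvas. It is an assumption-laden illustration, not the authors' mask construction: the functions `dilate` and `inpainting_mask`, the parameter `grow`, and the rule "overlap grown by a few pixels plus uncovered pixels" are all hypothetical. The inputs are binary validity masks of the two warped images.

```python
def dilate(mask, radius):
    """Grow the 1-region of a binary mask by `radius` pixels (square neighborhood)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def inpainting_mask(valid_a, valid_b, grow=1):
    """Mark pixels to regenerate: a grown overlap band plus uncovered holes.

    valid_a, valid_b: binary masks (1 = pixel covered by that warped image).
    grow: hypothetical knob; larger values widen the fusion band.
    """
    h, w = len(valid_a), len(valid_a[0])
    # Fusion region: where both images contribute, widened by `grow` pixels.
    overlap = [[valid_a[y][x] & valid_b[y][x] for x in range(w)] for y in range(h)]
    fusion = dilate(overlap, grow)
    # Rectangling region: canvas pixels that neither warped image covers.
    holes = [[1 - (valid_a[y][x] | valid_b[y][x]) for x in range(w)] for y in range(h)]
    return [[fusion[y][x] | holes[y][x] for x in range(w)] for y in range(h)]


# Toy 3 x 6 canvas: image A covers the left four columns, image B covers the
# right three columns of the top two rows only.
valid_a = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
valid_b = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

mask = inpainting_mask(valid_a, valid_b, grow=1)
for row in mask:
    print(row)
# [0, 0, 1, 1, 1, 0]
# [0, 0, 1, 1, 1, 0]
# [0, 0, 1, 1, 1, 1]
```

In this toy setup, raising `grow` widens the band around the overlap that a generative model would be allowed to repaint, which mirrors the abstract's notion of a larger fusion area; the bottom-right pixels are marked because no image covers them, which corresponds to rectangling.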
