RecDiffusion: Rectangling for Image Stitching with Diffusion Models

Tianhao Zhou, Haipeng Li, Ziyi Wang, Ao Luo, Chen-Lin Zhang, Jiajun Li, Bing Zeng, Shuaicheng Liu

TL;DR

RecDiffusion tackles the problem of irregular boundaries in stitched images by introducing a two-stage diffusion framework. It first uses a Motion Diffusion Model (MDM) to generate rectangling motion fields and warp the stitched image, then employs a Content Diffusion Model (CDM) to refine the content, guided by a weighted confidence map derived from a Rank-Nullity-inspired sampling strategy. The approach achieves state-of-the-art quantitative and qualitative results on public benchmarks, outperforming cropping, inpainting, and warping-based methods while preserving content integrity. The method offers robust rectangling with geometric accuracy and visual appeal, and the public release of code and weights facilitates application to related motion-rectangling tasks.

Abstract

Image stitching from different captures often results in non-rectangular boundaries, which are generally considered unappealing. Current solutions to this problem involve cropping, which discards image content; inpainting, which can introduce unrelated content; or warping, which can distort non-linear features and introduce artifacts. To overcome these issues, we introduce a novel diffusion-based learning framework, RecDiffusion, for image stitching rectangling. The framework first applies Motion Diffusion Models (MDM) to generate motion fields that transition the stitched image's irregular borders to a geometrically corrected intermediary, and then applies Content Diffusion Models (CDM) to refine image details. Notably, our sampling process uses a weighted map to identify the regions that need correction during each iteration of CDM. RecDiffusion ensures geometric accuracy and overall visual appeal, surpassing all previous methods in both quantitative and qualitative measures when evaluated on public benchmarks. Code is released at https://github.com/lhaippp/RecDiffusion.
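To make the first stage more concrete, the following is a minimal sketch of how a dense motion field could warp a stitched image together with its mask. It is an illustration under stated assumptions, not the released implementation: the function name `backward_warp`, the backward-flow convention, and the nearest-neighbour sampling are all hypothetical choices, and the motion field here is supplied by hand rather than generated by MDM.

```python
import numpy as np

def backward_warp(image, mask, flow):
    """Nearest-neighbour backward warp (hypothetical convention).

    output[y, x] = image[y + flow[y, x, 1], x + flow[y, x, 0]]
    image: (H, W, C) float array, mask: (H, W) float array in [0, 1],
    flow:  (H, W, 2) array holding (dx, dy) per output pixel.
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    # Pixels whose source lies outside the image carry no valid content.
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    warped_mask = mask[src_y, src_x] * inside
    warped = image[src_y, src_x] * warped_mask[..., None]
    return warped, warped_mask
```

In this sketch the warped mask marks the pixels that received valid content from the stitched image; treating it as a stand-in for the confidence map used during refinement is an assumption made only for illustration.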

Paper Structure (19 sections, 16 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Visual comparisons of our proposed RecDiffusion and previous rectangling approaches, including cropping, He et al. [he2013rectangling], inpainting using Stable Diffusion [rombach2022high], and Nie et al. [nie2022deep]. Simple cropping reduces the field of view, the inpainting-based method introduces unsatisfactory extra content, He et al. [he2013rectangling] produces distortion and edge artifacts, and Nie et al. [nie2022deep] is unable to maintain a satisfactory rectangular boundary. In contrast, our method properly completes the boundaries and avoids artifacts.
  • Figure 2: Workflow of RecDiffusion. Initially, Motion Diffusion Models (MDM) are employed to convert irregularly bordered stitched images into a seamless rectangular form via generated motion fields, which occasionally introduce artifacts such as distortion (highlighted by the red box). Content Diffusion Models (CDM) subsequently refine these images.
  • Figure 3: Overview of training procedures. The left block illustrates the training of MDM, which generates motion fields $\mathbf{\hat{x}_0}$ from stitched images $I_{\mathbf{S}}$ and their masks $M_{\mathbf{S}}$, transforming $I_{\mathbf{S}}$ into rectangling images $I_{\mathbf{\hat{R}}}$. The right block shows the training of CDM under the same conditions ($I_{\mathbf{S}}$, $M_{\mathbf{S}}$) to directly generate a rectangling result $\mathbf{x}^{\prime}_0$. Both models aim to reconstruct high-definition rectangling images from stitched inputs, MDM through motion and CDM through content generation.
  • Figure 4: Illustration of the sampling procedure. First, stitched images $I_\mathbf{S}$ and masks $M_\mathbf{S}$ are processed by MDM, which iteratively generates motion fields $\hat{\mathbf{x}}_0$ and warps $I_\mathbf{S}$ to form preliminary rectangling images $I_\mathbf{\hat{R}}$ with corresponding confidence masks $M_\mathbf{\hat{R}}$. Then, at each sampling step, CDM refines these images by keeping the confident regions $M_\mathbf{\hat{R}}$ of $I_\mathbf{\hat{R}}$ and updating the non-confident regions $(1-M_\mathbf{\hat{R}})$ with its output $\mathbf{x}^{\prime}_0$. In this way, the ideal rectangling image is reconstructed iteratively.
  • Figure 5: Comparison with Nie et al. [nie2022deep] on the DIR-D dataset. The input stitched images and the ground-truth (GT) rectangling references are displayed in the first two columns. The third column shows the rectangling results of Nie et al., and our diffusion-based results are shown in the last column. In panel (a), red arrows mark white edge artifacts in the results of the previous state of the art. Panel (b) examines internal artifacts such as line discontinuities and local distortions, highlighted within Regions of Interest (ROIs) circled on alignment heatmaps, where darker shades signal higher fidelity to the ground truth. Our results are closer to the ground truth, indicating a significant reduction in artifacts compared to the previous method.
  • ...and 3 more figures
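
The mask-guided update described in the Figure 4 caption can be sketched as a per-pixel blend: confident regions keep the warped image $I_\mathbf{\hat{R}}$, and the remaining regions take the CDM output $\mathbf{x}^{\prime}_0$. The code below is a minimal sketch under assumptions, not the authors' sampler: `blend_with_confidence`, `refine`, and the `predict_x0` callable are hypothetical names, and the noise schedule and any re-noising between steps are omitted.

```python
import numpy as np

def blend_with_confidence(warped, estimate, confidence):
    """Return M * warped + (1 - M) * estimate, with M broadcast over channels."""
    confidence = np.clip(confidence, 0.0, 1.0)
    if confidence.ndim == warped.ndim - 1:
        confidence = confidence[..., None]
    return confidence * warped + (1.0 - confidence) * estimate

def refine(warped, confidence, predict_x0, num_steps=4):
    """Toy refinement loop; predict_x0 is a hypothetical stand-in for CDM."""
    current = warped
    for step in reversed(range(num_steps)):
        estimate = predict_x0(current, step)  # content estimate for this step
        # Confident pixels are reset to the warped image at every step.
        current = blend_with_confidence(warped, estimate, confidence)
    return current
```

Because the mask may hold values between 0 and 1, the same expression also covers a soft, weighted map: a pixel with confidence 0.5 receives an equal mix of the warped image and the content estimate.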