DMAligner: Enhancing Image Alignment via Diffusion Model Based View Synthesis

Xinglong Luo, Ao Luo, Zhengning Wang, Yueqi Yang, Chaoyu Feng, Lei Lei, Bing Zeng, Shuaicheng Liu

TL;DR

DMAligner tackles image alignment from a new perspective: a generation-based solution that avoids the problems associated with flow-based image warping.

Abstract

Image alignment is a fundamental task in computer vision with broad applications. Existing methods predominantly employ optical flow-based image warping. However, this technique is susceptible to common challenges such as occlusions and illumination variations, which degrade the visual quality of the aligned image and compromise accuracy in downstream tasks. In this paper, we present DMAligner, a diffusion-based framework for image alignment through alignment-oriented view synthesis. DMAligner tackles image alignment from a new perspective: a generation-based solution that avoids the problems associated with flow-based image warping. Specifically, we propose a Dynamics-aware Diffusion Training approach for learning conditional image generation, synthesizing a novel view for image alignment. This incorporates a Dynamics-aware Mask Producing (DMP) module to adaptively distinguish dynamic foreground regions from static backgrounds, enabling the diffusion model to more effectively handle challenges that classical methods struggle to solve. Furthermore, we develop the Dynamic Scene Image Alignment (DSIA) dataset using Blender, which includes 1,033 indoor and outdoor scenes with over 30K image pairs tailored for image alignment. Extensive experimental results demonstrate the superiority of the proposed approach on DSIA benchmarks, as well as on a series of widely used video datasets for qualitative comparisons. Our code is available at https://github.com/boomluo02/DMAligner.
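
To make the occlusion problem mentioned in the abstract concrete, the following is a minimal sketch of flow-based backward warping on a one-row toy image. It is not taken from the paper or its code: the function name `backward_warp`, the nearest-neighbour sampling, and the pixel values are all assumptions chosen for illustration. A moving object (value 99) shifts two pixels to the right between $I_1$ and $I_2$; after warping $I_2$ toward $I_1$, the object shows up twice, because the background pixel it now covers in $I_2$ has no valid source to sample from.

```python
def backward_warp(src, flow, fill=None):
    """Warp `src` toward the reference view by sampling src at (x + u, y + v).

    src  : 2D list of pixel values (H x W)
    flow : 2D list of (u, v) tuples, displacement from reference to source
    fill : value written where the sample falls outside `src`
    (Hypothetical helper for illustration; nearest-neighbour sampling.)
    """
    h, w = len(src), len(src[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            sx, sy = round(x + u), round(y + v)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = src[sy][sx]
    return out


# Toy scene: static background 10..14, one object (99) that moves right by 2.
I1 = [[10, 99, 12, 13, 14]]   # reference view, object at x = 1
I2 = [[10, 11, 12, 99, 14]]   # source view, object at x = 3

# Flow from I1 to I2: zero on the background, (+2, 0) on the object pixel.
flow = [[(0, 0), (2, 0), (0, 0), (0, 0), (0, 0)]]

warped = backward_warp(I2, flow)
print(warped)  # [[10, 99, 12, 99, 14]]
```

The second 99 at x = 3 is the ghost: the true background value there (13) is hidden in $I_2$, so no flow field can recover it by resampling alone. A generative approach can instead synthesize the missing content.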

Paper Structure

This paper contains 19 sections, 14 equations, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: (a) Conventional image alignment based on optical flow and image warping, which results in ghosting artifacts and occlusion. (b) Our DMAligner directly generates the complete aligned image via diffusion-based view synthesis.
  • Figure 2: Overview of DSIA dataset generation. The ground-truth image is rendered by setting the time to $t_2$ and the camera pose to $P_1$, so that the static background mirrors that of $I_1$ and the dynamics match those of $I_2$, with slight visual differences caused by the pose change. As a result, $I_{gt}$ can be regarded as the alignment-oriented view synthesis result from $I_2$ to $I_1$ (i.e., ${I_2}'$ for $I_2$-to-$I_1$ alignment).
  • Figure 3: Overview of our DMAligner. Instead of using the discriminative learning paradigm of optical flow estimation and image warping, our framework employs a generative approach to achieve image alignment with a diffusion model. The Dynamics-aware Mask Producing (DMP) module provides the dynamic information that is essential for the Dynamics-aware Diffusion Training process in this task.
  • Figure 4: Qualitative comparisons between DPFlow [Morimitsu2025DPFlow], COGS [jiang2024cogs], GenWarp [seo2024genwarp], and our DMAligner on our DSIA dataset.
  • Figure 5: Qualitative comparisons between DPFlow [Morimitsu2025DPFlow], GenWarp [seo2024genwarp], and our DMAligner on the Sintel dataset.
  • ...and 3 more figures
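
The rendering setup described in the Figure 2 caption can be illustrated with a toy sketch. This is not the authors' Blender pipeline: `render` is a hypothetical one-dimensional stand-in for a renderer, the camera pose is reduced to an integer offset, and the assumption that $I_1$ is rendered at $(t_1, P_1)$ and $I_2$ at $(t_2, P_2)$ is inferred from the caption rather than stated in it. The sketch only shows why rendering at time $t_2$ with pose $P_1$ yields a target whose background agrees with $I_1$ and whose dynamics agree with $I_2$.

```python
def render(t, pose, width=8):
    """Toy 1D 'renderer': a static background plus one moving object.

    t    : time step; the object sits at world position 2 + t
    pose : camera offset in world units (stand-in for a real camera pose)
    (Hypothetical helper; the paper renders full 3D scenes with Blender.)
    """
    world = list(range(100, 132))   # static background values
    world[2 + t] = -1               # dynamic object marker
    return world[pose:pose + width]


t1, t2 = 0, 3   # two time steps
P1, P2 = 0, 2   # two camera poses (assumed integer offsets)

I1 = render(t1, P1)     # reference image
I2 = render(t2, P2)     # image to be aligned
I_gt = render(t2, P1)   # ground truth: dynamics of I2, viewpoint of I1

print(I1)    # [100, 101, -1, 103, 104, 105, 106, 107]
print(I2)    # [102, 103, 104, -1, 106, 107, 108, 109]
print(I_gt)  # [100, 101, 102, 103, 104, -1, 106, 107]
```

In this toy setting, `I_gt` matches `I1` at every pixel not covered by the object in either image, and the object sits where it does in `I2` once the pose offset is compensated. The slight appearance changes that the caption attributes to the pose change are not modelled here.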