MoTDiff: High-resolution Motion Trajectory estimation from a single blurred image using Diffusion models
Wontae Choi, Jaelin Lee, Hyung Sup Yun, Byeungwoo Jeon, Il Yong Chun
TL;DR
MoTDiff introduces a high-resolution motion trajectory estimator that operates directly on a single motion-blurred image using a conditional diffusion framework. Multi-scale features from a Pyramid Vision Transformer, combined through a stepwise feature aggregation strategy, condition a lightweight diffusion denoiser that produces a dense $256\times256$ motion trajectory map. Training uses a loss that combines weighted binary cross-entropy (BCE) and intersection-over-union (IoU) terms, together with a connectivity-promoting STPD method. The approach achieves state-of-the-art results in blind image deblurring and coded exposure photography (CEP) on synthetic GoPro-derived data and real RSBlur images, and ablations confirm the contributions of multi-scale conditioning, the new loss, and STPD. By improving the fidelity of the motion representation, the work enables more accurate PSF modeling and more effective code optimization in CEP applications, and it identifies end-to-end integration with downstream tasks as future work.
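The TL;DR names a loss built from weighted BCE and IoU terms but does not specify the weighting. The following is a minimal NumPy sketch under stated assumptions, not MoTDiff's actual formulation: the function name `weighted_bce_iou_loss` and the `pos_weight` up-weighting of trajectory pixels are hypothetical choices. It illustrates why pairing the two terms can suit a sparse trajectory map: BCE scores every pixel independently, while the IoU term rewards overlap with the path as a whole.

```python
import numpy as np

def weighted_bce_iou_loss(pred, target, pos_weight=10.0, eps=1e-7):
    """Hypothetical weighted BCE + soft IoU loss for a binary trajectory map.

    pred:       predicted probabilities in [0, 1], shape (H, W)
    target:     binary ground-truth trajectory map (0/1), shape (H, W)
    pos_weight: assumed extra weight on the sparse trajectory pixels
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    # Per-pixel weights: trajectory pixels count pos_weight times more (an assumption).
    w = np.where(target > 0.5, pos_weight, 1.0)

    # Weighted binary cross-entropy, normalized by the total weight.
    bce = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    wbce = np.sum(w * bce) / np.sum(w)

    # Weighted soft IoU: intersection and union computed on probabilities.
    inter = np.sum(w * pred * target)
    union = np.sum(w * (pred + target - pred * target))
    wiou = 1.0 - (inter + eps) / (union + eps)
    return wbce + wiou

# Usage: a hypothetical 256x256 map containing a short horizontal trajectory.
target = np.zeros((256, 256))
target[128, 60:200] = 1.0
perfect = weighted_bce_iou_loss(target.copy(), target)
blank = weighted_bce_iou_loss(np.full_like(target, 0.5), target)
print(perfect < blank)
```

An uninformative prediction (all pixels at 0.5) is penalized by both terms, whereas a per-pixel BCE alone would be dominated by the many background pixels; the weighting is one simple way to counter that imbalance.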
Abstract
Accurate estimation of motion information is crucial in diverse computational imaging and computer vision applications. Researchers have investigated various motion representations, such as blur kernels and optical flow, that can be extracted from a single blurred image. However, existing motion representations are often of low quality, i.e., coarse-grained and inaccurate. In this paper, we propose the first high-resolution (HR) Motion Trajectory estimation framework using Diffusion models (MoTDiff). Unlike existing motion representations, MoTDiff aims to estimate a high-quality HR motion trajectory from a single motion-blurred image. The proposed MoTDiff consists of two key components: 1) a new conditional diffusion framework that uses multi-scale feature maps extracted from a single blurred image as a condition, and 2) a new training method that promotes precise identification of a fine-grained motion trajectory, consistent estimation of the overall shape and position of a motion path, and pixel connectivity along a motion trajectory. Our experiments demonstrate that MoTDiff outperforms state-of-the-art methods in both blind image deblurring and coded exposure photography applications.
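For readers new to conditional diffusion, the following sketch shows the standard DDPM forward (noising) step, $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t) I)$, that such a framework trains against. It assumes a linear noise schedule, 1000 steps, and a $256\times256$ trajectory map scaled to $[-1, 1]$; none of these settings are taken from MoTDiff. The names `linear_beta_schedule` and `q_sample` are illustrative, and the conditional denoiser that would consume `x_t`, `t`, and the multi-scale blurred-image features is deliberately left out.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Standard DDPM linear schedule; whether MoTDiff uses it is an assumption.
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, noise):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s) up to step t

rng = np.random.default_rng(0)
# Hypothetical 256x256 binary trajectory map scaled to [-1, 1].
x0 = -np.ones((256, 256))
x0[128, 60:200] = 1.0
t = int(rng.integers(0, len(betas)))
noise = rng.standard_normal(x0.shape)
x_t = q_sample(x0, t, alpha_bar, noise)
# A conditional denoiser (not shown, hypothetical) would be called as
# denoiser(x_t, t, cond_features), where cond_features are the
# multi-scale features extracted from the blurred image.
```

In a typical conditional diffusion setup, the denoiser learns to recover the clean map (or the added noise) from `x_t`; at inference, sampling starts from pure noise and runs the learned reverse process while the blurred-image features stay fixed as the condition.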
