GeoDiffMM: Geometry-Guided Conditional Diffusion for Motion Magnification

Xuedeng Liu, Jiabao Guo, Zheng Zhang, Fei Wang, Zhi Liu, Dan Guo

TL;DR

GeoDiffMM tackles subtle-motion magnification by introducing a diffusion-based, geometry-aware framework conditioned on optical flow. It combines Noise-free Optical Flow Augmentation, a Diffusion Motion Magnifier built from Hybrid Harmonic Magnification Encoding (HHME) and an Optical Flow Denoiser, and a Flow-based Video Synthesis module that reconstructs high-fidelity magnified frames. Evaluation on real and synthetic data shows state-of-the-art performance in SSIM, LPIPS, and MANIQA, with improved stability and fewer artifacts across static and dynamic scenarios. The approach shows practical potential for motion analysis tasks and broader VMM applications.

Abstract

Video Motion Magnification (VMM) amplifies subtle macroscopic motions to a perceptible level. Mainstream Eulerian approaches address amplification-induced noise by learning decoupled representations, such as texture, shape, and frequency schemes, but they still struggle to separate photon noise from true micro-motion when motion displacements are very small. We propose GeoDiffMM, a novel diffusion-based Lagrangian VMM framework conditioned on optical flow as a geometric cue, enabling structurally consistent motion magnification. Specifically, we design a Noise-free Optical Flow Augmentation strategy that synthesizes diverse nonrigid motion fields without photon noise as supervision, helping the model learn more accurate geometry-aware optical flow and generalize better. Next, we develop a Diffusion Motion Magnifier that conditions the denoising process on (i) optical flow as a geometry prior and (ii) a learnable magnification factor controlling magnitude, thereby selectively amplifying motion components consistent with scene semantics and structure while suppressing content-irrelevant perturbations. Finally, we perform Flow-based Video Synthesis to map the amplified motion back to the image domain with high fidelity. Extensive experiments on real and synthetic datasets show that GeoDiffMM outperforms state-of-the-art methods and significantly improves motion magnification.
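
For orientation, the task described above is often stated with the classical first-order model of motion magnification. This formulation is not taken from the paper: the image profile $f$, the displacement field $\delta$, and the $(1+\alpha)$ scaling convention are assumptions borrowed from the standard VMM literature.

$$
I(x, t) = f\big(x + \delta(x, t)\big), \qquad \tilde{I}(x, t) = f\big(x + (1 + \alpha)\,\delta(x, t)\big)
$$

Under this model, any error in the estimate of $\delta$ is scaled by the same factor $(1+\alpha)$ as the true motion, which is why separating photon noise from genuine micro-motion matters most when displacements are very small.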

Paper Structure

This paper contains 24 sections, 19 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: Comparison with existing VMM methods. (a) Mainstream Eulerian methods [oh2018learning, wang2024eulermormer, byung2024learning]. (b) Our GeoDiffMM. Our method substantially reduces artifacts and noise accumulation and enhances the stability of motion magnification.
  • Figure 2: Overview of the GeoDiffMM pipeline. (a) Noise-free Optical Flow Augmentation starts from the flow between a reference and a query frame, suppressing boundary jitter and artifacts to yield a clean, structurally stable conditional flow. (b) Diffusion Motion Magnifier is conditioned on the cleaned flow and an encoded magnification factor to produce a structure-consistent magnified flow. (c) Flow-based Video Synthesis uses the reference frame and the magnified flow for pixel-level resampling and reconstruction, producing a high-fidelity magnified frame with fewer artifacts.
  • Figure 3: The proposed Noise-free Optical Flow Augmentation (NOFA). NOFA composes controllable conditional flows to complement real ones. Both pass through the diffusion magnifier to produce magnified flows with synthetic supervision, improving training stability and guiding the model to learn motion-relevant features instead of noise.
  • Figure 4: Overview of the proposed Diffusion Motion Magnifier (DMM), illustrating the pipeline from Hybrid Harmonic Magnification Encoding (HHME) to the Optical Flow Denoiser (OFD) and convex upsampling for motion magnification.
  • Figure 5: Performance comparison with state-of-the-art methods on SSIM↑ and LPIPS↓ using the Synthetic Dataset. (a) Comparison of magnification accuracy for different magnification factors $\alpha$. (b) Evaluation of model robustness under different noise levels $\sigma$.
  • ...and 5 more figures
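
The Figure 2 caption describes the last stage as pixel-level resampling of the reference frame with a magnified flow. The sketch below shows only that generic idea, as a naive Lagrangian baseline rather than GeoDiffMM itself: it scales a flow field linearly and resamples the reference frame with bilinear interpolation. The function names `magnify_flow` and `backward_warp`, the $(1+\alpha)$ scaling convention, and the treatment of the flow as a backward (output-to-source) displacement are all assumptions made for illustration. In the paper, the magnified flow comes from the diffusion magnifier, not from linear scaling.

```python
import numpy as np


def magnify_flow(flow, alpha):
    """Linearly scale a flow field (assumed convention: (1 + alpha) * flow)."""
    return (1.0 + alpha) * flow


def backward_warp(frame, flow):
    """Bilinearly resample `frame` at positions (x + u, y + v).

    frame: array of shape (H, W) or (H, W, C).
    flow:  array of shape (H, W, 2); flow[..., 0] is the horizontal
           displacement u and flow[..., 1] the vertical displacement v,
           both in pixels. Sample positions are clamped to the border.
    """
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates for every output pixel, clamped to the image.
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each source coordinate.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets act as bilinear weights.
    wx = sx - x0
    wy = sy - y0
    if frame.ndim == 3:
        wx = wx[..., None]
        wy = wy[..., None]

    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom


# Usage: a 0.25 px horizontal flow magnified with alpha = 3 becomes 1 px.
ref = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 0.25
out = backward_warp(ref, magnify_flow(flow, alpha=3.0))
print(out[0])
```

Because the scaling is linear, any noise present in the input flow is amplified together with the true motion. That limitation of the naive baseline is what the cleaned conditional flow and the learned magnifier described in the captions are meant to address.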