EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer

Fei Wang, Dan Guo, Kun Li, Meng Wang

TL;DR

EulerMormer is a unified framework and the first to bring a Transformer into learning-based video motion magnification (VMM). It introduces a novel dynamic filter that eliminates noise cues and preserves critical features in the motion magnification and amplification generation phases.

Abstract

Video Motion Magnification (VMM) aims to break the resolution limit of human visual perception and reveal imperceptible minor motions that carry valuable information in the macroscopic domain. However, the task is challenging because of photon noise inevitably introduced by photographic devices and spatial inconsistency in amplification, which lead to flickering artifacts in static fields and to motion blur and distortion in dynamic fields of the video. Existing methods focus on explicit motion modeling without prioritizing denoising during the motion magnification process. This paper proposes a novel dynamic filtering strategy to achieve static-dynamic field adaptive denoising. Specifically, based on Eulerian theory, we separate texture and shape and extract the motion representation from inter-frame shape differences, so that these subdivided features can be used to solve the task at a finer granularity. Then, we introduce a novel dynamic filter that eliminates noise cues and preserves critical features in the motion magnification and amplification generation phases. Overall, our unified framework, EulerMormer, is the first to incorporate a Transformer into learning-based VMM. The core of the dynamic filter is a global dynamic sparse cross-covariance attention mechanism that explicitly removes noise while preserving vital information, coupled with a multi-scale dual-path gating mechanism that selectively regulates the dependence on different frequency features to reduce spatial attenuation and complement motion boundaries. Extensive experiments demonstrate that EulerMormer achieves more robust video motion magnification from the Eulerian perspective, significantly outperforming state-of-the-art methods. The source code is available at https://github.com/VUT-HFUT/EulerMormer.
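
To make the Eulerian idea in the abstract concrete, the following is a minimal sketch of classical first-order (linear) Eulerian magnification on a 1-D signal. It is not the EulerMormer network: the names `frame` and `magnify_linear` and the values of `delta`, `alpha` and `noise` are assumptions chosen for illustration. The sketch shows that amplifying the inter-frame difference approximates a larger displacement, and that any noise in the current frame is amplified by the same factor, which is why denoising matters.

```python
import math

def frame(xs, shift):
    """Sample a smooth 1-D intensity profile displaced by `shift`."""
    return [math.sin(x + shift) for x in xs]

def magnify_linear(reference, current, alpha):
    """First-order Eulerian magnification: amplify the inter-frame difference."""
    return [cur + alpha * (cur - ref) for ref, cur in zip(reference, current)]

xs = [i * 0.1 for i in range(64)]
delta = 0.005   # hypothetical sub-pixel displacement between two frames
alpha = 10.0    # hypothetical magnification factor

reference = frame(xs, 0.0)
current = frame(xs, delta)
magnified = magnify_linear(reference, current, alpha)

# Ideal result: the same profile displaced by (1 + alpha) * delta.
target = frame(xs, (1.0 + alpha) * delta)
max_error = max(abs(m - t) for m, t in zip(magnified, target))
max_motion = max(abs(t - r) for t, r in zip(target, reference))

# Noise in the current frame is amplified by (1 + alpha) as well.
noise = 0.01    # hypothetical additive noise on a single pixel
noisy_current = list(current)
noisy_current[5] += noise
noisy_magnified = magnify_linear(reference, noisy_current, alpha)
amplified_noise = noisy_magnified[5] - magnified[5]
```

In this toy setting the approximation error stays well below the magnified motion itself, while the single-pixel noise grows from `noise` to `(1 + alpha) * noise`. In a static region that growth would appear as flicker.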

Paper Structure

This paper contains 31 sections, 15 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Theoretical basis and realistic results of video motion magnification. Theoretically, the static field in (a) is free of position displacement, while the dynamic field should exhibit ideal position displacement to satisfy the desired motion magnification. In the real world, however, unavoidable photon noise and spatial inconsistency cause flickering artifacts, intensity attenuation, etc., as shown in (b) for the magnified results of existing methods (Oh et al., 2018; Singh et al., 2023). In contrast, our method achieves more robust magnification in both static and dynamic fields.
  • Figure 2: The overall architecture of EulerMormer for video motion magnification, which consists of three phases: (1) texture-shape disentanglement, (2) motion magnification with a dynamic filter $\mathcal{F}(\cdot)$ and a point-wise magnifier $\mathcal{M}(\cdot)$, and (3) amplification generation, which recouples and refines the original texture $\psi_t(x,t)$ and the magnified shape $\phi'_{s}(x,t)$ to generate high-quality magnified frames. The dynamic filter $\mathcal{F}(\cdot)$, consisting of DMF in (a) and MGR in (b), is applied twice, once in the motion magnification phase and once in the amplification generation phase, with the aim of achieving static-dynamic field adaptive denoising in texture, shape and motion representation learning.
  • Figure 3: Visualization examples of the synthetic datasets: Synthetic-I, Synthetic-II (Poisson noise) and Synthetic-III (Gaussian blur). To clarify the motion changes of foreground objects, we mark their reference positions in grey.
  • Figure 4: Ablation results of $k$ in Top-$k$ operator on the Synthetic-I dataset.
  • Figure 5: Qualitative comparison of our method with existing methods on (a) Static, (b) Dynamic and (c) Fabric datasets with magnification factors $\alpha$ of 20, 10, and 20, respectively. We highlight spatial regions where motion occurs and provide spatiotemporal (ST) slices of magnified motion for better comparison.
  • ...and 2 more figures
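
The Top-$k$ operator ablated in Figure 4 and the sparse cross-covariance attention inside the dynamic filter can be illustrated with a small, generic sketch. The code below is an assumption-laden toy and not the authors' DMF module: it computes channel-to-channel (cross-covariance) attention and keeps only the `top_k` largest scores in each row before the softmax. The function names, the fixed `temperature`, and the example `features` are all hypothetical.

```python
import math

def l2_normalize(row):
    """Scale a channel row to unit length (all-zero rows are left as-is)."""
    norm = math.sqrt(sum(v * v for v in row)) or 1.0
    return [v / norm for v in row]

def topk_softmax(scores, k):
    """Softmax over the k largest scores; every other entry gets zero weight."""
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - peak) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]

def sparse_cross_covariance_attention(q, k, v, top_k, temperature=1.0):
    """q, k, v: C channel rows, each with N spatial values.

    The attention map is C x C (channel to channel) and is sparsified
    row by row with Top-k before being applied to v.
    """
    qn = [l2_normalize(row) for row in q]
    kn = [l2_normalize(row) for row in k]
    attn, out = [], []
    for q_row in qn:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / temperature
                  for k_row in kn]
        weights = topk_softmax(scores, top_k)
        attn.append(weights)
        out.append([sum(w * v_row[n] for w, v_row in zip(weights, v))
                    for n in range(len(v[0]))])
    return out, attn

# Hypothetical features: 3 channels, 4 spatial positions.
features = [
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.1],
    [0.0, 1.0, 0.0, -1.0],
]
out, attn = sparse_cross_covariance_attention(features, features, features, top_k=2)
```

With `top_k` equal to the number of channels, the sketch reduces to dense cross-covariance attention. Smaller values discard the weakest channel correlations, which is the kind of trade-off an ablation over $k$ explores.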