Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Meng Wang

TL;DR

FD4MM addresses noise sensitivity in video motion magnification by decoupling spatial information into multi-level high-frequency details $H_s,H_m,H_d$ and a stable low-frequency motion field $\Delta L_d(x,t)$. It introduces a multi-level isomorphic Adaptive Frequency Decoupling Encoder (AFDE), Sparse High-pass/Low-pass Filters, a Point-wise Nonlinear Magnifier, and a Sparse Frequency Mixer, together with a contrastive regularization term. The method achieves higher magnification quality with reduced FLOPs and faster inference on Real-world and Synthetic datasets, outperforming state-of-the-art methods in both SSIM/LPIPS and no-reference MANIQA. This work provides a scalable, noise-robust VMM framework with potential for real-world deployment and downstream applications.

Abstract

Video Motion Magnification (VMM) aims to reveal subtle and imperceptible motion information of objects in the macroscopic world. Prior methods directly model the motion field from the Eulerian perspective, either by Representation Learning that separates shape and texture or by Multi-domain Learning from phase fluctuations. Inspired by the frequency spectrum, we observe that the low-frequency components with stable energy always possess spatial structure and less noise, making them suitable for modeling the subtle motion field. To this end, we present FD4MM, a new paradigm of Frequency Decoupling for Motion Magnification with a Multi-level Isomorphic Architecture that captures multi-level high-frequency details and a stable low-frequency structure (motion field) in video space. Since high-frequency details and subtle motions are susceptible to information degradation, owing to their inherent subtlety and unavoidable external interference from noise, we design Sparse High/Low-pass Filters to enhance the integrity of details and motion structures, and a Sparse Frequency Mixer to promote seamless recoupling. In addition, we design a contrastive regularization for this task to strengthen the model's ability to discriminate irrelevant features, reducing undesired motion magnification. Extensive experiments on both Real-world and Synthetic Datasets show that FD4MM outperforms SOTA methods. Meanwhile, FD4MM reduces FLOPs by 1.63$\times$ and boosts inference speed by 1.68$\times$ compared with the latest method. Our code is available at https://github.com/Jiafei127/FD4MM.
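
The observation that low-frequency components keep spatial structure while carrying less noise can be illustrated with a minimal NumPy sketch. This is not the paper's learned Adaptive Frequency Decoupling Encoder: it substitutes a fixed ideal FFT mask, and the function name `frequency_decouple`, the `cutoff` value, and the striped test frame are all assumptions made for illustration.

```python
import numpy as np


def frequency_decouple(frame, cutoff=0.1):
    """Split a 2-D grayscale frame into low- and high-frequency parts.

    Uses a fixed ideal low-pass mask in the Fourier domain (an assumption;
    the paper learns its decoupling). `cutoff` is a hypothetical radius in
    cycles per pixel.
    """
    h, w = frame.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles/pixel
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff
    spectrum = np.fft.fft2(frame)
    low = np.fft.ifft2(spectrum * low_mask).real  # smooth spatial structure
    high = frame - low                            # residual fine detail
    return low, high


# Toy frame: smooth vertical stripes (structure) plus white noise.
rng = np.random.default_rng(0)
x = np.arange(64)
structure = np.tile(np.sin(2 * np.pi * 2 * x / 64), (64, 1))
noisy = structure + 0.1 * rng.standard_normal((64, 64))

low, high = frequency_decouple(noisy)
err_before = np.mean((noisy - structure) ** 2)  # noise in the full frame
err_after = np.mean((low - structure) ** 2)     # noise left in the low band
print(err_before, err_after)
```

In this toy setting the stripes lie entirely below the cutoff, so the low band retains them while most of the white noise falls into the high band, which is the property the abstract relies on when it models motion from the low-frequency structure.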


Paper Structure

This paper contains 15 sections, 9 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Learning-based methods for motion magnification. (a) Representation Learning methods [oh2018learning, dorkenwald2020unsupervised, singh2023lightweight], (b) Multi-domain Learning method [singh2023multi] and (c) our Frequency Decoupling method. Inspired by the theory of frequency spectrum [si2022inception, yun2023spanet], we utilise it to separate high- and low-frequency features and leverage their discriminative characteristics for motion magnification.
  • Figure 2: Our idea of frequency decoupling for spatial decomposition. High-frequency features reveal spatial details and produce an expanded bright field in the spectrum, implying more energy. Low-frequency energy clusters in the central region and represents stable spatial structures appropriate for modeling motion.
  • Figure 3: Pipeline of the Multi-level Isomorphic Architecture based on Frequency Decoupling. It aims to decouple a stable motion field and multi-level high-frequency details for magnification and recoupling with a magnification factor $\alpha$, respectively.
  • Figure 4: Overall pipeline of the proposed FD4MM. Based on the Multi-level Isomorphic Architecture, the multi-level high-frequency details $\{H_s(x,t), H_m(x,t), H_d(x,t)\}$ and the stable motion field ($\delta(x,t)=\Delta L_d(x,t)$), obtained from the reference frame $I(x,0)$ and query frame $I(x,t)$, are filtered by the Sparse High- and Low-pass Filters ($\mathcal{F}_{H}(\cdot)$, $\mathcal{F}_{L}(\cdot)$), respectively, to minimize the noise-induced degradation of high-frequency details and low-frequency motion structures. Then, the motion field $\mathcal{F}_{L}(\delta(x,t))$ is amplified by the Point-wise Nonlinear Magnifier. Next, the Sparse Frequency Mixer $\mathcal{M}(\cdot)$ lets the magnified low-frequency component guide the high-frequency details, completing a seamless recoupling that avoids ringing artifacts and yields the magnified frame $I_m(x,t)$. Besides, we introduce a novel Contrastive Regularization to eliminate undesired magnification results, thus enhancing the model's robustness and magnification effects.
  • Figure 5: Performance comparison with SOTA methods for SSIM$\uparrow$ and LPIPS$\downarrow$ scores on the Synthetic Dataset. (a) Schematic of a synthetic video. (b) Comparison of magnification accuracy for different $\alpha$. (c) Analysis of model robustness under different $\sigma$.
  • ...and 2 more figures
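
The decouple, magnify, and recouple flow described in the Figure 3 and Figure 4 captions can be sketched with fixed filters. This is a rough illustration under strong assumptions, not the FD4MM network: an ideal FFT low-pass stands in for the learned encoder and sparse filters, a linear gain $(1+\alpha)$ replaces the Point-wise Nonlinear Magnifier, and plain addition replaces the Sparse Frequency Mixer. The helper names `low_pass`, `magnify`, and `stripes` are hypothetical.

```python
import numpy as np


def low_pass(frame, cutoff=0.1):
    """Ideal FFT low-pass (a stand-in for the learned low-frequency branch)."""
    fy = np.fft.fftfreq(frame.shape[0])[:, None]
    fx = np.fft.fftfreq(frame.shape[1])[None, :]
    mask = np.sqrt(fx**2 + fy**2) <= cutoff
    return np.fft.ifft2(np.fft.fft2(frame) * mask).real


def magnify(reference, query, alpha, cutoff=0.1):
    """Linear toy magnifier: amplify the low-frequency change between frames."""
    low_ref = low_pass(reference, cutoff)
    low_query = low_pass(query, cutoff)
    high_query = query - low_query       # high-frequency details of the query
    delta = low_query - low_ref          # toy analogue of delta(x, t)
    magnified_low = low_ref + (1.0 + alpha) * delta
    return magnified_low + high_query    # naive recoupling by addition


def stripes(shift, size=64, cycles=2):
    """Smooth vertical stripes translated horizontally by `shift` pixels."""
    x = np.arange(size) - shift
    return np.tile(np.sin(2 * np.pi * cycles * x / size), (size, 1))


reference = stripes(0.0)
query = stripes(0.1)                     # subtle 0.1-pixel motion
alpha = 5.0
magnified = magnify(reference, query, alpha)
target = stripes((1.0 + alpha) * 0.1)    # stripes moved by the magnified shift

print(np.max(np.abs(query - target)), np.max(np.abs(magnified - target)))
```

For small displacements the linear gain approximates a shift of $(1+\alpha)$ times the original motion, so the magnified frame should sit much closer to `target` than the query does. The approximation degrades as $\alpha$ or the motion grows, which is one reason learned, nonlinear magnifiers are used in practice.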