
Event-Based Motion Magnification

Yutian Chen, Shi Guo, Fangzheng Yu, Feng Zhang, Jinwei Gu, Tianfan Xue

TL;DR

This work tackles magnifying small, high-frequency motions without costly high-speed RGB cameras by coupling a temporally-dense event camera with a spatially-dense RGB camera. It presents a physics-informed end-to-end network with an encoder–manipulator–decoder architecture and a Second-order Recurrent Propagation (SRP) module to robustly interpolate many frames while suppressing artifacts via a temporal filter. The authors introduce the first event-RGB motion magnification dataset, SM-ERGB, including synthetic and real subsets, and demonstrate superior frequency fidelity and visual quality over baselines, with favorable data efficiency. The approach holds promise for practical, cost-effective motion analysis in industrial and medical settings, enabling magnification of imperceptible motions at high frequencies. Symbolic derivations of explicit motion solutions from event and RGB data underpin the method’s core physics-based intuition.

Abstract

Detecting and magnifying imperceptible high-frequency motions in real-world scenarios has substantial implications for industrial and medical applications. These motions are characterized by small amplitudes and high frequencies. Traditional motion magnification methods rely on costly high-speed cameras or active light sources, which limit the scope of their applications. In this work, we propose a dual-camera system consisting of an event camera and a conventional RGB camera for video motion magnification, providing temporally-dense information from the event stream and spatially-dense data from the RGB images. This combination enables broad and cost-effective amplification of high-frequency motions. By revisiting the physical camera model, we observe that estimating motion direction and magnitude requires integrating event streams with additional image features. On this basis, we propose a novel deep network tailored for event-based motion magnification. Our approach uses the Second-order Recurrent Propagation module to interpolate multiple frames while addressing artifacts and distortions induced by magnified motions. Additionally, we employ a temporal filter to distinguish between noise and useful signals, thus minimizing the impact of noise. We also introduce the first event-based motion magnification dataset, which includes a synthetic subset and a real-captured subset for training and benchmarking. Through extensive experiments in magnifying small-amplitude, high-frequency motions, we demonstrate the effectiveness and accuracy of our dual-camera system and network, offering a cost-effective and flexible solution for motion detection and magnification.
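
The claim that motion direction and magnitude cannot be read from events alone can be illustrated with a first-order sketch. This is not the paper's derivation. It assumes brightness constancy, a 1-D displacement $\delta$ small enough for a Taylor expansion, and an idealized event camera that fires one signed event each time the log intensity changes by a contrast threshold $c$. The symbols $\delta$, $c$, and $L_0 = \log I_0$ are introduced here for illustration; $E_{\tau^-}$ is the aggregated signed event count between $t_0$ and $\tau$.

$$
L_\tau(x) = L_0(x - \delta) \approx L_0(x) - \delta\, L'_0(x)
$$

$$
c\, E_{\tau^-}(x) \approx L_\tau(x) - L_0(x) \approx -\delta\, L'_0(x)
\quad\Longrightarrow\quad
\delta \approx -\frac{c\, E_{\tau^-}(x)}{L'_0(x)}
$$

Under these assumptions the sign of $\delta$ follows the sign of the product $E_{\tau^-} \cdot L'_0$ rather than the event polarity alone, and its magnitude depends on the gradient magnitude. Since $L'_0 = I'_0 / I_0$ with $I_0 > 0$, the sign of $L'_0$ matches that of the image gradient $I'_0$, which is one way to see why the events must be combined with image features.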

Paper Structure (23 sections, 14 equations, 10 figures, 3 tables)


Figures (10)

  • Figure 1: The magnified result of the real-captured video. Our proposed model can effectively magnify the motions at the correct frequencies compared to other solutions. To better visualize the tuning fork's motion, pixels that differ between frames are marked in red and cyan, respectively, in (d). Please refer to the supplementary videos for better visualization of results.
  • Figure 1: Accuracy of our method. Comparison between our magnified output and the audio signal (blue line) over a duration of 0.05 seconds. Our result and the audio signal match closely.
  • Figure 2: A toy example of 1-D sub-pixel motions of a circular object, demonstrating the connection between the image $I_0$ (and its gradient $I'_0$), aggregated events $E_{\tau^-}$ from time stamp $t_0$ to $\tau$ (blue positive, red negative), and magnified motion $\hat{I}_{{\tau},\alpha=10}$. Event polarity alone cannot indicate the motion direction: the left and right sides of the circular object in the first row move in the same direction but generate events of opposite polarities (column 3). However, the product of image gradient and event polarity (column 4, denoted as $I'_{0} \cdot E_{\tau^-}$) does correlate with the motion direction, because the product has the same sign on both sides, just as both sides move in the same direction. In addition, the third row illustrates a scenario where the image gradients and motion direction are the same as in the second row, but the displacement increases from 0.2 to 0.4. This generates more events than in the second row, hence increasing the perceived magnitude of motion.
  • Figure 2: A visualization of texture representation (denoted as $V$) and motion representation (denoted as $\Delta M$) in the network.
  • Figure 3: Illustration of the framework of our network. The network is designed to use RGB images together with the corresponding asynchronous event stream to generate high-frame-rate magnified frames. (a) is the overall framework, which consists of three main components: Encoder, Manipulator, and Decoder. In the Encoder, the RGB Branch (b) extracts temporal-invariant texture representations $V_{0}$ and $V_{1}$, as well as shape representations $M$, while the Event Branch (c) extracts temporal-variant motion representations $\Delta M$. The Manipulator linearly magnifies the motion using factor $\alpha$, producing the magnified motion $M_{mag}$. To amplify motion at specific frequencies and reduce noise interference, a temporal filter is employed during inference. The Decoder then reconstructs motion-magnified frames from $M_{mag}$ and the texture representations.
  • ...and 5 more figures
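
The toy example described in the first Figure 2 caption can be reproduced numerically. The sketch below is not the authors' code: it assumes an idealized, noise-free event model (one signed event per contrast-threshold step in log intensity), and the names `disk_profile`, `aggregate_events`, `estimate_direction`, the threshold value, and the sigmoid-edged profile are all hypothetical choices made for illustration.

```python
import numpy as np

def disk_profile(x, center, radius=5.0, edge=1.0):
    """Smooth 1-D cross-section of a bright disk on a dark background."""
    # Rising sigmoid edge on the left, falling sigmoid edge on the right.
    left = 1.0 / (1.0 + np.exp(-(x - (center - radius)) / edge))
    right = 1.0 / (1.0 + np.exp((x - (center + radius)) / edge))
    return 0.1 + 0.9 * left * right

def aggregate_events(img0, img1, threshold=0.02):
    """Signed event count per pixel (assumed ideal event camera model)."""
    dlog = np.log(img1) - np.log(img0)
    # One event per full threshold crossing; truncate toward zero.
    return np.trunc(dlog / threshold)

def estimate_direction(img0, events):
    """Return +1 for rightward motion, -1 for leftward motion."""
    # Gradient times events has the same sign on both edges of the object.
    product = np.gradient(img0) * events
    return int(-np.sign(product.sum()))

x = np.arange(0, 40, dtype=float)
i0 = disk_profile(x, center=20.0)

for shift in (0.2, 0.4, -0.2):
    events = aggregate_events(i0, disk_profile(x, center=20.0 + shift))
    print(
        f"shift={shift:+.1f}",
        f"left-edge events={events[x < 20].sum():+.0f}",
        f"right-edge events={events[x > 20].sum():+.0f}",
        f"total |events|={np.abs(events).sum():.0f}",
        f"estimated direction={estimate_direction(i0, events):+d}",
    )
```

For a rightward shift the two edges produce events of opposite polarity, yet the gradient-event product is negative on both, so the direction estimate is consistent. Doubling the sub-pixel shift from 0.2 to 0.4 increases the total event count, which mirrors the magnitude cue described in the caption.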