Event2Audio: Event-Based Optical Vibration Sensing

Mingxuan Cai, Dekel Galor, Amit Pal Singh Kohli, Jacob L. Yates, Laura Waller

TL;DR

Event2Audio is an active, event-based vibrometry pipeline that converts laser-speckle motion, captured by an event camera with high temporal resolution, into an audible waveform. The method images a defocused speckle pattern and processes the resulting asynchronous events with either fast or offline optical-flow computation, which gives real-time or near-real-time audio reconstruction. It improves on prior approaches in low-frequency capture, multi-source separation, and robustness to environmental distortions, including echoes and underwater conditions. The setup uses simple, compact optics and avoids complex multi-camera rigs, which makes it practical for speech recovery, noise-robust demixing, and underwater sensing. Overall, the work shows that event-based sensing can substantially speed up active optical vibrometry, turning imperceptible vibrations into high-fidelity audio.

Abstract

Small vibrations observed in video can unveil information beyond what is visual, such as sound and material properties. It is possible to passively record these vibrations when they are visually perceptible, or actively amplify their visual contribution with a laser beam when they are not perceptible. In this paper, we improve upon the active sensing approach by leveraging event-based cameras, which are designed to efficiently capture fast motion. We demonstrate our method experimentally by recovering audio from vibrations, even for multiple simultaneous sources, and in the presence of environmental distortions. Our approach matches the state-of-the-art reconstruction quality at much faster speeds, approaching real-time processing.

Paper Structure

This paper contains 29 sections, 1 equation, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Schematic of the proposed method. (a) Imaging defocused speckle. A coherent laser illuminates the vibrating surface, generating a defocused speckle pattern on the sensor plane. The pattern's 2D movements are captured by the event sensor. (b) The captured motion is encoded into a stream of asynchronous events. This event stream reflects the motion of the speckle pattern induced by surface vibrations. For each event, a corresponding optical flow vector, consisting of a timestamp and a 2D spatial velocity, can be extracted through optical flow computation. (c) Audio signal extraction from events. (d) Recovered audio waveform.
  • Figure 2: Schematic of tracking speckle from events via optical flow. The flow indicates the current direction of motion, which can be integrated to find the motion trajectory. Pink: positive events. Blue: negative events.
  • Figure 3: Experimental system prototype. The laser illuminates the membrane of the speaker. Then, the reflected speckle pattern is captured by the lens and event camera.
  • Figure 4: Experimental results for recovering a chirp signal and paired tones. (a) A spectrogram of the input up-chirp signal sent to the speaker from 0 to 5 kHz. (b) Recovered chirp spectrogram using our method. (c) A spectrogram of the input paired tones. The reference tone is at 440 Hz with another tone from 441 Hz to 450 Hz. (d) Recovered paired tones spectrogram using our method.
  • Figure 5: Recovered octaves of the note C, from C1 (33 Hz) to C8 (4186 Hz). A single speaker plays eight octaves while a microphone records the audio. Our system captures the speaker membrane's vibrations and reconstructs the audio. The right column corresponds to a zoomed-in version of the left column. (a) Spectrogram of input tones. (b) Spectrogram of tones as recorded with a microphone. It is evident that the microphone recording fails to capture low-frequency tones due to its frequency response limitations and filtering algorithms. (c) Spectrogram of recovered tones. By directly sensing the physical vibrations of the speaker, our system successfully recovers these tones.
  • ...and 8 more figures
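
The captions for Figures 1 and 2 describe a flow vector (a timestamp plus a 2D velocity) for each event, and state that the flow can be integrated to find the motion trajectory. The sketch below illustrates only that integration step, assuming the per-event flow vectors are already available. The function name `flow_to_audio`, the 8 kHz sample rate, the per-sample averaging, and the projection onto a dominant axis are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def flow_to_audio(t, vx, vy, sample_rate=8000):
    """Integrate per-event flow velocities into a 1D waveform (hypothetical helper).

    t      : event timestamps in seconds
    vx, vy : per-event 2D flow velocities (arbitrary units per second)
    """
    t = np.asarray(t, dtype=float)
    v = np.stack([np.asarray(vx, dtype=float),
                  np.asarray(vy, dtype=float)], axis=1)

    # Assign each event to an audio sample bin (assumed binning scheme).
    idx = np.floor((t - t.min()) * sample_rate).astype(int)
    n = idx.max() + 1
    counts = np.bincount(idx, minlength=n)

    # Average the flow velocity inside each bin; empty bins are left at zero.
    mean_v = np.zeros((n, 2))
    for k in range(2):
        sums = np.bincount(idx, weights=v[:, k], minlength=n)
        mean_v[:, k] = np.divide(sums, counts, out=np.zeros(n), where=counts > 0)

    # Integrate velocity over time to obtain the 2D speckle trajectory.
    disp = np.cumsum(mean_v, axis=0) / sample_rate
    disp -= disp.mean(axis=0)  # remove the constant offset

    # Assumption: project the trajectory onto its dominant direction (PCA via SVD).
    _, _, vt = np.linalg.svd(disp, full_matrices=False)
    audio = disp @ vt[0]

    # Normalize to the range [-1, 1] for playback.
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio
```

Given flow vectors sampled from a speckle pattern that oscillates sinusoidally, the returned waveform should show a spectral peak at the oscillation frequency. The sign of the waveform is arbitrary because of the SVD projection, which does not affect how it sounds.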