CardioLive: Empowering Video Streaming with Online Cardiac Monitoring
Sheng Lyu, Ruiming Huang, Sijie Ji, Yasar Abbas Ur Rehman, Lan Ma, Chenshu Wu
TL;DR
CardioLive is an online cardiac monitoring system that fuses the coexisting video and audio streams to infer heart rate during real-time video streaming. Its core is CardioNet, an audio-visual network with a video branch featuring temporal-differencing and frequency-aware modules and an audio branch that processes raw waveforms with learnable temporal-frequency filters; the two branches are fused through a multi-head attention mechanism. The system is designed as a plug-and-play middleware that can run in edge or cloud environments, with buffering and synchronization to handle changing FPS and unsynchronized streams. Empirical results show CardioLive achieving an average MAE of 1.79 BPM, outperforming the video-only and audio-only baselines by 69.2% and 81.2%, respectively. It also delivers real-time throughput (115.97 FPS on Zoom and 98.16 FPS on YouTube), supporting its practical use for health, affective computing, and security applications on streaming platforms.
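To illustrate the changing-FPS issue mentioned above, the following is a minimal sketch of one common way to place irregularly timed per-frame samples on a uniform time grid before feeding a model. It is not taken from the CardioLive implementation, whose buffering and synchronization design is not described here; the function name `resample_to_fixed_rate` and its parameters are hypothetical, and linear interpolation is an assumed choice.

```python
import numpy as np


def resample_to_fixed_rate(timestamps, values, target_fps):
    """Linearly interpolate irregularly timed samples onto a uniform grid.

    Hypothetical helper for illustration only; not part of CardioLive.
    timestamps: increasing sample times in seconds (e.g. frame arrival times)
    values:     one scalar per timestamp (e.g. a per-frame intensity feature)
    target_fps: desired uniform sampling rate in frames per second
    """
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float)
    step = 1.0 / target_fps
    # Number of uniform samples that fit inside the observed time span.
    span = timestamps[-1] - timestamps[0]
    n = int(np.floor(span * target_fps + 1e-9)) + 1
    grid = timestamps[0] + step * np.arange(n)
    # np.interp performs piecewise-linear interpolation at the grid points.
    return grid, np.interp(grid, timestamps, values)


# Frames arriving at an uneven rate, resampled to a steady 4 FPS.
grid, resampled = resample_to_fixed_rate(
    [0.0, 0.1, 0.4, 1.0], [0.0, 0.2, 0.8, 2.0], target_fps=4
)
```

The same uniform grid could then be shared with the audio stream so that both modalities cover identical time windows, though how CardioLive actually aligns the two is left to the full paper.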
Abstract
Online Cardiac Monitoring (OCM) is emerging as a compelling enhancement for next-generation video streaming platforms. It enables applications including remote health, online affective computing, and deepfake detection. Yet the physiological information encapsulated in video streams has long been neglected. In this paper, we present the design and implementation of CardioLive, the first online cardiac monitoring system in video streaming platforms. We leverage the naturally co-existing video and audio streams and devise CardioNet, the first audio-visual network to learn the cardiac series. It incorporates multiple unique designs to extract temporal and spectral features, ensuring robust performance under realistic video streaming conditions. To enable Service-On-Demand online cardiac monitoring, we implement CardioLive as a plug-and-play middleware service and develop systematic solutions to practical issues including changing FPS and unsynchronized streams. Extensive experiments demonstrate the effectiveness of our system. We achieve a Mean Absolute Error (MAE) of 1.79 BPM, outperforming the video-only and audio-only solutions by 69.2% and 81.2%, respectively. Our CardioLive service achieves average throughputs of 115.97 and 98.16 FPS when implemented in Zoom and YouTube, respectively. We believe our work opens up new applications for video streaming systems. We will release the code soon.
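For readers unfamiliar with the metric, the sketch below shows how MAE in BPM is conventionally computed, along with one plausible reading of the reported percentages as a relative reduction in MAE against a baseline. That reading is an assumption, the function names are hypothetical, and the heart-rate values are made up for illustration; none of them are results from the paper.

```python
import numpy as np


def mean_absolute_error(pred_bpm, true_bpm):
    """Average absolute difference between predicted and reference heart rates."""
    pred = np.asarray(pred_bpm, dtype=float)
    true = np.asarray(true_bpm, dtype=float)
    return float(np.mean(np.abs(pred - true)))


def relative_reduction(baseline_mae, method_mae):
    """Percentage by which method_mae is lower than baseline_mae.

    Assumed interpretation of "outperforming by X%"; the source does not
    state the exact formula.
    """
    return 100.0 * (baseline_mae - method_mae) / baseline_mae


# Illustrative values only (not from the paper).
predicted = [72.0, 80.0, 65.0]
reference = [70.0, 81.0, 68.0]
mae = mean_absolute_error(predicted, reference)  # absolute errors are 2, 1, 3
gain = relative_reduction(baseline_mae=4.0, method_mae=1.0)
```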
