PulmoFusion: Advancing Pulmonary Health with Efficient Multi-Modal Fusion
Ahmed Sharshar, Yasser Attia, Mohammad Yaqub, Mohsen Guizani
TL;DR
PulmoFusion tackles the challenge of precise remote spirometry by fusing RGB or thermal video data with patient metadata through energy-efficient Spiking Neural Networks and CNN-based backbones. The approach uses a Multi-Head Attention Layer to fuse video-derived spikes with metadata for regression of PEF and for classification and regression of FEV1/FVC, achieving state-of-the-art performance while emphasizing low-resource efficiency. Thermal imaging outperforms RGB (Relative RMSE of 0.11 versus 0.26 for PEF), and inference is fast, supporting potential deployment in low-resource settings, though the work acknowledges limitations from a small cohort and reliance on manually segmented breathing cycles. The work contributes a novel multimodal framework, a publicly available codebase, and a dataset to accelerate research in non-invasive pulmonary health monitoring.
Abstract
Traditional remote spirometry lacks the precision required for effective pulmonary monitoring. We present a novel, non-invasive approach using multimodal predictive models that integrate RGB or thermal video data with patient metadata. Our method leverages energy-efficient Spiking Neural Networks (SNNs) for the regression of Peak Expiratory Flow (PEF) and the classification of Forced Expiratory Volume in one second (FEV1) and Forced Vital Capacity (FVC), and uses lightweight CNNs to overcome SNN limitations in regression tasks. Multimodal data integration is improved with a Multi-Head Attention Layer, and we employ K-Fold validation and ensemble learning to boost robustness. Using thermal data, our SNN models achieve 92% accuracy on a breathing-cycle basis and 99.5% patient-wise. PEF regression models attain Relative RMSEs of 0.11 (thermal) and 0.26 (RGB), with an MAE of 4.52% for FEV1/FVC predictions, establishing state-of-the-art performance. The code and dataset are available at https://github.com/ahmed-sharshar/RespiroDynamics.git
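The two kinds of metric reported above (Relative RMSE for regression, and accuracy per breathing cycle versus per patient) can be illustrated with a minimal sketch. The abstract does not define either computation, so both functions rest on assumptions: `relative_rmse` normalizes the RMSE by the mean of the ground-truth values, and `patient_wise_accuracy` aggregates per-cycle predictions by majority vote. The function names and the toy data are hypothetical and are not taken from the released code.

```python
import math
from collections import Counter, defaultdict


def relative_rmse(y_true, y_pred):
    """RMSE divided by the mean ground-truth value.

    Assumption: the source does not state its normalization;
    mean-normalization is one common choice.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (sum(y_true) / len(y_true))


def cycle_wise_accuracy(cycle_labels, cycle_preds):
    """Fraction of individual breathing cycles classified correctly."""
    hits = sum(1 for t, p in zip(cycle_labels, cycle_preds) if t == p)
    return hits / len(cycle_labels)


def patient_wise_accuracy(patient_ids, cycle_labels, cycle_preds):
    """Accuracy after majority voting over each patient's cycles.

    Assumptions: every cycle of a patient shares one true label, and
    majority voting is the aggregation rule (not confirmed by the source).
    Ties resolve to the class that was seen first for that patient.
    """
    votes = defaultdict(Counter)
    truth = {}
    for pid, label, pred in zip(patient_ids, cycle_labels, cycle_preds):
        votes[pid][pred] += 1
        truth[pid] = label
    correct = sum(
        1 for pid, counts in votes.items()
        if counts.most_common(1)[0][0] == truth[pid]
    )
    return correct / len(votes)


# Toy data (hypothetical): two patients, three breathing cycles each.
ids = ["a", "a", "a", "b", "b", "b"]
labels = [1, 1, 1, 0, 0, 0]
preds = [1, 0, 1, 0, 0, 1]

print(cycle_wise_accuracy(labels, preds))         # 4 of 6 cycles correct
print(patient_wise_accuracy(ids, labels, preds))  # both patients correct
print(relative_rmse([100.0, 100.0], [110.0, 90.0]))
```

In the toy data, one misclassified cycle per patient lowers the cycle-wise accuracy but is outvoted at the patient level. This is one plausible reason a patient-wise figure can exceed a cycle-wise figure, under the voting assumption stated above.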
