A Smart-Glasses for Emergency Medical Services via Multimodal Multitask Learning
Liuyi Jin, Pasan Gunawardena, Amran Haroon, Runzhi Wang, Sangwoo Lee, Radu Stoleru, Michael Middleton, Zepeng Huo, Jeeeun Kim, Jason Moats
TL;DR
EMSGlass introduces EMSNet, the first multimodal multitask EMS model, and EMSServe, a low-latency, edge-aware serving framework that handles asynchronous modality arrival. EMSNet fuses text, vitals, and scene images into a unified representation $F_C \in \mathbb{R}^{|F_T|+|F_V|+|F_I|}$ to support five EMS tasks, while PMI enables effective learning with highly imbalanced modality data. EMSServe uses a modality-aware splitter, offline inference time profiling, and adaptive offloading with a feature cache to achieve 1.9x–11.7x speedups over direct PyTorch execution. A user study with six EMTs demonstrates improved real-time situational awareness and faster decision-making, advancing practical AI-enabled EMS workflows. The work provides open-source data, code, and models to foster future development of AI-enabled EMS systems that bridge multimodal intelligence with real-world emergency response workflows.
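The dimension of $F_C$ stated above, $|F_T|+|F_V|+|F_I|$, is what one would get by concatenating the three per-modality feature vectors. The sketch below illustrates that reading only; concatenation as the fusion operator is an assumption inferred from the dimension, and the function name `fuse_features` and the toy feature values are hypothetical, not taken from EMSNet.

```python
from typing import List, Sequence


def fuse_features(
    f_text: Sequence[float],
    f_vitals: Sequence[float],
    f_image: Sequence[float],
) -> List[float]:
    """Concatenate per-modality features into one fused vector F_C.

    Assumption: fusion is plain concatenation, so that
    len(F_C) == |F_T| + |F_V| + |F_I|.
    """
    return [*f_text, *f_vitals, *f_image]


# Hypothetical toy features; real encoders would produce much larger vectors.
f_t = [0.1, 0.2, 0.3, 0.4]  # text features,   |F_T| = 4
f_v = [0.5, 0.6]            # vitals features, |F_V| = 2
f_i = [0.7, 0.8, 0.9]       # image features,  |F_I| = 3

f_c = fuse_features(f_t, f_v, f_i)
print(len(f_c))  # 4 + 2 + 3 = 9
```

Under this assumption each modality occupies a fixed slice of $F_C$, which is what would let the downstream task heads share a single input layout.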
Abstract
Emergency Medical Technicians (EMTs) operate in high-pressure environments, making rapid, life-critical decisions under heavy cognitive and operational loads. We present EMSGlass, a smart-glasses system powered by EMSNet, the first multimodal multitask model for Emergency Medical Services (EMS), and EMSServe, a low-latency multimodal serving framework tailored to EMS scenarios. EMSNet integrates text, vital signs, and scene images to construct a unified real-time understanding of EMS incidents. Trained on real-world multimodal EMS datasets, EMSNet simultaneously supports up to five critical EMS tasks with superior accuracy compared to state-of-the-art unimodal baselines. Built on top of PyTorch, EMSServe introduces a modality-aware model splitter and a feature caching mechanism, achieving adaptive and efficient inference across heterogeneous hardware while addressing the challenge of asynchronous modality arrival in the field. By optimizing multimodal inference execution in EMS scenarios, EMSServe achieves a 1.9x–11.7x speedup over direct PyTorch multimodal inference. A user study with six professional EMTs demonstrates that EMSGlass enhances real-time situational awareness, decision-making speed, and operational efficiency through intuitive on-glass interaction. In addition, qualitative insights from the user study provide actionable directions for extending EMSGlass toward next-generation AI-enabled EMS systems, bridging multimodal intelligence with real-world emergency response workflows.
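To make the idea of feature caching under asynchronous modality arrival concrete, here is a minimal sketch in plain Python. It is not EMSServe's implementation: the class `FeatureCache`, the toy encoders, and the zero-filled placeholder for modalities that have not arrived yet are all assumptions made for illustration. The point it shows is that when one modality arrives, only that modality's encoder needs to run, while features of earlier modalities are reused from the cache.

```python
from typing import Callable, Dict, List

Encoder = Callable[[object], List[float]]


class FeatureCache:
    """Hypothetical per-modality feature cache (illustration only)."""

    def __init__(self, encoders: Dict[str, Encoder], dims: Dict[str, int]):
        self.encoders = encoders          # one encoder per modality
        self.dims = dims                  # feature size per modality
        self.cache: Dict[str, List[float]] = {}
        self.encoder_calls = 0            # counts encoder runs, for inspection

    def on_arrival(self, modality: str, payload: object) -> List[float]:
        """Encode only the modality that just arrived, then rebuild the fused vector."""
        self.cache[modality] = self.encoders[modality](payload)
        self.encoder_calls += 1
        return self.fused()

    def fused(self) -> List[float]:
        """Concatenate cached features; missing modalities are zero-filled (assumption)."""
        out: List[float] = []
        for name in self.encoders:        # dicts preserve insertion order
            out.extend(self.cache.get(name, [0.0] * self.dims[name]))
        return out


# Toy encoders standing in for real text, vitals, and image models.
encoders: Dict[str, Encoder] = {
    "text": lambda s: [float(len(s.split()))],      # word count
    "vitals": lambda v: [float(x) for x in v],      # pass-through
    "image": lambda px: [sum(px) / len(px)],        # mean pixel value
}
dims = {"text": 1, "vitals": 2, "image": 1}

server = FeatureCache(encoders, dims)
first = server.on_arrival("text", "chest pain radiating to arm")
second = server.on_arrival("vitals", [88, 120])
print(first)   # [5.0, 0.0, 0.0, 0.0]
print(second)  # [5.0, 88.0, 120.0, 0.0]
```

In this sketch the second call reuses the cached text feature instead of re-encoding the text, so two arrivals cost two encoder runs rather than three. A real serving system would also need to decide where each encoder runs, which this example leaves out.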
