Efficient and Personalized Mobile Health Event Prediction via Small Language Models
Xin Wang, Ting Dang, Vassilis Kostakos, Hong Jia
TL;DR
This work tackles the privacy and latency barriers of cloud-based LLMs for mobile health monitoring by evaluating small language models (SLMs) on-device. It benchmarks five SLMs (including TinyLlama-1.1B) on the PMData healthcare dataset and deploys a 4-bit quantized on-device app on the iPhone 15 Pro Max to assess real-time performance. Results show that SLMs can achieve accuracy and mean absolute error (MAE) comparable to or better than those of many LLMs on health tasks such as stress, readiness, fatigue, and sleep quality, while reducing latency and resource usage. TinyLlama-1.1B is especially efficient, with substantially lower latency and faster first-token generation than larger models, making on-device, privacy-preserving health monitoring practical. The study demonstrates the feasibility of real-time, privacy-conscious health analytics on mobile devices and outlines future work on expanding datasets, few-shot learning and instruction tuning, and investigating the effects of quantization.
Abstract
Healthcare monitoring is crucial for early detection, timely intervention, and the ongoing management of health conditions, ultimately improving individuals' quality of life. Recent research shows that Large Language Models (LLMs) perform impressively in supporting healthcare tasks. However, existing LLM-based healthcare solutions typically rely on cloud-based systems, which raise privacy concerns and increase the risk of personal information leakage. As a result, there is growing interest in running these models locally on devices such as mobile phones and wearables to protect users' privacy. Small Language Models (SLMs) are potential candidates for addressing these privacy and computational issues, as they are more efficient and better suited to local deployment. However, the performance of SLMs in healthcare domains has not yet been investigated. This paper examines the capability of SLMs to accurately analyze health data, such as steps, calories, sleep minutes, and other vital statistics, to assess an individual's health status. Our results show that TinyLlama, which has 1.1 billion parameters, uses 4.31 GB of memory, and has a latency of 0.48 s, achieves the best performance compared with four other state-of-the-art (SOTA) SLMs on various healthcare applications. These results indicate that SLMs could potentially be deployed on wearable or mobile devices for real-time health monitoring, providing a practical solution for efficient and privacy-preserving healthcare.
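
For readers unfamiliar with the two metrics named in the TL;DR, the following is a minimal sketch of how accuracy and MAE could be computed for a task in which a model predicts an ordinal health score (for example, a stress level). The function names and the sample scores are hypothetical and chosen only for illustration; they are not taken from the paper or from PMData, and the paper's own evaluation code may differ.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of predictions that exactly match the reference score."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference scores."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(reference)


# Hypothetical scores on a 1-5 scale, for illustration only.
predicted_scores = [3, 2, 4, 5]
reference_scores = [3, 3, 4, 2]

acc = accuracy(predicted_scores, reference_scores)
mae = mean_absolute_error(predicted_scores, reference_scores)
print(f"accuracy={acc:.2f}, MAE={mae:.2f}")
```

Accuracy rewards only exact matches, whereas MAE also reflects how far off a wrong prediction is, which is why the two are usually reported together for ordinal scores.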
