Large Language Models are Few-Shot Health Learners
Xin Liu, Daniel McDuff, Geza Kovacs, Isaac Galatzer-Levy, Jacob Sunshine, Jiening Zhan, Ming-Zher Poh, Shun Liao, Paolo Di Achille, Shwetak Patel
TL;DR
This work demonstrates that large language models can act as few-shot health learners when grounded with numerical time-series data from wearables and clinical sensors. By embedding physiological data into textual prompts and tuning a soft prompt, a 24-billion-parameter PaLM model achieves substantial gains over zero-shot and supervised baselines across cardiovascular, activity, metabolic, and mental-health tasks. The study highlights the importance of context-rich prompts, which let the model's domain knowledge inform health inferences, and it reveals limitations with long time-series inputs and with arithmetic. Together, these findings suggest a promising direction for integrating LLMs with quantitative health data to support personalized monitoring and health analytics, while underscoring the need for careful evaluation and safety considerations.
Abstract
Large language models (LLMs) can capture rich representations of concepts that are useful for real-world tasks. However, language alone is limited. While existing LLMs excel at text-based inferences, health applications require that models be grounded in numerical data (e.g., vital signs and laboratory values in clinical domains; steps and movement in the wellness domain) that is not easily or readily expressed as text in existing training corpora. We demonstrate that with only few-shot tuning, a large language model is capable of grounding various physiological and behavioral time-series data and making meaningful inferences on numerous health tasks in both clinical and wellness contexts. Using data from wearable and medical sensor recordings, we evaluate these capabilities on the tasks of cardiac signal analysis, physical activity recognition, metabolic calculation (e.g., calories burned), and estimation of stress reports and mental health screeners.
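To make the idea of expressing sensor data as text concrete, the following is a minimal sketch of how a numerical time series might be serialized into a few-shot prompt. It is an illustration under stated assumptions, not the method used in this work: the function names `serialize_series` and `build_prompt`, the prompt template, the signal label, and the example values are all hypothetical, and the sketch covers only the text formatting step, not soft prompt tuning.

```python
from typing import List, Sequence, Tuple


def serialize_series(values: Sequence[float], decimals: int = 0) -> str:
    """Render a numeric time series as comma-separated text."""
    return ", ".join(f"{v:.{decimals}f}" for v in values)


def build_prompt(
    examples: List[Tuple[Sequence[float], str]],
    query: Sequence[float],
    question: str,
    signal_name: str = "heart rate (bpm)",  # hypothetical label
) -> str:
    """Assemble a few-shot prompt from labeled series plus one unlabeled query.

    The template below is an assumption made for illustration only.
    """
    lines: List[str] = []
    for series, answer in examples:
        lines.append(f"{signal_name}: {serialize_series(series)}")
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")  # blank line between examples
    # The final block leaves the answer open for the model to complete.
    lines.append(f"{signal_name}: {serialize_series(query)}")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


# Made-up values, used only to show the resulting prompt layout.
prompt = build_prompt(
    examples=[([61, 63, 62], "resting"), ([118, 124, 131], "running")],
    query=[95, 99, 104],
    question="What activity is the person doing?",
)
print(prompt)
```

In a sketch like this, the prompt length grows with the number of samples, which is one reason long time-series inputs are difficult to handle in a text-only format.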
