Scaling Wearable Foundation Models

Girish Narayanswamy, Xin Liu, Kumar Ayush, Yuzhe Yang, Xuhai Xu, Shun Liao, Jake Garrison, Shyam Tailor, Jake Sunshine, Yun Liu, Tim Althoff, Shrikanth Narayanan, Pushmeet Kohli, Jiening Zhan, Mark Malhotra, Shwetak Patel, Samy Abdel-Ghaffar, Daniel McDuff

TL;DR

This work creates LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date, and establishes the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation, both across time and sensor modalities.

Abstract

Wearable sensors have become ubiquitous thanks to a variety of health tracking features. The resulting continuous and longitudinal measurements from everyday life generate large volumes of data; however, making sense of these observations for scientific and actionable insights is non-trivial. Inspired by the empirical success of generative modeling, where large neural networks learn powerful representations from vast amounts of text, image, video, or audio data, we investigate the scaling properties of sensor foundation models across compute, data, and model size. Using a dataset of up to 40 million hours of in-situ heart rate, heart rate variability, electrodermal activity, accelerometer, skin temperature, and altimeter per-minute data from over 165,000 people, we create LSM, a multimodal foundation model built on the largest wearable-signals dataset with the most extensive range of sensor modalities to date. Our results establish the scaling laws of LSM for tasks such as imputation, interpolation and extrapolation, both across time and sensor modalities. Moreover, we highlight how LSM enables sample-efficient downstream learning for tasks like exercise and activity recognition.

Paper Structure

This paper contains 35 sections, 13 figures, 15 tables.

Figures (13)

  • Figure 1: Scaling foundation models on wearable data. Making sense of physiological and behavioral signals derived from wearables is challenging. (A) We present a systematic scaling analysis of sensor models using up to 40 million hours of multimodal data from over 165,000 people. (B) Using a random masking pretext task, we evaluate on tasks of imputation, forecasting, and downstream classification. (C) Experiments show scaling compute, data, and model size are all effective. Scaling is shown on the random imputation task.
  • Figure 2: Generative LSM tasks and pretraining. We define four distinct generative tasks: random imputation, temporal interpolation, signal/sensor imputation, and temporal extrapolation (forecasting). Random imputation was empirically chosen as the pretraining task.
  • Figure 3: Scaling performance of LSM. We show performance on generative tasks across varying data and model sizes. LSM begins to saturate at approximately $10^7$ hours of data. The effects of scaling are more pronounced in imputation, interpolation, and extrapolation tasks. Results indicate that as model size increases, significantly larger data volumes are required to prevent overfitting.
  • Figure 4: Analysis on scaling LSM. (a) Total number of hours is more important than total number of subjects. (b) Data scaling on discriminative tasks with ViT-110M. (c) Larger models are more sample efficient.
  • Figure 5: LSM MAE pretrain masking strategies. All strategies employ a masking ratio of 0.8. (A): original, unmasked sensor image, (B): random masking, (C): structured temporal masking, (D): structured sensor masking, (E): temporal extrapolation masking, (F): temporal interpolation masking. Both random and structured temporal masking enable strong downstream performance. We select random masking for all scaling experiments and evaluations.
  • ...and 8 more figures
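
The masking strategies named in the Figure 2 and Figure 5 captions can be illustrated with a small sketch. This is not the authors' implementation: the function name `make_mask`, the cell-level granularity (the model may well mask patches rather than individual cells), and the choice to centre the interpolation gap are all assumptions made for illustration. Only the strategy names and the 0.8 masking ratio come from the captions.

```python
import random


def make_mask(n_time, n_sensors, strategy, ratio=0.8, seed=0):
    """Return an n_time x n_sensors grid of booleans; True marks a hidden cell.

    Hypothetical helper: operates on single cells of a (time x sensor) grid.
    """
    rng = random.Random(seed)
    mask = [[False] * n_sensors for _ in range(n_time)]

    if strategy == "random":
        # Hide individual cells anywhere in the grid.
        cells = [(t, s) for t in range(n_time) for s in range(n_sensors)]
        for t, s in rng.sample(cells, round(ratio * len(cells))):
            mask[t][s] = True
    elif strategy == "temporal":
        # Hide whole time steps chosen at random (all sensors at once).
        for t in rng.sample(range(n_time), round(ratio * n_time)):
            mask[t] = [True] * n_sensors
    elif strategy == "sensor":
        # Hide whole sensor channels chosen at random (all time steps).
        for s in rng.sample(range(n_sensors), round(ratio * n_sensors)):
            for t in range(n_time):
                mask[t][s] = True
    elif strategy == "extrapolation":
        # Hide the final block of time steps (forecasting).
        n_hidden = round(ratio * n_time)
        for t in range(n_time - n_hidden, n_time):
            mask[t] = [True] * n_sensors
    elif strategy == "interpolation":
        # Hide one contiguous block in the interior; centring is an assumption.
        n_hidden = round(ratio * n_time)
        start = (n_time - n_hidden) // 2
        for t in range(start, start + n_hidden):
            mask[t] = [True] * n_sensors
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return mask


if __name__ == "__main__":
    for name in ("random", "temporal", "sensor", "extrapolation", "interpolation"):
        grid = make_mask(10, 5, name)
        print(name)
        for row in grid:
            print("".join("#" if hidden else "." for hidden in row))
```

Running the script prints one small grid per strategy, with `#` for hidden cells and `.` for visible ones, which makes the difference between scattered (random) and block-structured masks easy to see.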