Time2Lang: Bridging Time-Series Foundation Models and Large Language Models for Health Sensing Beyond Prompting

Arvind Pillai, Dimitris Spathis, Subigya Nepal, Amanda C Collins, Daniel M Mackin, Michael V Heinz, Tess Z Griffin, Nicholas C Jacobson, Andrew Campbell

TL;DR

Time2Lang maps the outputs of a time-series foundation model (TFM) directly to a large language model (LLM), enabling health sensing without text-based prompts. It couples a frozen Chronos TFM to a frozen LLaMA through learnable adapters trained on synthetic data with a periodicity-prediction pretext task, which gives efficient, scalable inference while preserving key time-series properties such as autocorrelation. The framework performs competitively on real-world longitudinal mental health tasks (depression from step counts and flourishing from conversation duration) and compares favorably with prompting on efficiency, which makes it practical for continuous, personalized health monitoring. The work lays a foundation for integrating time-series representations with general-purpose LLMs in healthcare, with potential extensions to multi-modal data, reasoning, and real-time interventions.

Abstract

Large language models (LLMs) show promise for health applications when combined with behavioral sensing data. Traditional approaches convert sensor data into text prompts, but this process is error-prone, computationally expensive, and dependent on domain expertise. These challenges are particularly acute when processing extended time-series data. While time-series foundation models (TFMs) have recently emerged as powerful tools for learning representations from temporal data, bridging TFMs and LLMs remains challenging. Here, we present Time2Lang, a framework that directly maps TFM outputs to LLM representations without intermediate text conversion. Our approach first trains on synthetic data using periodicity prediction as a pretext task, followed by evaluation on mental health classification tasks. We validate Time2Lang on two longitudinal wearable and mobile sensing datasets: daily depression prediction using step count data (17,251 days from 256 participants) and flourishing classification based on conversation duration (46 participants over 10 weeks). Time2Lang maintains near-constant inference times regardless of input length, unlike traditional prompting methods. The generated embeddings preserve essential time-series characteristics such as autocorrelation. Our results demonstrate that TFMs and LLMs can be effectively integrated while minimizing information loss and enabling performance transfer across these distinct modeling paradigms. To our knowledge, we are the first to integrate a TFM and an LLM for health, thus establishing a foundation for future research combining general-purpose large models for complex healthcare tasks.
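
The abstract mentions two ideas that are easy to illustrate in isolation: synthetic series whose period can serve as a pretext label, and autocorrelation as a property the embeddings should preserve. The following is a minimal sketch only. The sine-plus-noise generator, the function names, and all parameter values are assumptions made for illustration; this section does not describe the paper's actual synthetic data generator.

```python
import math
import random


def make_periodic_series(period, length=128, noise_std=0.1, seed=0):
    """Noisy sine wave; `period` (in samples) could act as a pretext label.

    Hypothetical generator, not the one used in the paper.
    """
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise_std)
        for t in range(length)
    ]


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return cov / var


# Usage: a period-16 signal correlates positively with itself at lag 16
# and negatively at lag 8 (half a period).
series = make_periodic_series(period=16)
print(autocorrelation(series, 16), autocorrelation(series, 8))
```

A check of this kind could be run on raw inputs and, with a suitable similarity measure, on learned embeddings to see whether periodic structure survives the mapping.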

Paper Structure

This paper contains 47 sections, 1 equation, 8 figures, and 11 tables.

Figures (8)

  • Figure 1: Time2Lang vs. traditional prompting. An example of using sensing data to predict depression, comparing text-based prompting (left) with Time2Lang (right). In text-based prompting, the sensor signals are converted into text for LLM prompting. As an alternative, we introduce Time2Lang, which learns a mapping ($f$ and $g$) between a TFM and an LLM, enabling health sensing tasks without the need for text conversion while making the most of powerful models.
  • Figure 2: Effect of Increasing Sequence Length on Prompting Performance. We evaluate LLaMA 3.2 (1B) on: (a) a mean prediction task, where mean absolute error increases with sequence length, and (b) token count, where the number of tokens is $\sim 10\times$ the time-series length.
  • Figure 3: Time2Lang Framework. To meaningfully integrate Time-Series Foundation Models (here: Chronos $C$) and Large Language Models (LLaMA $M$), we train two smaller networks $f$ and $g$ that optimally map TFM features ($\mathbf{z^c}$) to an LLM. The learned embeddings from $f$ and $g$ are $\mathbf{z^i}$ and $\mathbf{z^o}$, respectively. To improve positive knowledge transfer, we use a residual connection between the TFM and LLM features ($\mathbf{z^c} \rightarrow \mathbf{z^m}$) only during training.
  • Figure 4: Time2Lang Training
  • Figure 5: Efficiency Analysis. Inference time per sample (latency, in seconds) for Time2Lang and prompting on (a) different input sequence lengths and (b) variable-length conversation duration data from StudentLife. We repeated measurements 100 times and observed a variance of $<10^{-2}$.
  • ...and 3 more figures
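
The data flow described in the Figure 3 caption can be sketched with toy stand-ins. This is a hedged illustration, not the authors' implementation: `frozen_tfm` and `frozen_llm` are hypothetical placeholders for Chronos $C$ and LLaMA $M$, the adapters $f$ and $g$ are reduced to single linear maps, and adding $\mathbf{z^c}$ to $\mathbf{z^m}$ before $g$ is only one assumed reading of the training-time residual connection.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def init_weights(n_out, n_in, rng):
    """Small random weights for a toy adapter (hypothetical helper)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def frozen_tfm(series, dim):
    """Placeholder for the frozen TFM C: chunk means stand in for z^c."""
    chunk = len(series) // dim
    return [sum(series[i * chunk:(i + 1) * chunk]) / chunk for i in range(dim)]


def frozen_llm(z_i):
    """Placeholder for the frozen LLM M: a fixed nonlinearity yields z^m."""
    return [math.tanh(v) for v in z_i]


def time2lang_forward(series, f_weights, g_weights, training):
    """Toy forward pass: series -> z^c -> f -> z^i -> M -> z^m -> g -> z^o."""
    dim = len(f_weights[0])
    z_c = frozen_tfm(series, dim)      # TFM features
    z_i = linear(f_weights, z_c)       # adapter f (trainable in the paper)
    z_m = frozen_llm(z_i)              # LLM features
    if training:
        # Assumed placement of the residual z^c -> z^m, used in training only.
        z_m = [m + c for m, c in zip(z_m, z_c)]
    return linear(g_weights, z_m)      # adapter g produces z^o


# Usage with toy sizes: 4-dimensional features, 2-dimensional output.
rng = random.Random(0)
f_w = init_weights(4, 4, rng)
g_w = init_weights(2, 4, rng)
series = [float(t) for t in range(32)]
print(time2lang_forward(series, f_w, g_w, training=False))
```

In this sketch only `f_w` and `g_w` would be updated during training, which mirrors the caption's statement that the two large models are bridged by two smaller trained networks.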