SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition

Zechen Li; Shohreh Deldari; Linyao Chen; Hao Xue; Flora D. Salim

TL;DR

SensorLLM tackles the challenge of applying large language models to wearable sensor time-series through a two-stage alignment. First, Sensor-Language Alignment teaches the model to describe multivariate sensor trends in human-readable text, using a Chronos-based encoder, an alignment MLP, and special tokens that mark each channel. Second, Task-Aware Tuning freezes the backbone and trains a lightweight classifier for HAR. This approach enables LLM-driven reasoning over sensor data, generalizes across five HAR datasets, and achieves state-of-the-art results on most benchmarks. The work shows that aligning sensor data with intuitive text can unlock robust, scalable sensor-based reasoning, paving the way for Sensor-Text Multimodal LLMs. Code and data-generation pipelines are released to support further research on time-series and text alignment for sensors.
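
The Stage 1 input pipeline described above can be pictured with a minimal, dependency-free sketch. Everything in it is an assumption for illustration: `stub_encoder` is a hypothetical patch-statistics encoder standing in for the Chronos-based encoder (it is not Chronos), `AlignmentMLP` uses randomly initialized weights and hypothetical layer sizes, and the `<acc_x_start>`-style marker names are placeholders rather than the special tokens used in the released code.

```python
import math
import random


def stub_encoder(series, patch=4):
    """Hypothetical stand-in for a frozen time-series encoder (NOT Chronos):
    returns one 2-d embedding [mean, std] per non-overlapping patch."""
    embs = []
    for i in range(0, len(series) - patch + 1, patch):
        p = series[i:i + patch]
        mu = sum(p) / patch
        sd = math.sqrt(sum((x - mu) ** 2 for x in p) / patch)
        embs.append([mu, sd])
    return embs


class AlignmentMLP:
    """Two-layer MLP (Linear -> ReLU -> Linear) mapping encoder dims to LLM dims.
    Weights are random here; in training they would be learned."""

    def __init__(self, d_in, d_hidden, d_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_out)]

    def __call__(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]


def build_sequence(channels, mlp):
    """Wrap each channel's projected embeddings in hypothetical start/end markers."""
    seq = []
    for name, series in channels.items():
        seq.append(f"<{name}_start>")
        seq.extend(mlp(e) for e in stub_encoder(series))
        seq.append(f"<{name}_end>")
    return seq


channels = {
    "acc_x": [0.1, 0.2, 0.3, 0.4, 0.4, 0.3, 0.2, 0.1],
    "acc_y": [0.0] * 8,
}
seq = build_sequence(channels, AlignmentMLP(d_in=2, d_hidden=8, d_out=16))
```

With two 8-sample channels and a patch size of 4, `build_sequence` yields 8 items: a start marker, two projected 16-d vectors, and an end marker per channel. The markers keep channel identity explicit, so the model can attribute each trend to the right axis; in a real model this sequence would be fed to the LLM together with the text prompt, a step omitted here.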

Abstract

We introduce SensorLLM, a two-stage framework that enables Large Language Models (LLMs) to perform human activity recognition (HAR) from sensor time-series data. Despite their strong reasoning and generalization capabilities, LLMs remain underutilized for motion sensor data due to the lack of semantic context in time-series, computational constraints, and challenges in processing numerical inputs. SensorLLM addresses these limitations through a Sensor-Language Alignment stage, in which the model aligns sensor inputs with trend descriptions, and special tokens mark channel boundaries. This alignment enables LLMs to capture numerical variations, channel-specific features, and data of varying durations, without requiring human annotations. In the subsequent Task-Aware Tuning stage, we refine the model for HAR classification, achieving performance that matches or surpasses state-of-the-art methods. Our results demonstrate that SensorLLM evolves into an effective sensor learner, reasoner, and classifier through human-intuitive Sensor-Language Alignment, generalizing across diverse HAR datasets. We believe this work establishes a foundation for future research on time-series and text alignment, paving the way for foundation models in sensor data analysis. Our code is available at https://github.com/zechenli03/SensorLLM.
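
Because trend descriptions can be generated programmatically from the raw numbers, the alignment data needs no human annotation. The sketch below illustrates the idea with a hypothetical rule-based template that turns one channel into a trend sentence. The function name `describe_trend`, the `tol` threshold, and the wording are assumptions for illustration only, not the templates SensorLLM actually uses. Since it only compares consecutive samples, it works for channels of any length.

```python
def describe_trend(name, series, tol=0.05):
    """Hypothetical rule-based template (not the paper's): merge consecutive
    rising/falling/steady steps into segments and verbalize them."""
    if len(series) < 2:
        return f"{name}: too short to describe."
    segments = []  # each entry is [label, number_of_steps]
    for a, b in zip(series, series[1:]):
        d = b - a
        label = "steady" if abs(d) <= tol else ("rising" if d > 0 else "falling")
        if segments and segments[-1][0] == label:
            segments[-1][1] += 1  # extend the current segment
        else:
            segments.append([label, 1])  # start a new segment
    parts = [f"{label} for {n} step{'s' if n > 1 else ''}" for label, n in segments]
    return f"{name}: " + ", then ".join(parts) + "."
```

For example, `describe_trend("acc_x", [0.0, 0.5, 1.0, 1.0, 0.2])` returns `"acc_x: rising for 2 steps, then steady for 1 step, then falling for 1 step."`, which could serve as the target text paired with that channel's sensor input.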

Paper Structure

Figures (4)