Foundation Models Defining A New Era In Sensor-based Human Activity Recognition: A Survey And Outlook

Sizhen Bian, Mengxi Liu, Siyu Yuan, Lala Shakti Swarup Ray, Bo Zhou, Bin Guo, Zhiwen Yu, Thomas Ploetz, Paul Lukowicz, Vitor Fortes Rey

Abstract

Sensor-based Human Activity Recognition (HAR) underpins many ubiquitous and wearable computing applications, yet current models remain limited by scarce labels, sensor heterogeneity, and weak generalization across users, devices, and contexts. Foundation models, which are generally pretrained at scale using self-supervised and multimodal learning, offer a unifying paradigm to address these challenges by learning reusable, adaptable representations for activity understanding. This survey synthesizes emerging foundation models for sensor-based HAR. We first clarify foundational concepts, definitions, and evaluation criteria, then organize existing work using a lifecycle-oriented taxonomy spanning input design, pretraining, adaptation, and utilization. Rather than enumerating individual models, we analyze recurring design patterns and trade-offs across nine technical axes, including modality scope, tokenization, architectures, learning paradigms, adaptation mechanisms, and deployment settings. From this synthesis, we identify three dominant development trajectories: (1) HAR-specific foundation models trained from scratch on large sensor corpora, (2) adaptation of general time-series or multimodal foundation models to sensor-based HAR, and (3) integration of large language models for reasoning, annotation, and human-AI interaction. We conclude by highlighting open challenges in data curation, multimodal alignment, personalization, privacy, and responsible deployment, and outline directions toward general-purpose, interpretable, and human-centered foundation models for activity understanding. A complete, continuously updated index of papers and models is available in our companion repository: https://github.com/zhaxidele/Foundation-Models-Defining-A-New-Era-In-Human-Activity-Recognition.


Paper Structure

This paper contains 82 sections, 7 figures, and 10 tables.

Figures (7)

  • Figure 1: Historical development of sensor-based Human Activity Recognition (HAR) models. From classical machine learning with hand-crafted features and shallow classifiers [bao2004activity] to the rise of deep learning with CNNs and RNNs [ordonez2016deep], the field progressed toward a phase focused on transfer and domain generalization (robustness across users, devices, and datasets [sargano2017human, ramasamy2018recent, wang2018deep]). More recently, self-supervised learning (SSL) approaches have enabled pretraining on unlabeled sensor data using contrastive or masked objectives [oord2018cpc]. Today, the field is moving toward foundation models, exemplified by large-scale sensor–language alignment [sensorlm2025], emphasizing scalability, generalization, and interpretability.
  • Figure 2: Foundations and challenges of sensor-based HAR across multiple abstraction levels: signal-, data-, user-, semantic-, and corpus-level factors jointly define the complexity of learning robust and generalizable activity representations.
  • Figure 3: Definition of Foundation Models [wiggins2022opportunities] and its adaptation across Computer Vision [yuan2021florence], Natural Language Processing [paass2023foundation], and Sensor-based Human Activity Recognition (this work).
  • Figure 4: How foundation models address HAR challenges across signal-, data-, user-, semantic-, and corpus-level dimensions.
  • Figure 5: Heuristic 1–7 scores of representative works against six HAR–FM criteria. The “Ideal HAR-FM” panel depicts a target profile. Scores (1 = limited evidence to 7 = strong evidence) are judgment-based syntheses from reported results (compared both to the other models in this survey and to an aspirational “ideal” FM-for-HAR reference point) and are intended for qualitative comparison rather than a leaderboard. Representative works: SelfHAR-700k [yuan2024self], MASTER [zhu2024master], OneHAR [Wei_2025], UniMTS [zhang2024unimts], IMU2CLIP [moon2023imu2clip], SensorLM [zhang2025sensorlm], RelCon [xu2024relcon], Chronos HAR Adapters [xiong2024novel], MHARFedLLM [bandyopadhyay2025mharfedllm], NORMWEAR [luo2024normwear], TxP [ray2025txp], AURA-MFM [matsuishi2025multimodal], Time2Lang [pillai2025time2lang], and LSM [narayanswamy2024scaling].
  • ...and 2 more figures