HARGPT: Are LLMs Zero-Shot Human Activity Recognizers?

Sijie Ji, Xinzhe Zheng, Chenshu Wu

TL;DR

This paper investigates whether large language models (LLMs) can act as zero-shot human activity recognizers when given raw IMU data and suitable prompts. Using GPT-4 with role-play and think-step-by-step (chain-of-thought) prompting, the approach, HARGPT, outperforms both traditional machine learning and deep learning baselines on two public datasets and shows robustness to unseen users. The findings suggest that LLMs can interpret raw sensor data by drawing on their knowledge base, offering a promising path for integrating NLP-oriented foundation models into cyber-physical systems (CPS). The work highlights both the potential of this approach and the need for standardized benchmarks to understand its capabilities, limits, and practical deployment in CPS contexts.

Abstract

There is an ongoing debate regarding the potential of Large Language Models (LLMs) as foundational models seamlessly integrated with Cyber-Physical Systems (CPS) for interpreting the physical world. In this paper, we carry out a case study to answer the following question: Are LLMs capable of zero-shot human activity recognition (HAR)? Our study, HARGPT, presents an affirmative answer by demonstrating that LLMs can comprehend raw IMU data and perform HAR tasks in a zero-shot manner, given only appropriate prompts. HARGPT inputs raw IMU data into LLMs and uses the role-play and think step-by-step strategies for prompting. We benchmark HARGPT on GPT4 using two public datasets with different inter-class similarities and compare it against various baselines based on both traditional machine learning and state-of-the-art deep classification models. Remarkably, LLMs successfully recognize human activities from raw IMU data and consistently outperform all the baselines on both datasets. Our findings indicate that, with effective prompting, LLMs can interpret raw IMU data based on their knowledge base and show promising potential to analyze raw sensor data of the physical world effectively.
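As a rough illustration of the prompting idea described in the abstract (raw IMU readings placed directly in the prompt, combined with a role-play instruction and a think step-by-step instruction), the following minimal Python sketch assembles such a prompt as a plain string. The wording of the prompt, the function names `build_har_prompt` and `format_axis`, the sampling rate, and the candidate activity labels are all assumptions made for illustration; they are not taken from the paper, and no LLM API is called here.

```python
from typing import List, Sequence, Tuple

# One IMU reading: (x, y, z) values for a single time step.
Sample = Tuple[float, float, float]


def format_axis(samples: Sequence[Sample], axis: int, precision: int = 3) -> str:
    """Render one axis of a sample sequence as a comma-separated string."""
    return ", ".join(f"{s[axis]:.{precision}f}" for s in samples)


def build_har_prompt(
    acc: Sequence[Sample],
    gyro: Sequence[Sample],
    sampling_hz: int,
    candidates: List[str],
) -> str:
    """Assemble a zero-shot HAR prompt (hypothetical wording, not the paper's)."""
    lines = [
        # Role-play instruction (assumed phrasing).
        "You are an expert in IMU-based human activity analysis.",
        f"The IMU data was sampled at {sampling_hz} Hz.",
    ]
    # Raw, unprocessed readings are listed per sensor and per axis.
    for name, samples in (("Accelerometer", acc), ("Gyroscope", gyro)):
        for i, axis in enumerate("xyz"):
            lines.append(f"{name} {axis}-axis: [{format_axis(samples, i)}]")
    lines.append("The person is doing one of: " + ", ".join(candidates) + ".")
    # Think step-by-step instruction (assumed phrasing).
    lines.append(
        "Analyze the data step by step, then state the single most likely activity."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up readings, for demonstration only.
    acc = [(0.1, 9.8, 0.2), (0.3, 9.6, 0.1)]
    gyro = [(0.01, 0.02, 0.03), (0.0, 0.01, 0.02)]
    print(build_har_prompt(acc, gyro, 10, ["walking", "sitting"]))
```

The resulting string would be sent to an LLM as a single user message. Keeping the readings as plain numbers, without feature extraction, mirrors the zero-shot, raw-data setting the abstract describes.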

Paper Structure (10 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Workflow of HARGPT.
  • Figure 2: IMU data visualization of two datasets. (a): Capture24 dataset contains four HAR categories with distinct patterns; (b): HHAR dataset contains two similar HAR categories.
  • Figure 3: Chain-of-thought prompt design for HARGPT.
  • Figure 4: Detailed step-by-step inference generated by GPT4 with a walking example.
  • Figure 5: A comparison of the inference results generated by other LLMs for the walking scenario.
  • ...and 1 more figures