SoK: Behind the Accuracy of Complex Human Activity Recognition Using Deep Learning
Duc-Anh Nguyen, Nhien-An Le-Khac
TL;DR
The paper addresses why deep learning HAR systems struggle to accurately recognise complex activities in real-world settings. It provides a systematic SoK that groups the factors behind this inaccuracy into four categories (sensor, data, algorithm, and evaluation) and classifies activities by complexity, structure, and interaction. It surveys the field's historical evolution, its current limitations, and a broad set of approaches, including multimodal fusion, semi-supervised and self-supervised learning, and domain generalisation, using fall detection as a guiding example. It outlines practical directions for robust complex HAR, emphasising fair evaluation, cost-aware design, and exploitation of unlabelled data to improve generalisation.
Abstract
Human Activity Recognition (HAR) is a well-studied field with research dating back to the 1980s. Over time, HAR technologies have evolved significantly, from manual feature extraction, rule-based algorithms, and simple machine learning models to powerful deep learning models, and from a single sensor type to a diverse array of sensing modalities. The scope has also expanded from recognising a limited set of activities to encompassing a larger variety of both simple and complex activities. However, many challenges still hinder progress in complex activity recognition using modern deep learning methods. In this paper, we comprehensively systematise the factors leading to inaccuracy in complex HAR, such as data variety and model capacity. Among the many sensor types, we give more attention to wearable and camera sensors due to their prevalence. Through this Systematisation of Knowledge (SoK) paper, readers can gain a solid understanding of the development history and existing challenges of HAR, different categorisations of activities, obstacles in deep learning-based complex HAR that impact accuracy, and potential research directions.
