Strategies and Challenges of Efficient White-Box Training for Human Activity Recognition
Daniel Geissler, Bo Zhou, Paul Lukowicz
TL;DR
Human Activity Recognition (HAR) from wearable time-series data is challenged by temporal dependencies, sensor noise, and placement variability, which make already opaque models harder to interpret. The paper proposes a white-box training framework built around an ML Endoscope that exposes latent-space dynamics through visualization metrics, supported by Human-In-The-Loop (HITL) interactions and LLM-based guidance to diagnose issues and streamline workflows. Key contributions include a set of visualization strategies (e.g., scatter, parallel coordinates, and radar plots), HITL-based refinement, and LLM agent support that translates visual patterns into actionable adjustments, along with an evaluation plan on HAR datasets such as PAMAP2 and expert usability assessments. The proposed approach aims to improve misclassification resistance, training efficiency, and user trust, enabling more transparent and practical HAR systems for real-world deployment.
Abstract
Human Activity Recognition using time-series data from wearable sensors poses unique challenges due to complex temporal dependencies, sensor noise, placement variability, and diverse human behaviors. These factors, combined with the nontransparent nature of black-box Machine Learning models, impede interpretability and hinder human comprehension of model behavior. This paper addresses these challenges by exploring strategies to enhance interpretability through white-box approaches, which provide actionable insights into latent space dynamics and model behavior during training. By leveraging human intuition and expertise, the proposed framework improves explainability, fosters trust, and promotes transparent Human Activity Recognition systems. A key contribution is the proposal of a Human-in-the-Loop framework that enables dynamic user interaction with models, facilitating iterative refinements to enhance performance and efficiency. Additionally, we investigate the usefulness of Large Language Models as assistants that provide users with guidance for interpreting visualizations, diagnosing issues, and optimizing workflows. Together, these contributions present a scalable and efficient framework for developing interpretable and accessible Human Activity Recognition systems.
