VCHAR: Variance-Driven Complex Human Activity Recognition framework with Generative Representation
Yuan Sun, Navid Salami Pargoo, Taqiya Ehsan, Zhao Zhang, Jorge Ortiz
TL;DR
This work tackles the challenge of complex human activity recognition under weak labeling in smart spaces by introducing VCHAR, a variance-driven framework that treats atomic outputs as distributions over time intervals and learns via a multitask objective that combines atomic and complex activity losses. A generative decoder, guided by a sensor-based foundation model and one-shot diffusion-based tuning, provides video-based explanations that are accessible to laypersons while LM/VLM components organize information for visualization. Across Opportunity, FallAllD, and Cooking Activity datasets, VCHAR achieves competitive complex-activity recognition (CHAR F1) while providing explainability that outperforms baselines in user studies. The approach reduces labeling requirements and enhances practical applicability in real-world smart environments, though real-time rendering and cross-domain integration remain areas for improvement.
Abstract
Complex human activity recognition (CHAR) remains a pivotal challenge within ubiquitous computing, especially in the context of smart environments. Existing studies typically require meticulous labeling of both atomic and complex activities, a task that is labor-intensive and prone to errors due to the scarcity and inaccuracies of available datasets. Most prior research has focused on datasets that precisely label either the atomic activities or, at minimum, their sequence; such approaches are often impractical in real-world settings. In response, we introduce VCHAR (Variance-Driven Complex Human Activity Recognition), a novel framework that treats the outputs of atomic activities as a distribution over specified intervals. Leveraging generative methodologies, VCHAR elucidates the reasoning behind complex activity classifications through video-based explanations that are accessible to users without prior machine learning expertise. Our evaluation across three publicly available datasets demonstrates that VCHAR enhances the accuracy of complex activity recognition without requiring precise temporal or sequential labeling of atomic activities. Furthermore, user studies confirm that VCHAR's explanations are more intelligible than those of existing methods, facilitating a broader understanding of complex activity recognition among non-experts.
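To make the idea of treating atomic outputs as a distribution over an interval more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the atomic target is the fraction of an interval spent in each atomic activity (so ordering is discarded), that the atomic loss is a KL divergence, that the complex-activity loss is a cross-entropy, and that the two are combined with a weight `alpha`. All function names, the choice of losses, and `alpha` are illustrative assumptions and do not come from the paper.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interval_distribution(atomic_labels, num_atomic):
    """Fraction of the interval spent in each atomic activity.

    Only the proportions are kept; the order and exact timing of the
    atomic activities are deliberately discarded (assumed weak label).
    """
    counts = [0] * num_atomic
    for label in atomic_labels:
        counts[label] += 1
    n = len(atomic_labels)
    return [c / n for c in counts]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); terms with zero target mass contribute nothing."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )

def cross_entropy(predicted, true_class, eps=1e-12):
    """Negative log-probability assigned to the true complex activity."""
    return -math.log(predicted[true_class] + eps)

def multitask_loss(atomic_logits, complex_logits, atomic_target,
                   complex_label, alpha=0.5):
    """Weighted sum of the atomic and complex losses (alpha is hypothetical)."""
    atomic_loss = kl_divergence(atomic_target, softmax(atomic_logits))
    complex_loss = cross_entropy(softmax(complex_logits), complex_label)
    return alpha * atomic_loss + (1 - alpha) * complex_loss

# Example: a 4-step interval with 3 possible atomic activities.
target = interval_distribution([0, 0, 1, 2], num_atomic=3)
loss = multitask_loss(
    atomic_logits=[0.0, 0.0, 0.0],
    complex_logits=[0.0, 0.0],
    atomic_target=target,
    complex_label=1,
)
```

Under these assumptions, the model is penalized only for misjudging how much of each atomic activity occurs in the interval, not when or in what order it occurs, which is the kind of relaxed supervision the abstract describes.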
