Understanding Human Activity with Uncertainty Measure for Novelty in Graph Convolutional Networks
Hao Xing, Darius Burschka
TL;DR
This work tackles boundary-sensitive human–object interaction recognition and segmentation under uncertainty by introducing an Uncertainty Quantified Temporal Fusion Graph Convolutional Network (UQ-TFGCN). The architecture combines an attention-based GCN encoder with a novel temporal fusion decoder, while Spectral Normalized Residual (SN-res) preserves distances in feature space to enhance out-of-distribution detection. Uncertainty is quantified using a multivariate Gaussian Process kernel over high-level features, enabling principled novelty scoring via marginal likelihoods. Experiments on Bimanual Actions and IKEA Assembly demonstrate improved boundary accuracy, segmentation performance, and robust OOD detection, albeit with increased computational demands, underscoring the method’s potential for safer human–robot interaction and online action understanding.
Abstract
Understanding human activity is crucial for developing intelligent robots, particularly for human-robot collaboration. Existing systems, however, suffer from over-segmentation, which is attributed to errors in the up-sampling process of the decoder. In response, we introduce the Temporal Fusion Graph Convolutional Network, which aims to correct the inadequate boundary estimation of individual actions within an activity stream and to mitigate over-segmentation in the temporal dimension. Moreover, systems that use human activity recognition for decision-making need more than the identified action: they require a confidence value indicating how well an observation corresponds to the training examples. This is crucial to prevent overly confident responses to unforeseen scenarios that were not part of the training data and that may be mismatched because of weak similarity measures within the system. To address this, we propose a Spectral Normalized Residual connection that enables efficient estimation of novelty in observations. It preserves input distances in the feature space by constraining the maximum gradients of weight updates, which promotes more robust handling of novel situations and mitigates the risks of overconfidence. We use a Gaussian process to quantify the distance in feature space.
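The distance-preservation idea behind a spectral normalized residual connection can be illustrated with a small, self-contained sketch. This is not the authors' SN-res implementation: it assumes a purely linear residual branch y = x + Wx, a hypothetical bound c = 0.9, and helper names (`spectral_norm`, `spectral_normalize`, `residual_block`) invented for illustration. If the branch has Lipschitz constant at most c < 1, then (1 - c)·||x1 - x2|| <= ||y1 - y2|| <= (1 + c)·||x1 - x2||, so distinct inputs cannot collapse onto the same feature and distances in feature space remain informative for novelty estimation.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def spectral_norm(W, iters=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    Wt = [list(col) for col in zip(*W)]
    v = [rng.gauss(0.0, 1.0) for _ in W[0]]
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        n = math.sqrt(sum(x * x for x in v))
        v = [x / n for x in v]
    return math.sqrt(sum(x * x for x in matvec(W, v)))


def spectral_normalize(W, c=0.9):
    """Rescale W so its estimated spectral norm is at most c (assumed bound, c < 1)."""
    sigma = spectral_norm(W)
    scale = min(1.0, c / sigma)
    return [[w * scale for w in row] for row in W]


def residual_block(x, W):
    """Linear residual connection y = x + W x (nonlinearity omitted for simplicity)."""
    return [xi + bi for xi, bi in zip(x, matvec(W, x))]


# Illustrative data: a random 4x4 weight matrix and two random inputs.
rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
W_sn = spectral_normalize(W, c=0.9)

x1 = [rng.gauss(0.0, 1.0) for _ in range(4)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(4)]

# Ratio of output distance to input distance; with c = 0.9 it should lie
# in [0.1, 1.9] up to the power-iteration error.
ratio = math.dist(residual_block(x1, W_sn), residual_block(x2, W_sn)) / math.dist(x1, x2)
print(f"distance ratio after the residual block: {ratio:.3f}")
```

The same two-sided bound holds if a 1-Lipschitz nonlinearity such as ReLU is applied inside the branch, since the composition still has Lipschitz constant at most c. Power iteration slightly underestimates the spectral norm, so in practice the bound holds up to a small tolerance.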
