Human Activity Recognition in an Open World

Derek S. Prijatelj, Samuel Grieggs, Jin Huang, Dawei Du, Ameya Shringi, Christopher Funk, Adam Kaufman, Eric Robertson, Walter J. Scheirer

TL;DR

This work formalizes Human Activity Recognition (HAR) in an open world, where novel activities and nuisance variations continually appear. It introduces an incremental Open World Learning (OWL) protocol and applies it to construct KOWL-718, a challenging HAR benchmark that spans Kinetics-400, -600, and -700 under a unified label mapping that takes the most recent label first. The paper analyzes baseline HAR models (X3D and TimeSformer feature extractors, plus ANN and GMM-FINCH/EVM baselines) under varying feedback budgets and nuisance transforms, and highlights how difficult it is to master classification, novelty detection, and novelty recognition simultaneously. A reproducible, containerized pipeline supports further study of novelty handling in HAR as new Kinetics data and annotations are released.

Abstract

Managing novelty in perception-based human activity recognition (HAR) is critical in realistic settings to improve task performance over time and to ensure that solutions generalize beyond previously seen samples. Novelty manifests in HAR as unseen samples, activities, objects, environments, and sensor changes, among other ways. Novelty may be task-relevant, such as a new class or new features, or task-irrelevant, resulting in nuisance novelty such as never-before-seen noise, blur, or distorted video recordings. To perform HAR optimally, algorithmic solutions must be tolerant to nuisance novelty and learn over time in the face of novelty. This paper 1) formalizes the definition of novelty in HAR, building upon the prior definition of novelty in classification tasks, 2) proposes an incremental open world learning (OWL) protocol and applies it to the Kinetics datasets to generate a new benchmark, KOWL-718, 3) analyzes the performance of current state-of-the-art HAR models when novelty is introduced over time, and 4) provides a containerized and packaged pipeline for reproducing the OWL protocol and for modifying it for any future updates to Kinetics. The experimental analysis includes an ablation study of how the different models perform under various conditions as annotated by Kinetics-AVA. The protocol, as an algorithm for reproducing experiments using the KOWL-718 benchmark, will be publicly released with code and containers at https://github.com/prijatelj/human-activity-recognition-in-an-open-world. The code may be used to analyze different annotations and subsets of the Kinetics datasets in an incremental open world fashion, and it may be extended as further updates to Kinetics are released.

Paper Structure (32 sections, 1 equation, 17 figures, 3 tables)

This paper contains 32 sections, 1 equation, 17 figures, 3 tables.

Figures (17)

  • Figure 1: An example of open world visual human activity recognition that starts from the known activities of Kinetics-400 [kay_kinetics_2017] and incrementally learns novel activities from Kinetics-700-2020 [carreira_short_2019, smaira_short_2020]. In Increment 2, without novelty handling, all unknown activities would be misclassified, when it is more correct to label them as "unknown." In an open world, learning unknown activities is desired, as in Increment 3, where "ironing hair" and "blowdrying hair" have been learned by the predictor and are represented with stand-in unknown classes 1 and 2 to differentiate them from the general unknown class. These stand-in labels remain until the predictor is given (or determines) a human-readable label.
  • Figure 2: A visual representation of KOWL-718's class density per increment, as used in the experiments, when the OWL protocol is applied to Kinetics-400 through Kinetics-700. The incremental learning uses a unified mapping of the Kinetics activity classes based on the most recent label first. Each increment contains a new set of unknown activities, and the prior known activities persist as long as the original Kinetics datasets contained samples for them. The most frequent novel classes were introduced first within their respective dataset to help balance the samples over increments. See Section \ref{sec:protocol}.
  • Figure 3: The $n$-th increment of the OWL protocol, depicting the exchange of data between the evaluator and the predictor as they update their internal states. An increment has a pre-feedback phase (whole step $n$) and, if any feedback is given, a post-feedback phase (half step $n+0.5$). During pre-feedback, the evaluator gives new samples to the predictor without any labels. The predictor updates given this new unlabeled data, for example through semi-supervised learning, and returns its predictions along with the samples ordered by priority for feedback. The evaluator assesses the pre-feedback predictions and saves the samples for which feedback was provided, if any. The ground truth class labels given as feedback are then known to the predictor. The predictor may update its state given any feedback it receives and perform a post-feedback prediction, which is then re-evaluated. This process repeats until the experiment's increments are exhausted. At the initial increment, prior knowledge obtained from external data, such as feature representations from other models, must be stated explicitly so that the predictor can be evaluated appropriately. See Section \ref{sec:abstract_predictor}.
  • Figure 3: The HAR classification task performance, as measured by MCC, on Kinetics-400 with its original labels, comparing different feature representation and classifier combinations of the predictor. "Original" uses the original classifier layers of X3D [feichtenhofer_x3d_2020] and TimeSformer (TSF) [bertasius_is_2021] with pretrained weights. These MCC values correspond to the bar chart in Fig. \ref{sm:fig:frepr}.
  • Figure 4: Confusion matrices and their derivative measures ignore information across time, such as sample order, making it important to separately measure the novelty reaction time. This figure depicts the variables involved in the novelty reaction time measured in this work in Equation \ref{eq:react}.
  • ...and 12 more figures
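
The exchange described in the OWL protocol caption above (a pre-feedback phase at whole step n, then an optional post-feedback phase at half step n+0.5) can be sketched as a short loop. This is only an illustrative sketch under stated assumptions, not the authors' released pipeline: `ToyPredictor`, `run_owl`, `accuracy`, and the `budget` parameter are hypothetical names, the predictor simply memorizes labeled samples, and plain accuracy against the true label stands in for the paper's measures.

```python
from typing import Dict, List, Tuple

UNKNOWN = "unknown"


class ToyPredictor:
    """Hypothetical stand-in predictor that memorizes labeled samples."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}

    def predict(self, samples: List[str]) -> Tuple[List[str], List[str]]:
        """Return predictions and the samples ordered by feedback priority."""
        preds = [self.memory.get(s, UNKNOWN) for s in samples]
        # Samples currently predicted as unknown are requested first
        # (False sorts before True, and sorted() is stable).
        priority = sorted(
            samples, key=lambda s: self.memory.get(s, UNKNOWN) != UNKNOWN
        )
        return preds, priority

    def update(self, feedback: Dict[str, str]) -> None:
        """Absorb ground truth labels received as feedback."""
        self.memory.update(feedback)


def accuracy(preds: List[str], truth: List[str]) -> float:
    """Fraction of predictions equal to the true label (a simplification)."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)


def run_owl(
    predictor: ToyPredictor,
    increments: List[Dict[str, str]],
    budget: float,
) -> List[Tuple[float, float]]:
    """Run the increments; `budget` is the fraction of samples labeled each time."""
    log: List[Tuple[float, float]] = []
    for n, truth in enumerate(increments):
        samples = list(truth)
        labels = [truth[s] for s in samples]
        # Pre-feedback phase (whole step n): the predictor sees no labels.
        preds, priority = predictor.predict(samples)
        log.append((float(n), accuracy(preds, labels)))
        # The evaluator reveals labels for the highest-priority samples.
        k = int(budget * len(samples))
        feedback = {s: truth[s] for s in priority[:k]}
        if feedback:
            predictor.update(feedback)
            # Post-feedback phase (half step n + 0.5): predict and re-evaluate.
            preds, _ = predictor.predict(samples)
            log.append((n + 0.5, accuracy(preds, labels)))
    return log


if __name__ == "__main__":
    toy_increments = [{"a": "run", "b": "jump"}, {"a": "run", "c": "swim"}]
    for step, score in run_owl(ToyPredictor(), toy_increments, budget=0.5):
        print(step, score)
```

With a budget of zero no feedback is given, so only whole steps are logged, which mirrors the caption's "if any feedback is given" condition.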