Towards Open-World Gesture Recognition

Junxiao Shen, Matthias De Lange, Xuhai "Orson" Xu, Enmin Zhou, Ran Tan, Naveen Suda, Maciej Lazarewicz, Per Ola Kristensson, Amy Karlson, Evan Strasnick

TL;DR

The paper proposes a design engineering approach that enables offline analysis of a large-scale collected dataset. It systematically examines various parameters and compares continual learning methods, so that machine learning models can adapt to new tasks without degrading performance on previously learned tasks.

Abstract

Providing users with accurate gestural interfaces, such as gesture recognition based on wrist-worn devices, is a key challenge in mixed reality. However, static machine learning processes in gesture recognition assume that training and test data come from the same underlying distribution. Unfortunately, in real-world applications of gesture recognition, the data distribution may change over time. We formulate this problem of adapting recognition models to new tasks, where new data patterns emerge, as open-world gesture recognition (OWGR). We propose the use of continual learning to enable machine learning models to be adaptive to new tasks without degrading performance on previously learned tasks. However, exploring the parameters that govern when and how to train and deploy recognition models requires resource-intensive user studies, which may be impractical. To address this challenge, we propose a design engineering approach that enables offline analysis on a collected large-scale dataset by systematically examining various parameters and comparing different continual learning methods. Finally, we provide design guidelines to enhance the development of an open-world wrist-worn gesture recognition process.

Paper Structure (25 sections, 1 equation, 5 figures)

Figures (5)

  • Figure 1: The open-world gesture recognition process is structured around five stages. Two of these stages, the retraining trigger and the updating policy, introduce essential design considerations, and both are underpinned by an engineering approach that facilitates offline analysis. In contrast, the other three stages (device deployment, data logging, and data curation) require online analysis.
  • Figure 2: Four dynamic gestures.
  • Figure 3: We examine the accuracy of a gesture recognition model, in an open-world gesture recognition problem, tested on Task $n$ while the model is trained incrementally from Task $n$ to Task $N$. $N$ is the total number of tasks, and the model is trained by five continual learning methods and one baseline method: Finetuning. For instance, in the subplot titled Task 2, we test the model, progressively trained from Task 2 through Task 10, exclusively on Task 2 and report 9 data points. The subplot has no data point for the model trained only on Task 1, because the tasks were introduced sequentially and a model trained on Task 1 has not yet been exposed to Task 2. This pattern continues with Task 3 and subsequent tasks: the subplot for Task $n$ reports only $N-n+1$ data points per method. The steep decline in accuracy for Finetuning in each subplot can be attributed to the model's tendency to forget earlier tasks as it learns new ones, which is referred to as catastrophic forgetting. We proposed five continual learning methods to address this catastrophic forgetting problem in open-world gesture recognition, namely SI, replay, LWF, PackNet and MAS, which are introduced in detail in Section \ref{sec:function_carriers}. In this figure, we illustrate the results of preliminary experiments on two use cases of open-world gesture recognition. We report average accuracy (forgetting) in the legend. Forgetting measures the decrease in accuracy. We discuss these two measures in detail in Section \ref{sec:metrics}.
  • Figure 4: Effect of Task Setting on Performance for New Context. Each sub-figure represents one task setting. Each box plot shows mean, median, and quartile data. a) Order of Contexts: Changing the context order does not significantly affect average accuracy or the forgetting measure for most methods. The X-axis represents a specific order of the tasks: E-H is easy-to-hard, H-E is hard-to-easy. b) Granularity of Contexts: Some methods perform notably better or worse with fine-grained contexts than with coarse contexts. The X-axis indicates whether the context is coarse or fine. c) Number of Contexts: Different methods exhibit different performance as the number of contexts (tasks) increases. The X-axis represents the total number of contexts (tasks).
  • Figure 5: Effect of Number of Total Tasks for New User. A larger number of total users leads to higher accuracy and forgetting measure for most methods. Each box plot shows mean, median, and quartile data. The X-axis represents the total number of users that the model must learn in the New User case.
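
The bookkeeping described in the Figure 3 caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code. The names `acc`, `curve_for_task`, `average_accuracy`, and `average_forgetting` are hypothetical, the numbers are toy values rather than results from the paper, and the forgetting formula used here (best earlier accuracy minus final accuracy, averaged over all tasks except the last) is one commonly used definition that is assumed for illustration; the paper's own definition is given in its metrics section and may differ.

```python
from typing import List, Optional

# acc[k][n] = accuracy on task n after training through task k (0-indexed).
# Entries with k < n are None: the model has not been exposed to task n yet.
AccMatrix = List[List[Optional[float]]]


def curve_for_task(acc: AccMatrix, n: int) -> List[float]:
    """Data points for one subplot: accuracy on task n after each training
    step from task n through the last task."""
    return [acc[k][n] for k in range(n, len(acc))]


def average_accuracy(acc: AccMatrix) -> float:
    """Mean accuracy over all tasks after the final training step."""
    final = acc[-1]
    return sum(final) / len(final)


def average_forgetting(acc: AccMatrix) -> float:
    """ASSUMED definition: for each task except the last, the best accuracy
    reached at an earlier step minus the final accuracy, then averaged."""
    drops = []
    for n in range(len(acc) - 1):
        curve = curve_for_task(acc, n)
        drops.append(max(curve[:-1]) - curve[-1])
    return sum(drops) / len(drops) if drops else 0.0


# Toy values for three tasks (not taken from the paper).
acc = [
    [0.90, None, None],
    [0.70, 0.88, None],
    [0.50, 0.60, 0.92],
]

for n in range(len(acc)):
    print(f"Task {n + 1}: {len(curve_for_task(acc, n))} data points")
print(f"average accuracy:   {average_accuracy(acc):.3f}")
print(f"average forgetting: {average_forgetting(acc):.3f}")
```

With three tasks, the subplots for Tasks 1, 2, and 3 hold 3, 2, and 1 data points respectively, matching the $N-n+1$ count in the caption (tasks are 1-indexed in the caption and 0-indexed in the code).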