Agile gesture recognition for low-power applications: customisation for generalisation

Ying Liu, Liucheng Guo, Valeri A. Makarov, Alexander Gorban, Evgeny Mirkes, Ivan Y. Tyukin

TL;DR

This work tackles low-power, privacy-conscious hand gesture recognition on embedded devices by augmenting a compact base model with an adaptive error corrector that operates in a high-dimensional feature space. The method leverages few-shot learning and the blessing of dimensionality to distinguish base-model errors from correct predictions and routes corrections via an error-type classifier, demonstrated on capacitive-sensor data from the etee controller. Key contributions include a detailed training/deployment pipeline for new users, a high-dimensional kernel-based error separation approach, and empirical evidence showing improved per-user accuracy without sacrificing generalisation. The approach is of practical use for deploying robust, privacy-preserving gesture recognition on low-cost hardware in real-time embedded systems.

Abstract

Automated hand gesture recognition has long been a focal point in the AI community. Traditionally, research in this field has focused on scenarios with access to a continuous flow of hand images, driven by the widespread use of cameras and the abundance of image data. However, there is an increasing demand for gesture recognition technologies that operate on low-power sensor devices. This demand stems from rising concerns about data leakage and end-user privacy, as well as the limited battery capacity and computing power of low-cost devices. Moreover, the difficulty of collecting data for individually designed hardware hinders the generalisation of a gesture recognition model. In this study, we present a novel methodology for pattern recognition systems using adaptive and agile error correction, designed to enhance the performance of legacy gesture recognition models on devices with limited battery capacity and computing power. The system comprises a compact Support Vector Machine as the base model for live gesture recognition and an adaptive agile error corrector that employs few-shot learning within the feature space induced by high-dimensional kernel mappings. The error corrector can be customised for each user, allowing dynamic adjustments to the gesture prediction based on their movement patterns while maintaining the agile performance of the base model on a low-cost, low-power microcontroller. The proposed system is distinguished by its compact size, rapid processing speed, and low power consumption, making it well suited to a wide range of embedded systems.

Paper Structure (13 sections, 9 figures, 1 table, 2 algorithms)


Figures (9)

  • Figure 1: The flowchart of the adaptive error corrector.
  • Figure 2: (a) The wearable etee hand controllers used for collecting gesture recognition data in this work. The controller has a cylindrical silicon shell that fits in the palm of the user's hand. The transparent skeleton supports the rigid structure of the controller. The black sensors wrap around the skeleton and detect signals from each finger as it moves and interacts with the controller. Inside the skeleton, there is a PCB with an MCU and a battery to support all functions. (b) The names and movements of the 4 dynamic gestures collected for this study. (c) During data collection, the start and end of each gesture were marked. A 500 ms time window, represented by the dashed box, was applied to the signal to extract segments, sliding from the black box to the grey box with each signal frame. (d,e) Original signals on the left are normalised between 0 and 1. The normalisation range is defined per user and per sensor, with zero meaning that the fingers are fully open and one meaning that the hand is applying full pressure to the sensor.
  • Figure 3: (a) The bars show the percentage of variance explained by each principal component (PC) and the solid line shows the cumulative percentage. The first three PCs cover over 95% of the explained variance in the dataset. (b) All 100 PCs were fed to a decision tree for classification of the 4 main dynamic gestures. Only the first three PCs (PC1, PC2 and PC3) were required for the decision making. (c) The top three PCs were used as features to visualise the dataset and show great sparsity among the four gesture labels (colour represents the gesture). (d) An extra "none" label (black) was added to the dataset, showing that these samples sit around all four gestures.
  • Figure 4: Box plot of the $k$-fold cross-validation accuracy for six base systems. The dataset was randomly shuffled and split into a train set and a validation set in groups of users. All base systems achieved train-set accuracy close to 1, while the validation accuracy varied around 0.9.
  • Figure 5: The correct samples and error samples are separable through their Euclidean distance to the centre of the correct data. (a) Correct samples and error samples overlap when projected onto the two PCs with the highest eigenvalues. (b) 8 PCs were selected to calculate the Euclidean distance of each sample to the centre of the correct data in the 8-feature space. The distances of the error samples are in the same range as those of the correct samples. (c) The 8 PCs were expanded through a polynomial kernel map of degree 5. The Euclidean distances calculated in the high-dimensional feature space show that the correct data (less than 22) are separable from the errors (mostly above 40).
  • ...and 4 more figures
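
The distance-based separation described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation. It assumes that an explicit map to unweighted monomials up to degree 5 is an acceptable stand-in for the polynomial kernel feature space, and that the largest distance among correct samples serves as the decision threshold. The function names (`polynomial_features`, `fit_corrector`, `distance_to_centre`, `flag_errors`) and the threshold rule are hypothetical.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_features(X, degree):
    """Map each row of X to all monomials of its features up to `degree`.

    Assumption: plain (unweighted) monomials stand in for the feature space
    of a polynomial kernel; the exact map used in the paper may differ.
    """
    n_samples, n_features = X.shape
    columns = [np.ones(n_samples)]  # degree-0 (constant) term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)


def fit_corrector(X_correct, degree=5):
    """Store the centre of the correctly classified samples and a threshold."""
    phi = polynomial_features(X_correct, degree)
    centre = phi.mean(axis=0)
    distances = np.linalg.norm(phi - centre, axis=1)
    # Hypothetical threshold rule: largest distance seen among correct samples.
    return {"degree": degree, "centre": centre, "threshold": distances.max()}


def distance_to_centre(corrector, X):
    """Euclidean distance of each sample to the centre, in the expanded space."""
    phi = polynomial_features(X, corrector["degree"])
    return np.linalg.norm(phi - corrector["centre"], axis=1)


def flag_errors(corrector, X):
    """Return True for samples lying further from the centre than the threshold."""
    return distance_to_centre(corrector, X) > corrector["threshold"]


# Synthetic stand-in data: 8 features per sample, mirroring the 8 PCs in the caption.
rng = np.random.default_rng(0)
X_correct = rng.uniform(0.0, 1.0, size=(200, 8))  # plays the role of correct samples
X_suspect = rng.uniform(2.0, 3.0, size=(20, 8))   # plays the role of base-model errors

corrector = fit_corrector(X_correct, degree=5)
print("expanded dimension:", corrector["centre"].shape[0])
print("suspect samples flagged:", int(flag_errors(corrector, X_suspect).sum()))
```

On real sensor data the two groups would not be separated this cleanly in the input space. The synthetic offset is chosen only so that the example behaves predictably.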