Resource-Efficient Gesture Recognition using Low-Resolution Thermal Camera via Spiking Neural Networks and Sparse Segmentation

Ali Safa, Wout Mommen, Lars Keuninckx

TL;DR

The paper addresses robust gesture recognition from a low-resolution ($24 \times 32$) thermal camera in automotive settings, aiming for low power and low cost. It introduces a two-stage approach: a memory-efficient MMV-based SNN wake-up detector, followed by R-PCA-based sparse segmentation with hand-crafted trajectory features for classification. The method achieves up to $93.9\%$ accuracy on a 5-class task while using more than an order of magnitude less memory and compute than typical deep neural networks. The segmentation stage decomposes the data matrix $M$ into a low-rank component $L$ and a sparse component $S$ by solving $\hat{L}, \hat{S} = \arg\min_{L,S} \|L\|_* + \lambda \|S\|_1$ subject to $M = L + S$. The authors also release a new low-resolution thermal gesture dataset to support future research in in-car human–machine interaction.
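The decomposition $M = L + S$ mentioned above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary does not say which solver they use, so the code below swaps in the widely used inexact augmented Lagrangian scheme for principal component pursuit. The function names (`rpca`, `soft_threshold`, `svd_threshold`), the default $\lambda = 1/\sqrt{\max(m, n)}$, the choice to stack each flattened $24 \times 32$ frame as one column of $M$, and the synthetic "warm blob" data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular-value shrinkage: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def rpca(M, lam=None, rho=1.5, tol=1e-7, max_iter=200):
    """Approximately solve min ||L||_* + lam * ||S||_1  s.t.  M = L + S.

    Inexact augmented Lagrangian sketch (an assumed solver, not taken
    from the paper). M must be a non-zero 2-D array.
    """
    M = np.asarray(M, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))  # common default, assumed here
    spec = np.linalg.norm(M, 2)            # largest singular value
    fro = np.linalg.norm(M)                # Frobenius norm
    Y = M / max(spec, np.abs(M).max() / lam)  # dual variable
    mu = 1.25 / spec                          # penalty parameter
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)   # low-rank update
        S = soft_threshold(M - L + Y / mu, lam / mu)  # sparse update
        R = M - L - S                                 # constraint residual
        Y = Y + mu * R
        mu *= rho
        if np.linalg.norm(R) <= tol * fro:
            break
    return L, S

# Synthetic example (hypothetical data): a static background plus a
# 3x3 warm blob that moves one pixel to the right in every frame.
rng = np.random.default_rng(0)
H, W, T = 24, 32, 20
background = 25.0 + rng.normal(0.0, 0.5, size=H * W)
M = np.tile(background[:, None], (1, T))   # one flattened frame per column
mask = np.zeros((H * W, T), dtype=bool)
for t in range(T):
    frame = np.zeros((H, W), dtype=bool)
    frame[10:13, 5 + t:8 + t] = True
    mask[:, t] = frame.ravel()
M[mask] += 8.0

L, S = rpca(M)
```

In this toy setup the static background is rank one, so it should end up in `L`, while the moving blob should dominate `S`; reshaping a column of `S` back to `(24, 32)` gives a per-frame segmentation map from which trajectory features could then be computed.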

Abstract

This work proposes a novel approach for hand gesture recognition using an inexpensive, low-resolution (24 x 32) thermal sensor processed by a Spiking Neural Network (SNN) followed by Sparse Segmentation and feature-based gesture classification via Robust Principal Component Analysis (R-PCA). Compared to standard RGB cameras, the proposed system is insensitive to lighting variations, while being significantly less expensive than the high-frequency radars, time-of-flight cameras and high-resolution thermal sensors used in prior work. Crucially, this paper shows that the innovative use of the recently proposed Monostable Multivibrator (MMV) neural networks as a new class of SNN achieves more than one order of magnitude smaller memory and compute complexity compared to deep learning approaches, while reaching a top gesture recognition accuracy of 93.9% on a 5-class thermal camera dataset acquired in a car cabin. Our dataset is released to support future research.

Paper Structure (11 sections, 8 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Thermal gesture sensing setup. A $24 \times 32$ pixel MLX90640 thermal camera is mounted in a car cabin and used to acquire data for the study of gesture recognition.
  • Figure 2: Monostable Multivibrator Neuron. a) MMV neuron behavior. b) Single MMV neuron connected to an input spiking vector $\bar{\sigma}$ via the OR-ing binary weights.
  • Figure 3: MMV - R-PCA system overview. a) Pre-processing (scaling through spike-conversion equations). b) Gesture presence detection. c) Gesture classification.
  • Figure 4: Examples of gestures acquired in our dataset.
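
The Figure 2 caption describes an MMV neuron driven by a binary spike vector through OR-ing binary weights. The toy simulation below is only a sketch of that idea under stated assumptions: the neuron is modelled as a non-retriggerable monostable element that, when idle, fires a fixed-length output pulse as soon as any input selected by a weight of 1 carries a spike. The function name `mmv_neuron`, the `pulse_len` parameter and the non-retriggerable behavior are hypothetical choices; the neuron in the paper may differ (for example in edge triggering or inhibitory inputs).

```python
import numpy as np

def mmv_neuron(spikes, weights, pulse_len):
    """Toy monostable-multivibrator neuron (assumed behavior, see lead-in).

    spikes    : (T, N) binary array, one row of input spikes per time step
    weights   : (N,) binary array selecting which inputs are OR-ed together
    pulse_len : number of time steps the output stays high once triggered
    Returns a (T,) array of 0/1 output values.
    """
    spikes = np.asarray(spikes, dtype=bool)
    weights = np.asarray(weights, dtype=bool)
    out = np.zeros(spikes.shape[0], dtype=np.uint8)
    remaining = 0  # time steps left in the current output pulse
    for t in range(spikes.shape[0]):
        # OR over the inputs that are selected by the binary weights
        trigger = bool(np.any(spikes[t] & weights))
        if remaining == 0 and trigger:
            remaining = pulse_len  # start a new pulse only when idle
        if remaining > 0:
            out[t] = 1
            remaining -= 1
    return out

# Hypothetical example: input 1 is masked out by its zero weight, input 0
# triggers a pulse at t = 2, and the spike on input 2 at t = 3 arrives
# while the pulse is still active, so it is ignored.
spikes = np.zeros((8, 3), dtype=np.uint8)
spikes[0, 1] = 1
spikes[2, 0] = 1
spikes[3, 2] = 1
output = mmv_neuron(spikes, weights=[1, 0, 1], pulse_len=3)
```

Because the weights and spikes are binary, the trigger needs only AND and OR operations rather than multiply-accumulates, which is consistent with the low memory and compute cost the abstract attributes to MMV networks.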