evTransFER: A Transfer Learning Framework for Event-based Facial Expression Recognition
Rodrigo Verschae, Ignacio Bugueno-Cordova
TL;DR
This work tackles event-based facial expression recognition (FER), addressing data sparsity and temporal dynamics with a transfer-learning approach. It introduces three components: Temporal Information of Events (TIE), a new event-based representation; a feature encoder pretrained on facial frame reconstruction and then transferred to the recognition network; and an LSTM-based temporal module that captures longer-term expression dynamics. On the synthetic e-CK+ and the real NEFER datasets, evTransFER outperforms state-of-the-art event-based FER methods, reaching up to 93.6% top-1 accuracy on e-CK+ and up to 76.7% on NEFER, with notable gains from reconstruction-based pretraining and fine-tuning. The approach enables near real-time inference and suggests that reconstruction-informed encoders may carry over to other datasets and to object recognition tasks in neuromorphic vision.
Abstract
Event-based cameras are bio-inspired sensors that asynchronously capture pixel intensity changes with microsecond latency, high temporal resolution, and high dynamic range, providing information on the spatiotemporal dynamics of a scene. We propose evTransFER, a transfer learning-based framework for facial expression recognition using event-based cameras. The main contribution is a feature extractor designed to encode facial spatiotemporal dynamics, built by training an adversarial generative method on facial reconstruction and transferring the encoder weights to the facial expression recognition system. We demonstrate that the proposed transfer learning method improves facial expression recognition compared to training a network from scratch. The proposed architecture incorporates an LSTM to capture longer-term facial expression dynamics and uses a new event-based representation called TIE. We evaluated the framework on both the synthetic event-based facial expression database e-CK+ and the real neuromorphic dataset NEFER. On e-CK+, evTransFER achieved a recognition rate of 93.6%, surpassing state-of-the-art methods. On NEFER, which comprises event sequences with real sensor noise and sparse activity, the proposed transfer learning strategy achieved an accuracy of up to 76.7%. On both datasets, evTransFER outperformed existing methods as well as models trained from scratch.
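To make the idea of an event-based representation concrete, the sketch below turns a list of asynchronous events into a dense frame that a conventional encoder could consume. It is not the TIE representation, which the abstract does not define; it is a generic signed time surface, a common way of keeping timing information in a 2D grid. The event layout `(x, y, t, polarity)` and the function name `time_surface` are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp in seconds, polarity of +1 or -1).
Event = Tuple[int, int, float, int]


def time_surface(events: List[Event], width: int, height: int) -> List[List[float]]:
    """Build a height x width grid of signed recency values in [-1, 1].

    Each pixel keeps only its most recent event. The magnitude is the event's
    normalized timestamp within the window (0 = start, 1 = end) and the sign
    is the event polarity. Pixels without events stay at 0.0; note that an
    event exactly at the window start also maps to 0.0.
    """
    frame = [[0.0] * width for _ in range(height)]
    if not events:
        return frame

    t_start = min(e[2] for e in events)
    t_end = max(e[2] for e in events)
    span = (t_end - t_start) or 1.0  # avoid division by zero for a single timestamp

    # Process in time order so later events overwrite earlier ones per pixel.
    for x, y, t, polarity in sorted(events, key=lambda e: e[2]):
        recency = (t - t_start) / span
        frame[y][x] = recency if polarity > 0 else -recency
    return frame


# Usage: three events on a 2 x 1 sensor; pixel (0, 0) fires twice.
events = [(0, 0, 0.00, 1), (1, 0, 0.01, -1), (0, 0, 0.02, 1)]
frame = time_surface(events, width=2, height=1)
```

In a pipeline like the one described above, frames of this kind would be computed over successive time windows and passed one by one through the encoder, with the LSTM consuming the resulting feature sequence.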
