EVOKE: Emotion Enabled Virtual Avatar Mapping Using Optimized Knowledge Distillation
Maryam Nadeem, Raza Imam, Rouqaiah Al-Refai, Meriem Chkir, Mohamad Hoda, Abdulmotaleb El Saddik
TL;DR
EVOKE tackles EEG-based emotion recognition for emotionally expressive virtual avatars in resource-limited settings. It introduces a knowledge-distillation pipeline in which a heavy CCNN teacher guides a compact two-convolution student to perform multi-label classification of valence, arousal, and dominance, achieving an $18$-fold parameter reduction while attaining about $87\%$ accuracy on the DEAP dataset. The approach uses differential-entropy features arranged into $4\times9\times9$ grids and maps the resulting outputs to eight discrete emotions for avatar control, with the distillation temperature $T$ and loss weight $\alpha$ tuned for best performance (e.g., $T=1.25$, $\alpha=0.25$). It reports inference times as low as $0.33$ ms and a throughput of $8.0176\times10^{4}$ samples/s on a single RTX A6000, enabling real-time deployment in virtual environments and potential applications in healthcare and therapy.
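To make the roles of $T$ and $\alpha$ concrete, the following minimal Python sketch computes a per-sample multi-label distillation loss over the three outputs (valence, arousal, dominance). It is an illustration under stated assumptions, not the authors' implementation: the function names, the temperature-scaled sigmoids, the $T^2$ scaling of the soft term, and the convention that $\alpha$ weights the hard-label term are all choices made here and are not confirmed by the source.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(target, pred, eps=1e-12):
    """Binary cross-entropy of `pred` against a hard or soft `target` in [0, 1]."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def distillation_loss(student_logits, teacher_logits, hard_labels, T=1.25, alpha=0.25):
    """Hypothetical multi-label KD loss for one sample.

    Assumption: alpha weights the hard-label term and (1 - alpha) the
    teacher-matching term; the source does not state which convention it uses.
    """
    n = len(hard_labels)
    # Hard term: student probabilities against the ground-truth binary labels.
    hard = sum(bce(y, sigmoid(s)) for s, y in zip(student_logits, hard_labels)) / n
    # Soft term: student against the teacher, both softened by temperature T.
    soft = sum(
        bce(sigmoid(t / T), sigmoid(s / T))
        for s, t in zip(student_logits, teacher_logits)
    ) / n
    # T**2 keeps the soft-term gradient scale comparable across temperatures.
    return alpha * hard + (1.0 - alpha) * (T ** 2) * soft


# Illustrative logits for (valence, arousal, dominance); the values are made up.
teacher = [2.0, -1.5, 0.5]
labels = [1, 0, 1]
loss_close = distillation_loss([2.0, -1.5, 0.5], teacher, labels)
loss_far = distillation_loss([-2.0, 1.5, -0.5], teacher, labels)
```

In this sketch a student whose logits agree with the teacher and the labels (`loss_close`) receives a lower loss than one that contradicts them (`loss_far`). Higher values of `T` flatten the teacher's probabilities before they are matched.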
Abstract
As virtual environments continue to advance, the demand for immersive and emotionally engaging experiences has grown. Addressing this demand, we introduce Emotion enabled Virtual avatar mapping using Optimized KnowledgE distillation (EVOKE), a lightweight framework designed to integrate emotion recognition into 3D avatars within virtual environments. Our approach leverages knowledge distillation for multi-label classification on the publicly available DEAP dataset, which covers valence, arousal, and dominance as primary emotional classes. Our distilled model, a CNN with only two convolutional layers and 18 times fewer parameters than the teacher model, achieves competitive results, reaching an accuracy of 87% while requiring far fewer computational resources. This balance between performance and deployability makes our framework well suited to virtual environment systems. Furthermore, the multi-label classification outcomes are used to map emotions onto custom-designed 3D avatars.
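One plausible way to turn the three multi-label outputs into a single avatar state is to binarize each dimension and treat the result as a 3-bit code, which yields $2^3 = 8$ combinations. The sketch below illustrates that idea only. The threshold of 0.5, the bit ordering, the function name `vad_to_emotion_index`, and the placeholder slot names are assumptions; the source does not specify how the eight emotions are assigned.

```python
def vad_to_emotion_index(valence, arousal, dominance, threshold=0.5):
    """Map three probabilities to an index in 0..7 (hypothetical encoding).

    Each dimension is binarized at `threshold`; valence is the most
    significant bit and dominance the least significant.
    """
    bits = [int(p >= threshold) for p in (valence, arousal, dominance)]
    return bits[0] * 4 + bits[1] * 2 + bits[2]


# Placeholder names only: the actual emotion labels are not given in this section.
EMOTION_SLOTS = [f"emotion_{i}" for i in range(8)]

# High valence, low arousal, high dominance -> bits 1, 0, 1.
slot = EMOTION_SLOTS[vad_to_emotion_index(0.9, 0.2, 0.7)]
```

An avatar controller could then look up an animation or facial expression by slot, keeping the classifier and the rendering logic decoupled.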
