InsideOut: An EfficientNetV2-S Based Deep Learning Framework for Robust Multi-Class Facial Emotion Recognition

Ahsan Farabi; Israt Khandaker; Ibrahim Khalil Shanto; Md Abdul Ahad Minhaz; Tanisha Zaman

TL;DR

Facial emotion recognition (FER) remains challenging due to occlusion, illumination and pose variation, subtle intra-class differences, and dataset imbalance. The authors present InsideOut, a lightweight FER framework based on EfficientNetV2-S that combines transfer learning, data augmentation, and an imbalance-aware training recipe to achieve robust multi-class recognition with reproducible evaluation. On FER2013, InsideOut attains 62.8% accuracy and a macro F1 of 0.590. It performs strongly on majority classes and improves notably on minority classes such as Disgust and Surprise, while Fear and Sadness remain difficult to recognize. The work shows that efficient architectures coupled with targeted imbalance handling can provide practical, transparent FER solutions that are suitable for real-time deployment and that serve as a reproducible baseline for future research.
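Macro F1, reported alongside accuracy, averages the per-class F1 scores with equal weight, so a minority class such as Disgust counts as much as a majority class such as Happy. The minimal pure-Python sketch below (function names and toy labels are illustrative assumptions, not taken from the authors' code) shows why accuracy alone can look flattering on an imbalanced label set: a classifier that always predicts the majority class still scores high accuracy but a low macro F1.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall; define it as 0 when both are 0.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: 8 "happy" samples, 2 "disgust" samples,
# and a degenerate classifier that always predicts "happy".
y_true = ["happy"] * 8 + ["disgust"] * 2
y_pred = ["happy"] * 10
print(f"accuracy={accuracy(y_true, y_pred):.3f}  "
      f"macro F1={macro_f1(y_true, y_pred, ['happy', 'disgust']):.3f}")
# → accuracy=0.800  macro F1=0.444
```

The missed minority class contributes an F1 of 0 to the average, which is exactly the failure mode that a macro-averaged metric is meant to expose.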

Abstract

Facial Emotion Recognition (FER) is a key task in affective computing, enabling applications in human-computer interaction, e-learning, healthcare, and safety systems. Despite advances in deep learning, FER remains challenging due to occlusions, illumination and pose variations, subtle intra-class differences, and dataset imbalance that hinders recognition of minority emotions. We present InsideOut, a reproducible FER framework built on EfficientNetV2-S with transfer learning, strong data augmentation, and imbalance-aware optimization. The approach standardizes FER2013 images, applies stratified splitting and augmentation, and fine-tunes a lightweight classification head with a class-weighted loss to address skewed class distributions. InsideOut achieves 62.8% accuracy and a macro-averaged F1 of 0.590 on FER2013, which is competitive with conventional CNN baselines. The novelty lies in demonstrating that efficient architectures, combined with tailored imbalance handling, can provide practical, transparent, and reproducible FER solutions.
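The abstract mentions a class-weighted loss but does not give the weighting formula. The sketch below assumes the common inverse-frequency ("balanced") scheme, w_c = N / (K · n_c), where N is the number of training samples, K the number of classes, and n_c the count of class c, so that every class contributes the same total weight N / K. The helper names are hypothetical; the loss is normalized by the sum of the target weights, which is how PyTorch's `torch.nn.CrossEntropyLoss` behaves with a `weight` tensor and the default `reduction="mean"`.

```python
import math
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights w_c = N / (K * n_c). Assumed scheme, not confirmed by the paper."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0 for c in range(num_classes)]


def log_softmax(logits):
    """Numerically stable log-softmax using the max-subtraction trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Weighted negative log-likelihood, averaged by the sum of the targets' class weights."""
    total_loss = 0.0
    total_weight = 0.0
    for logits, y in zip(batch_logits, targets):
        w = weights[y]
        total_loss += -w * log_softmax(logits)[y]
        total_weight += w
    return total_loss / total_weight


# Hypothetical skewed label set with 3 classes (FER2013 itself has 7).
labels = [0] * 6 + [1] * 3 + [2]
weights = balanced_class_weights(labels, num_classes=3)
print([round(w, 3) for w in weights])  # → [0.556, 1.111, 3.333]

batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 1.5]]
print(weighted_cross_entropy(batch_logits, [0, 2], weights))
```

Rarer classes receive proportionally larger weights, so misclassifying a minority-class sample costs more than misclassifying a majority-class one. In a PyTorch training loop, the same weight vector would typically be passed as the `weight` argument of `CrossEntropyLoss`.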
