Interpretable Image Emotion Recognition: A Domain Adaptation Approach Using Facial Expressions

Puneet Kumar; Balasubramanian Raman

TL;DR

The paper addresses Image Emotion Recognition (IER) by proposing a feature-based domain adaptation framework that transfers a Facial Expression Recognition (FER) model to generic images using a discrepancy loss while preserving the same network architecture. It introduces DnCShap, a Divide and Conquer SHAP-based interpretability method for pixel-level and layer-wise explanations, increasing transparency of emotion predictions. Evaluation on four diverse datasets (IAPSa, ArtPhoto, FI, EMOTIC) shows competitive accuracies (61.86%, 62.47%, 70.78%, 59.72%) and strong interpretability analyses, with ablations validating the contributions of both the domain adaptation and the interpretability components. The work delivers a practical, interpretable IER pipeline that generalizes across facial and non-facial content and lays groundwork for future multimodal extensions and broader emotion taxonomies.

Abstract

This paper proposes a feature-based domain adaptation technique for identifying emotions in generic images, encompassing both facial and non-facial objects, as well as non-human components. This approach addresses the challenge of the limited availability of pre-trained models and well-annotated datasets for Image Emotion Recognition (IER). Initially, a deep-learning-based Facial Expression Recognition (FER) system is developed, classifying facial images into discrete emotion classes. Maintaining the same network architecture, this FER system is then adapted to recognize emotions in generic images through the application of discrepancy loss, enabling the model to effectively learn IER features while classifying emotions into categories such as 'happy,' 'sad,' 'hate,' and 'anger.' Additionally, a novel interpretability method, Divide and Conquer based Shap (DnCShap), is introduced to elucidate the visual features most relevant for emotion recognition. The proposed IER system demonstrated emotion classification accuracies of 61.86% for the IAPSa dataset, 62.47% for the ArtPhoto dataset, 70.78% for the FI dataset, and 59.72% for the EMOTIC dataset. The system effectively identifies the important visual features that lead to specific emotion classifications and also provides detailed embedding plots explaining the predictions, enhancing the understanding and trust in AI-driven emotion recognition systems.
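
To make the idea of a discrepancy loss concrete, the sketch below shows one common form: the squared distance between the mean feature vectors of a source (FER) batch and a target (IER) batch, added to a classification loss. The abstract does not specify which discrepancy measure the paper uses, so this mean-feature form, the function names, and the `weight` parameter are all illustrative assumptions rather than the authors' formulation.

```python
# Illustrative sketch only: a mean-feature discrepancy between two feature batches.
# The paper's actual discrepancy loss is not specified in the abstract; this form,
# the function names, and the `weight` parameter are assumptions.

def feature_mean(batch):
    """Per-dimension mean of a batch given as a list of equal-length feature vectors."""
    n = len(batch)
    return [sum(column) / n for column in zip(*batch)]


def mean_discrepancy(source_feats, target_feats):
    """Squared Euclidean distance between the source and target batch means."""
    mu_source = feature_mean(source_feats)
    mu_target = feature_mean(target_feats)
    return sum((s - t) ** 2 for s, t in zip(mu_source, mu_target))


def adaptation_loss(classification_loss, source_feats, target_feats, weight=0.5):
    """Classification loss plus a weighted discrepancy term (hypothetical weighting)."""
    return classification_loss + weight * mean_discrepancy(source_feats, target_feats)


# Toy features standing in for FER (source) and IER (target) network activations.
fer_feats = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
ier_feats = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(mean_discrepancy(fer_feats, ier_feats))
print(adaptation_loss(1.0, fer_feats, ier_feats))
```

Minimizing such a term pulls the feature distributions of the two domains toward each other, which is the general intuition behind adapting a face-trained network to generic images while keeping its architecture unchanged.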
