Rank-O-ToM: Unlocking Emotional Nuance Ranking to Enhance Affective Theory-of-Mind
JiHyun Kim, JuneHyoung Kwon, MiHyeon Kim, Eunju Lee, YoungBin Kim
TL;DR
Rank-O-ToM addresses the challenge of calibrating AI models to interpret nuanced emotional states for affective Theory of Mind (ToM). It builds synthetic samples by blending images with an adapted horizontal CutMix and trains with a ranking-based loss that enforces higher confidence on clearer emotions and lower confidence on blended ones, formalized as $\mathcal{L}_{rank} = \max(0, \max_{c_1} p^{\mathrm{syn}}_{c_1} - \max_{c_1} p^{\mathrm{fer}}_{c_1} + \delta) + \max(0, \max_{c_2} p^{\mathrm{syn}}_{c_2} - \max_{c_2} p^{\mathrm{fr}}_{c_2} + \delta)$, where $\delta$ is a margin. The model also pseudo-labels unlabeled FR data using adaptive class-wise thresholds to broaden training diversity. Empirical results on RAF-DB, FERPlus, and AffectNet show improved accuracy and confidence calibration, and qualitative CAM analyses indicate attention over more of the face and better alignment with compound emotions. Overall, Rank-O-ToM enhances affective ToM by capturing emotional intensity and complexity, enabling more nuanced and trustworthy emotion-aware AI interactions.
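To make the ranking loss concrete, the following is a minimal per-sample sketch in plain Python, not the authors' implementation. It assumes that $c_1$ and $c_2$ are ordinary class indices, so each inner $\max$ is simply the top softmax probability of the corresponding prediction. The function names `rank_term` and `rank_loss`, the margin value `delta=0.1`, and the probability vectors are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def rank_term(p_syn: Sequence[float], p_real: Sequence[float], delta: float) -> float:
    """One hinge term: max(0, max_c p_syn[c] - max_c p_real[c] + delta).

    The term is zero once the real sample's top confidence exceeds the
    synthetic sample's top confidence by at least the margin delta.
    """
    return max(0.0, max(p_syn) - max(p_real) + delta)


def rank_loss(
    p_syn: Sequence[float],
    p_fer: Sequence[float],
    p_fr: Sequence[float],
    delta: float = 0.1,  # hypothetical margin, not taken from the paper
) -> float:
    """Sum of the two hinge terms in L_rank (synthetic vs. FER, synthetic vs. FR)."""
    return rank_term(p_syn, p_fer, delta) + rank_term(p_syn, p_fr, delta)


# Made-up softmax outputs over three classes, for illustration only.
p_syn = [0.5, 0.3, 0.2]    # blended (synthetic) sample: should be least confident
p_fer = [0.9, 0.05, 0.05]  # clear FER sample: confident, so its term vanishes
p_fr = [0.55, 0.25, 0.2]   # FR sample: only slightly more confident than p_syn

loss = rank_loss(p_syn, p_fer, p_fr, delta=0.1)
print(f"rank loss: {loss:.4f}")
```

In this example the FER term is zero because the gap between 0.9 and 0.5 already exceeds the margin, while the FR term contributes roughly 0.05 because the gap between 0.55 and 0.5 is smaller than the margin. A training implementation would average this quantity over a batch and add it to the classification loss; how the terms are weighted is not specified here.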
Abstract
Facial Expression Recognition (FER) plays a foundational role in enabling AI systems to interpret emotional nuances, a critical aspect of affective Theory of Mind (ToM). However, existing models often struggle with poor calibration and a limited capacity to capture emotional intensity and complexity. To address this, we propose Ranking the Emotional Nuance for Theory of Mind (Rank-O-ToM), a framework that leverages ordinal ranking to align confidence levels with the emotional spectrum. By incorporating synthetic samples reflecting diverse affective complexities, Rank-O-ToM enhances the nuanced understanding of emotions, advancing AI's ability to reason about affective states.
