NeuroGaze-Distill: Brain-informed Distillation and Depression-Inspired Geometric Priors for Robust Facial Emotion Recognition
Zilin Li, Weiwei Xu, Xuanqi Zhao, Yiran Zhu
TL;DR
NeuroGaze-Distill tackles cross-dataset facial emotion recognition by translating brain-informed priors into a vision-only FER model. A frozen EEG-derived Valence/Arousal prototype bank, formed from a teacher trained on EEG topomaps, guides a standard ResNet student via Proto-KD, while a light depression-inspired D-Geo prior regularizes the embedding geometry to improve robustness. The method combines CE with label smoothing, logit KD, and two regularizers, delivering consistent gains in Macro-F1 and balanced accuracy on FERPlus and cross-dataset settings such as AffectNet-mini and CK+. Results show early-learning benefits from KD and stable late-epoch improvements from Proto-KD and D-Geo, with a 5×5 prototype grid yielding better stability than denser grids. The framework remains deployable (vision-only at inference) and emphasizes reproducibility through frozen prototypes and artifact-based verification, while cautioning against clinical interpretations and highlighting avenues for adaptive prototypes and broader fairness analyses.
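As an illustration of how the loss terms named above might be combined, the following is a minimal sketch in plain Python. It assumes a simple weighted sum, the standard temperature-scaled KL form of logit KD, and placeholder scalars for the Proto-KD and D-Geo terms. The function names, the smoothing factor `epsilon`, the temperature, and the weights `w_kd`, `w_proto`, `w_geo` are all hypothetical and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against a label-smoothed one-hot target (epsilon is assumed)."""
    k = len(logits)
    probs = softmax(logits)
    loss = 0.0
    for i, p in enumerate(probs):
        q = epsilon / k + ((1.0 - epsilon) if i == target else 0.0)
        loss -= q * math.log(max(p, 1e-12))
    return loss

def logit_kd(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(max(s, 1e-12)))
             for t, s in zip(p_t, p_s) if t > 0.0)
    return temperature ** 2 * kl

def total_loss(student_logits, teacher_logits, target,
               proto_term=0.0, geo_term=0.0,
               w_kd=1.0, w_proto=0.1, w_geo=0.1):
    """Assumed weighted sum; proto_term and geo_term stand in for Proto-KD and D-Geo."""
    ce = smoothed_cross_entropy(student_logits, target)
    kd = logit_kd(student_logits, teacher_logits)
    return ce + w_kd * kd + w_proto * proto_term + w_geo * geo_term
```

In this sketch the two regularizers enter only as precomputed scalars, so the example shows the composition of the objective rather than how Proto-KD or D-Geo are themselves computed.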
Abstract
Facial emotion recognition (FER) models trained only on pixels often fail to generalize across datasets because facial appearance is an indirect and biased proxy for underlying affect. We present NeuroGaze-Distill, a cross-modal distillation framework that transfers brain-informed priors into an image-only FER student via static Valence/Arousal (V/A) prototypes and a depression-inspired geometric prior (D-Geo). A teacher trained on EEG topographic maps from DREAMER (with MAHNOB-HCI as unlabeled support) produces a consolidated 5×5 V/A prototype grid that is frozen and reused; no EEG-face pairing and no non-visual signals at deployment are required. The student (ResNet-18/50) is trained on FERPlus with conventional CE/KD and two lightweight regularizers: (i) Proto-KD (cosine) aligns student features to the static prototypes; (ii) D-Geo softly shapes the embedding geometry in line with affective findings often reported in depression research (e.g., anhedonia-like contraction in high-valence regions). We evaluate both within-domain (FERPlus validation) and cross-dataset protocols (AffectNet-mini; optional CK+), reporting standard 8-way scores alongside present-only Macro-F1 and balanced accuracy to fairly handle label-set mismatch. Ablations attribute consistent gains to prototypes and D-Geo, and favor 5×5 over denser grids for stability. The method is simple, deployable, and improves robustness without architectural complexity.
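To make the present-only metrics concrete, here is a small sketch under one plausible reading: per-class F1 and recall are averaged only over classes that actually occur in the target dataset's ground-truth labels, while predictions that fall into absent classes still count as errors for the true class. The function name `present_only_scores` and this exact definition are assumptions for illustration and may differ from the authors' evaluation code.

```python
def present_only_scores(y_true, y_pred):
    """Return (macro_f1, balanced_accuracy) averaged over classes present in y_true."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    present = sorted(set(y_true))  # classes absent from the ground truth are skipped
    f1s, recalls = [], []
    for c in present:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # tp + fn > 0 because class c occurs in y_true, so both ratios are defined.
        recalls.append(tp / (tp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1s) / len(f1s), sum(recalls) / len(recalls)

# Hypothetical example: an 8-way model evaluated on data containing only classes 0 and 1.
macro_f1, bal_acc = present_only_scores([0, 0, 1, 1], [0, 7, 1, 1])
```

In the example, the prediction of class 7 lowers the recall and F1 of class 0, but class 7 itself does not add a zero-valued entry to the average, which is the effect a present-only score is meant to have under label-set mismatch.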
