NeuroGaze-Distill: Brain-informed Distillation and Depression-Inspired Geometric Priors for Robust Facial Emotion Recognition

Zilin Li, Weiwei Xu, Xuanqi Zhao, Yiran Zhu

TL;DR

NeuroGaze-Distill tackles cross-dataset facial emotion recognition by translating brain-informed priors into a vision-only FER model. A frozen EEG-derived Valence/Arousal prototype bank, formed from a teacher trained on EEG topomaps, guides a standard ResNet student via Proto-KD, while a light depression-inspired D-Geo prior regularizes the embedding geometry to improve robustness. The method combines CE with label smoothing, logit KD, and two regularizers, delivering consistent gains in Macro-F1 and balanced accuracy on FERPlus and in cross-dataset settings such as AffectNet-mini and CK+. Results show early-learning benefits from KD and stable late-epoch improvements from Proto-KD and D-Geo, with a 5×5 prototype grid yielding better stability than denser grids. The framework remains deployable (vision-only at inference) and emphasizes reproducibility through frozen prototypes and artifact-based verification, while cautioning against clinical interpretations and highlighting avenues for adaptive prototypes and broader fairness analyses.

Abstract

Facial emotion recognition (FER) models trained only on pixels often fail to generalize across datasets because facial appearance is an indirect and biased proxy for underlying affect. We present NeuroGaze-Distill, a cross-modal distillation framework that transfers brain-informed priors into an image-only FER student via static Valence/Arousal (V/A) prototypes and a depression-inspired geometric prior (D-Geo). A teacher trained on EEG topographic maps from DREAMER (with MAHNOB-HCI as unlabeled support) produces a consolidated 5x5 V/A prototype grid that is frozen and reused; no EEG-face pairing and no non-visual signals at deployment are required. The student (ResNet-18/50) is trained on FERPlus with conventional CE/KD and two lightweight regularizers: (i) Proto-KD (cosine) aligns student features to the static prototypes; (ii) D-Geo softly shapes the embedding geometry in line with affective findings often reported in depression research (e.g., anhedonia-like contraction in high-valence regions). We evaluate both within-domain (FERPlus validation) and cross-dataset protocols (AffectNet-mini; optional CK+), reporting standard 8-way scores alongside present-only Macro-F1 and balanced accuracy to fairly handle label-set mismatch. Ablations attribute consistent gains to prototypes and D-Geo, and favor 5x5 over denser grids for stability. The method is simple, deployable, and improves robustness without architectural complexity.
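
As a rough illustration of the Proto-KD (cosine) term described in the abstract, the following Python sketch computes the mean of one minus the cosine similarity between each student feature and a frozen prototype. The function names, the toy two-dimensional bank, and the per-sample `assignments` index are assumptions made for illustration only: the abstract does not say how samples are matched to prototypes, and the parameter τ≈0.90 listed in the Figure 1 caption is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small epsilon avoids division by zero


def proto_kd_cosine_loss(features, prototypes, assignments):
    """Mean (1 - cosine) between each feature and its assigned prototype.

    features:    list of student embedding vectors
    prototypes:  frozen bank (a 5x5 V/A grid would hold 25 vectors)
    assignments: hypothetical per-sample prototype index, assumed given
    """
    losses = [1.0 - cosine(f, prototypes[k]) for f, k in zip(features, assignments)]
    return sum(losses) / len(losses)


# Toy example: a 2-prototype bank and two student features.
bank = [[1.0, 0.0], [0.0, 1.0]]
feats = [[2.0, 0.0], [1.0, 1.0]]
loss = proto_kd_cosine_loss(feats, bank, [0, 1])
```

In this toy case the first feature is perfectly aligned with its prototype and contributes almost nothing, while the second sits at 45 degrees to its prototype and carries the whole penalty. Because the bank is frozen, only the student features would receive gradients in a real training loop.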

Paper Structure

This paper contains 58 sections, 2 equations, 7 figures, 3 tables, 2 algorithms.

Figures (7)

  • Figure 1: NeuroGaze-Distill overview. (A) FER suffers from distribution shift; robust depression-related affect detection is challenging with pixels alone. (B) EEG preprocessing: spectral features are rendered as topographic images. (C) Teacher data: EEG topomaps from DREAMER and MAHNOB-HCI. (D) V/A circumplex with 5×5 binning; no paired EEG-face samples required. (E) Distillation: ResNet-18/50 student with CE (LS=0.055, CW), KD on logits, Proto-KD (cosine, τ≈0.90) toward static prototypes, and a light D-Geo prior (depression-inspired). (F) Evaluation: within-domain on FERPlus and cross-dataset on AffectNet-mini (optional CK+) with present-only metrics.
  • Figure 2: Prototype coverage (5×5, v4). Left: counts per V/A bin; Right: grid centers with marker size ∝ counts. Panels are height-matched and width-balanced (no cropping).
  • Figure 3: EEG topomap grids used by the teacher (see the data section). Left: DREAMER band-power topomaps [katsigiannis2018dreamer]. Right: MAHNOB-HCI in a cool-blue palette [soleymani2012mahnob]. Maps are per-band normalized for visualization and rendered with 10-20 style interpolation; they are non-identifiable. These grids train the EEG teacher that regresses V/A, after which validation embeddings are aggregated into a fixed 5×5 V/A prototype bank (see the data section, Processing and binning). No gaze or other non-visual signals are used by the student.
  • Figure 4: Ablation timelines on FERPlus validation. Macro-F1 (left) and Accuracy (right) vs. epochs for B0→B3. KD (B1) speeds up early convergence, while Proto-KD (B2) and the late-activated D-Geo (B3) provide consistent late-epoch gains; B3 attains the best final Macro-F1/Acc.
  • Figure 5: Present-only confusion matrix on AffectNet-mini. Evaluated with the full method (B3: CE+KD+Proto-KD+D-Geo).
  • ...and 2 more figures
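
The present-only metrics named in the Figure 1 and Figure 5 captions can be sketched as follows. This section does not define "present-only" formally, so the Python sketch below assumes it means averaging per-class F1 and per-class recall over only those classes that actually occur in the target dataset's ground-truth labels. The function name `present_only_scores` and the toy labels are hypothetical.

```python
def present_only_scores(y_true, y_pred):
    """Macro-F1 and balanced accuracy over classes that occur in y_true.

    Assumption: classes absent from the target labels are skipped in the
    average, although predictions of them still count as errors (false
    negatives) for the true class.
    """
    present = sorted(set(y_true))
    f1s, recalls = [], []
    for c in present:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        recalls.append(tp / (tp + fn))  # tp + fn > 0 because c is present
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s), sum(recalls) / len(recalls)


# Toy example: the model knows class 2, but the target set contains only 0 and 1.
y_true = [0, 0, 1, 1]
y_pred = [0, 2, 1, 1]
macro_f1, balanced_acc = present_only_scores(y_true, y_pred)
```

In the toy example, a conventional macro average over all three classes would include a zero F1 for the class that never appears in the target labels. The present-only average leaves that class out, which is one way to handle the label-set mismatch the abstract mentions.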