Assessing the Alignment of Popular CNNs to the Brain for Valence Appraisal
Laurent Mertens, Elahe' Yargholi, Laura Van Hove, Hans Op de Beeck, Jan Van den Stock, Joost Vennekens
TL;DR
This work evaluates whether popular CNNs trained for image valence appraisal align with human behavior and brain activity, extending analyses from basic perception to social-cognitive valence processing. By combining fMRI/behavioral data with a broad set of architectures, it shows CNNs largely reflect low-level scene processing and struggle to capture higher-order valence judgments, especially for incongruent scenes. The authors introduce EmoCAM++ and Object2Brain to quantify how individual object classes influence brain-aligned predictions at the filter level, revealing architecture-specific sensitivities and a persistent bias toward scene elements. These findings highlight the need for more expressive models to faithfully mirror human valence appraisal and provide a framework for probing brain-imaging alignment at a fine-grained, object-class level.
Abstract
Convolutional Neural Networks (CNNs) are a popular class of computer models that have proven their worth in many computer vision tasks. They are also an interesting object of study for psychology, as correspondences have been shown between the workings of CNNs and those of the human brain. However, these correspondences have so far mostly been studied in the context of general visual perception. In contrast, this paper explores to what extent they also hold for a more complex brain process, namely social cognition. To this end, we assess, through a correlation analysis, the alignment between popular CNN architectures and both human behavioral and fMRI data for image valence appraisal. We show that for this task CNNs struggle to go beyond simple visual processing and do not seem to reflect higher-order brain processing. Furthermore, we present Object2Brain, a novel framework that combines GradCAM and object detection at the CNN-filter level with the aforementioned correlation analysis to study the influence of different object classes on the CNN-to-human correlations. Despite similar correlation trends, different CNN architectures are shown to display different object class sensitivities.
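The abstract does not specify how the correlation analysis is carried out. As a rough illustration only, the sketch below shows one simple way such an alignment score could be computed: a Pearson correlation between a per-image score derived from a CNN and a per-image human measurement (a behavioral valence rating or an fMRI-derived value). The function names `pearson_r` and `layer_alignment`, the layer names, and all numbers are hypothetical and are not taken from the paper; the choice of Pearson correlation is likewise an assumption.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D sequences of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D and have the same length")
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0.0:
        # Correlation is undefined when either input is constant.
        return float("nan")
    return float((xc * yc).sum() / denom)


def layer_alignment(layer_scores, human_scores):
    """Correlate each layer's per-image score with the human scores.

    layer_scores: dict mapping a (hypothetical) layer name to one score per image.
    human_scores: one human measurement per image, in the same image order.
    """
    return {name: pearson_r(scores, human_scores)
            for name, scores in layer_scores.items()}


if __name__ == "__main__":
    # Toy, made-up numbers for five images; not data from the paper.
    human_valence = [0.1, 0.4, 0.5, 0.8, 0.9]
    cnn_scores = {
        "early_layer": [0.3, 0.2, 0.6, 0.5, 0.7],
        "late_layer": [0.2, 0.3, 0.6, 0.7, 0.9],
    }
    for name, r in layer_alignment(cnn_scores, human_valence).items():
        print(f"{name}: r = {r:.3f}")
```

In a sketch like this, comparing the resulting coefficients across layers and architectures is what would indicate where, if anywhere, a model tracks the human data; a rank-based coefficient could be substituted if the relationship is not expected to be linear.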
