SEED: Towards More Accurate Semantic Evaluation for Visual Brain Decoding
Juhyeon Park, Peter Yongho Kim, Jiook Cha, Shinjae Yoo, Taesup Moon
TL;DR
This work addresses semantic evaluation for visual brain decoding and shows that existing metrics align poorly with human judgments. It introduces SEED, a composite metric that combines Object F1, Cap-Sim, and a correlation-based EffNet term to better capture image semantics, and pairs it with a novel pairwise hinge loss that improves semantic alignment during model training. On NSD, SEED achieves the strongest agreement with human judgments and exposes frequent semantic near-misses in state-of-the-art reconstructions. The authors also open-source their human ratings, giving future work a basis for more faithful semantic evaluation and training of brain decoding models.
Abstract
We present SEED (Semantic Evaluation for Visual Brain Decoding), a novel metric for evaluating the semantic decoding performance of visual brain decoding models. It integrates three complementary metrics, each capturing a different aspect of semantic similarity between images. Using carefully crowd-sourced human judgment data, we demonstrate that SEED achieves the highest alignment with human evaluations, outperforming other widely used metrics. By evaluating existing visual brain decoding models, we further reveal that crucial information is often lost in translation, even in state-of-the-art models that achieve near-perfect scores on existing metrics. To facilitate further research, we open-source the human judgment data, encouraging the development of more advanced evaluation methods for brain decoding models. Additionally, we propose a novel loss function designed to enhance semantic decoding performance by leveraging the order of pairwise cosine similarities between CLIP image embeddings. This loss function is compatible with various existing methods and consistently improves their semantic decoding performance when used for training, as measured by both existing metrics and SEED.
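As a rough illustration of how a loss built on the order of pairwise cosine similarities could work, the sketch below penalizes a predicted embedding whenever its similarity ranking over the ground-truth embeddings in a batch disagrees with the ranking implied by the ground-truth embeddings themselves. This is not the authors' formulation, which the abstract does not specify: the function names, the `margin` parameter, the choice to compare predictions against ground-truth embeddings, and the use of NumPy arrays in place of real CLIP features are all assumptions made for illustration.

```python
import numpy as np


def cosine_matrix(a, b):
    """Return the matrix of cosine similarities between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def pairwise_order_hinge_loss(pred, target, margin=0.1):
    """Hypothetical pairwise hinge loss on the ordering of cosine similarities.

    pred, target: arrays of shape (n, d) holding predicted and ground-truth
    image embeddings (stand-ins for CLIP image embeddings).
    For every anchor i and every pair (j, k) where the ground truth says
    sim(i, j) > sim(i, k), the prediction is penalized unless
    sim(pred_i, target_j) exceeds sim(pred_i, target_k) by at least `margin`.
    """
    s_gt = cosine_matrix(target, target)   # (n, n) reference similarities
    s_pred = cosine_matrix(pred, target)   # (n, n) predicted similarities

    # diff[i, j, k] = s[i, j] - s[i, k], built by broadcasting
    d_gt = s_gt[:, :, None] - s_gt[:, None, :]
    d_pred = s_pred[:, :, None] - s_pred[:, None, :]

    # Only pairs with a strict ground-truth ordering contribute.
    mask = d_gt > 0
    if not mask.any():
        return 0.0

    hinge = np.maximum(0.0, margin - d_pred)
    return float(hinge[mask].mean())
```

With `margin=0.0`, the loss is zero whenever the predicted embeddings reproduce the ground-truth ordering exactly, and it grows as more pairs are ranked in the wrong order. In an actual training setup this term would presumably be written in a differentiable framework and added to a model's existing objective, but those details are outside what the abstract states.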
