
Cross-Modal Alignment between Visual Stimuli and Neural Responses in the Visual Cortex

Xing Gao, Dazhong Rong, Qinming He

TL;DR

This work addresses the challenge of mapping visual stimuli to neural responses in the visual cortex when neural responses are variable and recording techniques are limited. It shifts from direct encoding/decoding to discriminative encoding/decoding tasks and introduces Visual-Neural Alignment (VNA), which learns a shared latent space for visual and neural representations via contrastive learning. Across three invasive datasets from mice and macaques, VNA generally outperforms direct encoding and direct decoding on the discriminative tasks, indicating better robustness and generalization. The findings highlight the value of cross-modal alignment for characterizing visual-neural mappings and suggest extending the approach to temporally dynamic stimuli.
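The contrastive objective behind a shared visual-neural latent space can be illustrated with a CLIP-style symmetric InfoNCE loss. This is a minimal sketch, assuming paired visual and neural embeddings have already been produced by some encoders; this section does not give the paper's actual loss, temperature, or encoder architectures, so the function names and the temperature of 0.1 are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def similarity_matrix(visual, neural, temperature=0.1):
    """Cosine similarity of every visual embedding with every neural embedding, divided by a temperature."""
    v = [l2_normalize(x) for x in visual]
    n = [l2_normalize(x) for x in neural]
    return [[sum(a * b for a, b in zip(vi, nj)) / temperature for nj in n] for vi in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i (its paired sample)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(visual, neural, temperature=0.1):
    """Average of the visual->neural and neural->visual contrastive losses."""
    logits = similarity_matrix(visual, neural, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

# Toy example: correctly paired embeddings should give a lower loss than shuffled pairs.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neural_aligned = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
neural_shuffled = [neural_aligned[1], neural_aligned[2], neural_aligned[0]]
print(symmetric_info_nce(visual, neural_aligned) < symmetric_info_nce(visual, neural_shuffled))  # True
```

Minimizing this loss pulls each stimulus embedding toward the embedding of the response it evoked and pushes it away from the other responses in the batch, in both directions.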

Abstract

Investigating the mapping between visual stimuli and neural responses in the visual cortex deepens our understanding of biological visual processing mechanisms. Most existing studies characterize this mapping by training models either to encode visual stimuli directly into neural responses or to decode neural responses directly into visual stimuli. However, because neural responses are variable and neural recording techniques are limited, these models tend to overfit and generalize poorly. Motivated by this challenge, we shift from conventional direct encoding and decoding to discriminative encoding and decoding, which we argue are more appropriate tasks under these conditions. On top of this, we propose a cross-modal alignment approach named Visual-Neural Alignment (VNA). To thoroughly compare the three methods (direct encoding, direct decoding, and our proposed VNA) on discriminative encoding and decoding tasks, we conduct extensive experiments on three invasive visual cortex datasets covering two mammalian species (mice and macaques). The results show that VNA generally outperforms direct encoding and direct decoding, indicating that, of the three methods, VNA characterizes the visual-neural mapping most precisely.
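Discriminative decoding can be read as an identification task: given a neural response, pick which of several candidate stimuli evoked it, while discriminative encoding runs the other way (given a stimulus, pick the matching response). The sketch below assumes both modalities are already embedded in a shared space, such as the aligned latent space VNA learns, and scores candidates by cosine similarity; the candidate-set protocol, the similarity measure, and all function names are hypothetical choices, not taken from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def identify(query, candidates):
    """Return the index of the candidate most similar to the query embedding."""
    scores = [cosine(query, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

def top1_accuracy(queries, candidates):
    """Fraction of queries whose best-matching candidate shares their index (their true pair)."""
    hits = sum(identify(q, candidates) == i for i, q in enumerate(queries))
    return hits / len(queries)

# Toy shared-space embeddings: row i of each list comes from the same trial.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neural = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]

print(top1_accuracy(neural, visual))  # discriminative decoding: 1.0
print(top1_accuracy(visual, neural))  # discriminative encoding: 1.0
```

With N candidates, uniform random guessing gives an expected top-1 accuracy of 1/N, which serves as the chance baseline for this kind of evaluation.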

Paper Structure

This paper contains 24 sections, 1 equation, 1 table.