Long-Range Feedback Spiking Network Captures Dynamic and Static Representations of the Visual Cortex under Movie Stimuli

Liwei Huang; Zhengyu Ma; Liutao Yu; Huihui Zhou; Yonghong Tian

TL;DR

This work proposes the long-range feedback spiking network (LoRaFB-SNet), which mimics top-down connections between cortical regions and incorporates the spike-based information processing of biological neurons. Compared against mouse visual cortex responses to movie stimuli, it achieves the highest representational similarity among the models tested.

Abstract

Deep neural networks (DNNs) are widely used models for investigating biological visual representations. However, existing DNNs are mostly designed to analyze neural responses to static images, rely on feedforward structures, and lack physiological neuronal mechanisms. There is limited insight into how the visual cortex represents natural movie stimuli that contain context-rich information. To address these problems, this work proposes the long-range feedback spiking network (LoRaFB-SNet), which mimics top-down connections between cortical regions and incorporates spike information processing mechanisms inherent to biological neurons. Taking into account the temporal dependence of representations under movie stimuli, we present Time-Series Representational Similarity Analysis (TSRSA) to measure the similarity between model representations and visual cortical representations of mice. LoRaFB-SNet exhibits the highest level of representational similarity, outperforming other well-known and leading alternatives across various experimental paradigms, especially when representing long movie stimuli. We further conduct experiments to quantify how temporal structures (dynamic information) and static textures (static information) of the movie stimuli influence representational similarity. The results suggest that our model benefits from long-range feedback to encode context-dependent representations, as the brain does. Altogether, LoRaFB-SNet is highly competent in capturing both dynamic and static representations of the mouse visual cortex and contributes to the understanding of movie processing mechanisms of the visual system. Our code is available at https://github.com/Grasshlw/SNN-Neural-Similarity-Movie.
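
To make the idea of comparing model and cortical representations concrete, the following is a minimal sketch of a generic representational similarity analysis over time-ordered movie frames. It is not the paper's TSRSA definition, which this page does not spell out. The function names (`similarity_matrix`, `rsa_score`), the choice of Pearson correlation between frames, and the rank-based comparison are all assumptions made for illustration.

```python
import numpy as np

def similarity_matrix(rep):
    """rep: (T, N) array with one row per movie frame and one column per unit.

    Returns the (T, T) Pearson correlation matrix between frames.
    """
    return np.corrcoef(rep)

def _ranks(x):
    # Simple ranking that ignores ties (assumes continuous-valued input).
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def rsa_score(model_rep, neural_rep):
    """Rank correlation between the upper triangles of the two
    frame-by-frame similarity matrices (hypothetical scoring choice)."""
    iu = np.triu_indices(model_rep.shape[0], k=1)
    a = similarity_matrix(model_rep)[iu]
    b = similarity_matrix(neural_rep)[iu]
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    neural = rng.normal(size=(50, 30))   # 50 frames, 30 recorded units (synthetic)
    model = rng.normal(size=(50, 40))    # 50 frames, 40 model units (synthetic)
    print(rsa_score(model, neural))
```

Both inputs need the same number of frames but may have different numbers of units, which is why the comparison happens on the frame-by-frame matrices rather than on the raw responses.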

Paper Structure (32 sections, 2 equations, 7 figures, 6 tables)

Introduction
Related Work
The deep recurrent network models
The deep spiking network models
Methods
Long-Range Feedback Spiking Network
Architecture
Pre-Training and Representation Extraction
Representational Similarity Metric
Quantifying Effects of Dynamic and Static Information on Representational Similarity
Dynamic information
Static information
Experiments
Neural Dataset
Models for Comparisons
...and 17 more sections

Figures (7)

Figure 1: The overview of our experiments. Six visual cortical regions of the mouse and the long-range feedback spiking network receive the same original movie stimuli to generate the representation matrices. TSRSA is applied to two representation matrices to measure representational similarity. In addition, the network receives two modified versions of the movie stimuli (one with broken temporal structures and the other with varied static textures), while the visual cortex still receives the original movie. These two additional experiments are used to quantify the effects of dynamic (temporal) and static (textural) information on representational similarity. See the Methods section for details.
Figure 2: A. The schematic of six visual cortical regions in the mouse. For brevity, we show parts of the cross-regional feedforward and feedback connections reported from physiological research. B. The schematic of LoRaFB-SNet with the embedded recurrent module. See the Long-Range Feedback Spiking Network section for details.
Figure 3: A. The TSRSA scores of three models pre-trained on UCF101. B. The TSRSA scores of feedback/feedforward spiking networks pre-trained on UCF101/ImageNet. C. The TSRSA score curves of three models pre-trained on UCF101 for different movie clip lengths. We randomly select continuous movie clips of different lengths and plot TSRSA scores between models' and the visual cortex's representations corresponding to these clips. The error bar is the standard error over 10 random seeds. D. The ratios of our model's scores to those of the alternative models for different clip lengths. The ratio tends to increase for longer clips.
Figure 4: A. The TSRSA score curves of LoRaFB-SNet and SEW-ResNet trained on UCF101 with different levels of chaos (the main plot) and the drop rate curves of experimental scores compared with the original score (the subplot). The horizontal coordinates in both plots are the level of chaos. In the main plot, the dashed horizontal lines indicate the original scores between models and the mouse visual cortex under the original movie. Each large point on the curve indicates the average result of a set of experiments, and each small point indicates the result of one trial in a set. The vertical error bar is the 99% confidence interval of the score over 10 trials, while the horizontal error bar is the 99% confidence interval of the level of chaos. In the subplot, the curves show the average drop rate and the average level of chaos over 10 trials for all experimental sets. LoRaFB-SNet shows a large drop in scores while SEW-ResNet shows a small drop. B. The TSRSA score curves of LoRaFB-SNet trained on UCF101/ImageNet with different ratios of replacement. The elements in the main plot indicate similar content as in A. LoRaFB-SNet trained on UCF101 and ImageNet both exhibit a similar decreasing trend in scores. C. The TSRSA scores of feedback/feedforward spiking networks trained on UCF101/ImageNet for the neural dataset under natural scene stimuli.
Figure 5: Detailed structure of the feedforward module. CONV is convolution. BN is batch normalization. SN is spiking neurons. f denotes an element-wise operation with two spike features.
...and 2 more figures
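
The captions for Figures 1 and 4 describe two modified versions of the movie: one with broken temporal structure and one in which a given ratio of content is replaced. The sketch below shows one plausible way to build such stimuli. It is an illustration under stated assumptions, not the authors' procedure: shuffling frames inside fixed windows, replacing whole frames with frames from another source, and the names `shuffle_within_windows` and `replace_frames` are all hypothetical. The paper's "level of chaos" measure is not defined on this page and is not reproduced here.

```python
import numpy as np

def shuffle_within_windows(frames, window, rng):
    """Break temporal order by permuting frames inside consecutive windows.

    frames: array whose first axis is time, e.g. (T, H, W, C).
    window: number of frames per window (assumed knob for disruption strength).
    """
    frames = np.asarray(frames)
    out = frames.copy()
    for start in range(0, len(frames), window):
        idx = np.arange(start, min(start + window, len(frames)))
        out[idx] = frames[rng.permutation(idx)]
    return out

def replace_frames(frames, other, ratio, rng):
    """Replace a `ratio` fraction of frames with frames drawn from `other`."""
    frames = np.asarray(frames)
    other = np.asarray(other)
    out = frames.copy()
    n = int(round(ratio * len(frames)))
    positions = rng.choice(len(frames), size=n, replace=False)
    sources = rng.choice(len(other), size=n, replace=True)
    out[positions] = other[sources]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    movie = rng.random(size=(12, 4, 4, 3))    # synthetic 12-frame movie
    images = rng.random(size=(5, 4, 4, 3))    # synthetic replacement images
    print(shuffle_within_windows(movie, window=4, rng=rng).shape)
    print(replace_frames(movie, images, ratio=0.25, rng=rng).shape)
```

In both cases only the model would receive the modified movie, while the neural data stay tied to the original stimulus, as the Figure 1 caption states.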
