AVM: Towards Structure-Preserving Neural Response Modeling in the Visual Cortex Across Stimuli and Individuals

Qi Xu, Shuai Gong, Xuming Ran, Haihua Luo, Yangfan Hu

TL;DR

The paper tackles the brittleness of neural response models under stimulus and subject variability by proposing AVM, a structure-function decoupled framework that freezes a Vision Transformer backbone while introducing modular, condition-aware modulation units. This design enables localized, interpretable adaptation without retraining core representations. Across stimulus changes, cross-subject transfer, and cross-dataset adaptation on two large-scale mouse V1 datasets, AVM achieves superior predictive accuracy with significantly fewer trainable parameters, demonstrating robust generalization and efficiency. The work offers a biologically inspired approach to cortical modeling with implications for neuroscience and biologically grounded AI systems.

Abstract

While deep learning models have shown strong performance in simulating neural responses, they often fail to clearly separate stable visual encoding from condition-specific adaptation, which limits their ability to generalize across stimuli and individuals. We introduce the Adaptive Visual Model (AVM), a structure-preserving framework that enables condition-aware adaptation through modular subnetworks, without modifying the core representation. AVM keeps a Vision Transformer-based encoder frozen to capture consistent visual features, while independently trained modulation paths account for neural response variations driven by stimulus content and subject identity. We evaluate AVM in three experimental settings, including stimulus-level variation, cross-subject generalization, and cross-dataset adaptation, all of which involve structured changes in inputs and individuals. Across two large-scale mouse V1 datasets, AVM outperforms the state-of-the-art V1T model by approximately 2% in predictive correlation, demonstrating robust generalization, interpretable condition-wise modulation, and high architectural efficiency. Specifically, AVM achieves a 9.1% improvement in explained variance (FEVE) under the cross-dataset adaptation setting. These results suggest that AVM provides a unified framework for adaptive neural modeling across biological and experimental conditions, offering a scalable solution under structural constraints. Its design may inform future approaches to cortical modeling in both neuroscience and biologically inspired AI systems.
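The frozen-encoder-plus-modulation idea can be illustrated with a minimal sketch. It assumes the modulation path is a residual bottleneck (down-projection, nonlinearity, up-projection) applied to the output of a frozen block; the Figure 1 caption describes the unit only as a lightweight feedforward module with a bottleneck structure. The names `BottleneckModulation` and `frozen_block`, the GELU nonlinearity, the residual connection, and the zero-initialised up-projection are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b to one feature vector (plain Python lists)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def gelu(v):
    """Exact GELU via the error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


class BottleneckModulation:
    """Hypothetical stand-in for a modulation unit: down-project, GELU, up-project."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Assumption: zero-initialised up-projection, so the unit starts as identity.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]
        self.b_up = [0.0] * dim

    def __call__(self, h):
        z = [gelu(v) for v in linear(h, self.w_down, self.b_down)]
        delta = linear(z, self.w_up, self.b_up)
        # Residual form: frozen features plus a learned correction.
        return [hi + di for hi, di in zip(h, delta)]

    def num_params(self):
        dim, r = len(self.w_up), len(self.w_down)
        return (r * dim + r) + (dim * r + dim)


def frozen_block(x):
    """Placeholder for one frozen backbone block; it has no trainable state here."""
    return [math.tanh(v) for v in x]


rng = random.Random(0)
unit = BottleneckModulation(dim=8, bottleneck=2, rng=rng)
features = frozen_block([0.1 * i for i in range(8)])
modulated = unit(features)

print(modulated == features)  # True at initialisation (zero up-projection)
print(unit.num_params())      # 42 trainable values, vs 8 * 8 + 8 = 72 for a dense layer
```

In this toy setting only the modulation unit carries trainable values, and a narrow bottleneck keeps that count below the cost of a full dense layer of the same width.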

Paper Structure

This paper contains 23 sections, 8 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: AVM model architecture and condition-specific modulation variants. (A) The main network encodes stable visual representations under a consistent architecture. (B) Condition-aware Modulation Unit (CAMU): a lightweight feedforward module with a bottleneck structure, serving as the basic modulation component across AVM variants. (C) Condition-specific modulation variants: AVM employs block-specific modulation paths for localized response adaptation. AVM-S shares a single modulation path across all Transformer blocks, enabling parameter-efficient tuning. AVM-B introduces Cross-CAMU to support cross-block and cross-condition transfer, modeling higher-level adaptation interactions.
  • Figure 2: AVM consistently improves individual-level neural prediction. Evaluation results on Dataset-F for each individual mouse (F–O), comparing AVM and baseline V1T. Three metrics are reported: single-trial correlation (top), trial-averaged correlation (middle), and fraction of explained variance (FEVE, bottom). AVM achieves consistent gains across all individuals.
  • Figure 3: Tuning ability of the AVM model under different input conditions. The top three panels show results for Dataset-S, and the bottom three show results for Dataset-F. From left to right, the panels show single-trial correlation, trial-averaged correlation, and fraction of explained variance (FEVE). Each panel compares four models: V1T-D, V1T-T, AVM-S, and AVM. The x-axis indexes the individual mice, and the y-axis shows the value of the corresponding metric.
  • Figure 4: Number of trainable parameters in the proposed AVM core compared with the V1T core.
  • Figure 5: Visualization of the CAMU submodules (CAMU1, CAMU2, and CAMU3), which represent different stages of modulation within the AVM framework. Each plot shows the weight distribution of the respective CAMU, with modulation strength varying across blocks (Block 0 to Block 3). Together, the three plots highlight how the weights adjust as the model adapts to different stimuli and conditions.
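
The three metrics named in the captions above (single-trial correlation, trial-averaged correlation, and FEVE) can be sketched for a single neuron as follows. This is a minimal illustration under stated assumptions. Responses are stored as one list of repeated trials per stimulus. FEVE follows the definition commonly used in this literature, 1 − (MSE − noise variance) / (total variance − noise variance), which this excerpt does not spell out. Variances use the population estimator, although an unbiased estimator is another option. All function names and the toy data are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (assumption; an unbiased estimator is another option)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))


def single_trial_correlation(responses, predictions):
    """Correlate every individual trial with the prediction for its stimulus."""
    flat_resp = [r for trials in responses for r in trials]
    flat_pred = [p for trials, p in zip(responses, predictions) for _ in trials]
    return pearson(flat_resp, flat_pred)


def trial_averaged_correlation(responses, predictions):
    """Correlate the per-stimulus mean response with the prediction."""
    return pearson([mean(trials) for trials in responses], predictions)


def feve(responses, predictions):
    """Fraction of explainable variance explained, under the assumed definition."""
    flat_resp = [r for trials in responses for r in trials]
    mse = mean([(r - p) ** 2
                for trials, p in zip(responses, predictions) for r in trials])
    noise_var = mean([variance(trials) for trials in responses])  # trial-to-trial noise
    total_var = variance(flat_resp)
    return 1.0 - (mse - noise_var) / (total_var - noise_var)


# Toy data: 3 stimuli x 2 repeats for one neuron (hypothetical values).
responses = [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]
predictions = [2.0, 6.0, 10.0]  # equal to the per-stimulus mean response

print(single_trial_correlation(responses, predictions))
print(trial_averaged_correlation(responses, predictions))  # 1.0
print(feve(responses, predictions))                        # 1.0
```

With a prediction equal to the per-stimulus mean, the trial-averaged correlation and FEVE both reach 1. The single-trial correlation stays below 1 because trial-to-trial noise cannot be predicted from the stimulus alone.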