Modeling Rapid Contextual Learning in the Visual Cortex with Fast-Weight Deep Autoencoder Networks
Yue Li, Weifan Wang, Tai Sing Lee
TL;DR
This paper investigates how rapid global contextual learning can emerge in early visual representations by combining a ViT-based autoencoder with LoRA-based fast weights. It demonstrates that familiarity training compresses nuisance variation and aligns early-layer representations with the global context stored in the top layer. LoRA amplifies these effects and enriches self-attention with global context while preserving task flexibility. The results support a slow-core/fast-shell view of cortex-like plasticity, showing that fast, low-rank adapters can model rapid, context-dependent updates in hierarchical networks. These findings have implications for understanding brain-like rapid adaptation and for designing adaptive, context-aware vision systems.
Abstract
Recent neurophysiological studies have revealed that the early visual cortex can rapidly learn global image context, as evidenced by a sparsification of population responses and a reduction in mean activity when exposed to familiar versus novel image contexts. This phenomenon has been attributed primarily to local recurrent interactions rather than to changes in feedforward or feedback pathways, an attribution supported by both empirical findings and circuit-level modeling. Recurrent neural circuits capable of simulating these effects have been shown to reshape the geometry of neural manifolds, enhancing robustness and invariance to irrelevant variations. In this study, we employ a Vision Transformer (ViT)-based autoencoder to investigate, from a functional perspective, how familiarity training can induce sensitivity to global context in the early layers of a deep neural network. We hypothesize that rapid learning operates via fast weights, which encode transient or short-term memory traces, and we explore the use of Low-Rank Adaptation (LoRA) to implement such fast weights within each Transformer layer. Our results show that (1) the self-attention circuit of the proposed ViT-based autoencoder performs a manifold transform similar to that of a neural circuit model of the familiarity effect; (2) familiarity training aligns latent representations in early layers with those in the top layer, which contains global context information; (3) familiarity training broadens the self-attention scope within the remembered image context; and (4) these effects are significantly amplified by LoRA-based fast weights. Together, these findings suggest that familiarity training introduces global sensitivity to earlier layers in a hierarchical network, and that a hybrid fast-and-slow weight architecture may provide a viable computational model for studying rapid global context learning in the brain.
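The fast-and-slow weight idea can be illustrated with a minimal sketch of a single LoRA-style linear map. This is not the authors' implementation: the class name `FastSlowLinear`, the dimensions, and the choice of which projection inside a Transformer layer would carry the adapter are all assumptions made for illustration. The sketch follows the standard LoRA formulation, in which a frozen slow weight W is augmented by a low-rank update (alpha / r) * B A, with B initialized to zero so that the fast pathway contributes nothing until it is trained.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class FastSlowLinear:
    """Hypothetical layer: frozen slow weight W plus low-rank fast weight B @ A."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.d_out, self.rank = d_out, rank
        # Slow weights: stand in for the pretrained, frozen parameters.
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Fast weights: A (rank x d_in) is random, B (d_out x rank) starts at zero.
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        slow = matvec(self.W, x)                  # slow-core pathway
        fast = matvec(self.B, matvec(self.A, x))  # low-rank fast-shell pathway
        return [s + self.scale * f for s, f in zip(slow, fast)]

    def reset_fast(self):
        """Erase the transient memory trace without touching the slow weights."""
        self.B = [[0.0] * self.rank for _ in range(self.d_out)]


layer = FastSlowLinear(d_in=4, d_out=3, rank=2)
x = [1.0, 2.0, 3.0, 4.0]
baseline = layer.forward(x)   # equals W @ x because B is zero
layer.B[0][0] = 1.0           # stand-in for a rapid familiarity update
adapted = layer.forward(x)    # only output unit 0 is shifted
layer.reset_fast()
restored = layer.forward(x)   # identical to the baseline again
```

In this sketch, rapid learning would correspond to updating only A and B while W stays fixed, so the fast trace can be written or erased without disturbing the slowly learned representation.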
