Biologically-Motivated Learning Model for Instructed Visual Processing
Roy Abel, Shimon Ullman
TL;DR
This work addresses how top-down (TD) attention can be integrated with learning in visual processing. It proposes a biologically plausible model that combines bottom-up (BU) and TD streams and uses Counter-Hebb learning to generate backpropagation-like updates. A task-driven instruction mechanism selects sparse BU sub-networks, enabling instruction-guided vision within a unified BU-TD framework. The Counter-Hebb rule is local; under weight symmetry it is exactly equivalent to backpropagation (BP), and in asymmetric settings it achieves near-BP performance. The model gives robust results on standard vision benchmarks and multi-task learning datasets. The findings offer a potential bridge between neuroscience-inspired models of vision and instruction-tuned AI systems, suggesting pathways toward biologically plausible, guided vision models and insights relevant to vision-language architectures.
Abstract
As part of understanding how the brain learns, ongoing work seeks to combine biological knowledge and current artificial intelligence (AI) modeling in an attempt to find an efficient biologically plausible learning scheme. Current models of biologically plausible learning often use a cortical-like combination of bottom-up (BU) and top-down (TD) processing, where the TD part carries feedback signals used for learning. However, in the visual cortex, the TD pathway plays a second major role of visual attention, by guiding the visual process to locations and tasks of interest. A biological model should therefore combine the two tasks, and learn to guide the visual process. We introduce a model that uses a cortical-like combination of BU and TD processing that naturally integrates the two major functions of the TD stream. The integrated model is obtained by an appropriate connectivity pattern between the BU and TD streams, a novel processing cycle that uses the TD part twice, and the use of 'Counter-Hebb' learning that operates across the streams. We show that the 'Counter-Hebb' mechanism can provide an exact backpropagation synaptic modification. We further demonstrate the model's ability to guide the visual stream to perform a task of interest, achieving competitive performance compared with AI models on standard multi-task learning benchmarks. The successful combination of learning and visual guidance could provide a new view on combining BU and TD processing in human vision, and suggests possible directions for both biologically plausible models and artificial instructed models, such as vision-language models (VLMs).
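As a rough illustration of the claim that a Counter-Hebb update can match backpropagation exactly, the sketch below builds a tiny two-layer ReLU network and compares a local, cross-stream update against a finite-difference gradient. It is one reading of the abstract, not the authors' implementation. The following are all assumptions made for the example: the TD weights are the transposes of the BU weights (weight symmetry), each TD neuron is gated by the activity of its BU counterpart, the loss is squared error, and every function and variable name is hypothetical.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def bu_pass(W1, W2, x):
    """Bottom-up pass: ReLU hidden layer, linear output layer."""
    h = relu(matvec(W1, x))
    y = matvec(W2, h)
    return h, y

def loss(W1, W2, x, t):
    """Squared-error loss (an assumption of this sketch)."""
    _, y = bu_pass(W1, W2, x)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y, t))

def counter_hebb_products(W1, W2, x, t):
    """Per-synapse products: TD counter neuron of the post-synaptic unit
    times BU activity of the pre-synaptic unit. The weight update would be
    minus a learning rate times these products."""
    h, y = bu_pass(W1, W2, x)
    td_out = [a - b for a, b in zip(y, t)]       # error injected at the TD top
    td_in = matvec(transpose(W2), td_out)        # TD weights = BU weights transposed (assumed)
    td_hidden = [s if a > 0 else 0.0             # TD neuron gated by its BU counterpart (assumed)
                 for s, a in zip(td_in, h)]
    return outer(td_hidden, x), outer(td_out, h)

def numerical_grad(W1, W2, x, t, W, eps=1e-6):
    """Central finite-difference gradient of the loss w.r.t. W (W1 or W2)."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            old = W[i][j]
            W[i][j] = old + eps
            lp = loss(W1, W2, x, t)
            W[i][j] = old - eps
            lm = loss(W1, W2, x, t)
            W[i][j] = old                        # restore the weight
            grad[i][j] = (lp - lm) / (2 * eps)
    return grad

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(3)]
t = [random.uniform(-1, 1) for _ in range(2)]
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]

ch_W1, ch_W2 = counter_hebb_products(W1, W2, x, t)
max_diff_w1 = max_abs_diff(ch_W1, numerical_grad(W1, W2, x, t, W1))
max_diff_w2 = max_abs_diff(ch_W2, numerical_grad(W1, W2, x, t, W2))
```

Under these assumptions, both `max_diff_w1` and `max_diff_w2` should be at the level of floating-point noise. Each product uses only quantities available at the synapse and its counter neuron, which is the sense in which the rule is local. Replacing `transpose(W2)` with an independent TD matrix would give the asymmetric setting, where exact agreement is no longer expected.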
