
Biologically-Motivated Learning Model for Instructed Visual Processing

Roy Abel, Shimon Ullman

TL;DR

This work addresses how top-down (TD) attention can be integrated with learning in visual processing. It proposes a biologically plausible bottom-up/top-down (BU-TD) model that uses Counter-Hebb learning to generate backpropagation-like updates. A task-driven instruction mechanism selects sparse BU sub-networks, enabling instruction-guided vision within a unified BU-TD framework. The Counter-Hebb rule is local and, under weight symmetry, exactly equivalent to backpropagation (BP); with asymmetric weights it performs close to BP. The model gives robust results on standard vision benchmarks and multi-task learning datasets. The findings offer a potential bridge between neuroscience-inspired models of vision and instruction-tuned AI systems, suggesting pathways for biologically plausible, guided vision models and insights relevant to vision-language architectures.

Abstract

As part of understanding how the brain learns, ongoing work seeks to combine biological knowledge and current artificial intelligence (AI) modeling in an attempt to find an efficient biologically plausible learning scheme. Current models of biologically plausible learning often use a cortical-like combination of bottom-up (BU) and top-down (TD) processing, where the TD part carries feedback signals used for learning. However, in the visual cortex, the TD pathway plays a second major role of visual attention, by guiding the visual process to locations and tasks of interest. A biological model should therefore combine the two tasks, and learn to guide the visual process. We introduce a model that uses a cortical-like combination of BU and TD processing that naturally integrates the two major functions of the TD stream. The integrated model is obtained by an appropriate connectivity pattern between the BU and TD streams, a novel processing cycle that uses the TD part twice, and the use of 'Counter-Hebb' learning that operates across the streams. We show that the 'Counter-Hebb' mechanism can provide an exact backpropagation synaptic modification. We further demonstrate the model's ability to guide the visual stream to perform a task of interest, achieving competitive performance compared with AI models on standard multi-task learning benchmarks. The successful combination of learning and visual guidance could provide a new view on combining BU and TD processing in human vision, and suggests possible directions for both biologically plausible models and artificial instructed models, such as vision-language models (VLMs).
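
The claim that Counter-Hebb learning can reproduce the backpropagation update can be illustrated on a toy network. The sketch below is not the authors' implementation: the two-layer network, the squared-error loss, the weight values, and every function name are assumptions made for illustration. It assumes symmetric TD weights (the transpose of the BU weights), injects the prediction error at the top of the TD stream, gates the TD signal by the activity of the counterpart BU neurons, and forms each update as the product of a pre-synaptic BU activation and the counterpart TD activation. The result is compared against a finite-difference gradient of the loss.

```python
def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def relu(v):
    return [max(0.0, a) for a in v]

def galu(v, gate):
    # Gated linear unit: keep v_i only where the counterpart neuron is active.
    return [a if g > 0 else 0.0 for a, g in zip(v, gate)]

def loss(W1, W2, x, target):
    y = matvec(W2, relu(matvec(W1, x)))
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))

def counter_hebb_updates(W1, W2, x, target):
    # BU pass.
    h = relu(matvec(W1, x))
    y = matvec(W2, h)
    # TD pass: the error enters at the top and flows down through
    # symmetric TD weights (assumption), gated by the BU activity.
    y_bar = [yi - ti for yi, ti in zip(y, target)]
    h_bar = galu(matvec(transpose(W2), y_bar), gate=h)
    # Counter-Hebb: pre-synaptic BU activity times counterpart TD activity.
    # The counter synapse would receive the transposed update.
    return outer(h_bar, x), outer(y_bar, h)

def numeric_grad(W, f, eps=1e-6):
    # Central finite differences, perturbing W in place.
    grad = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for j in range(len(W[0])):
            old = W[i][j]
            W[i][j] = old + eps
            up = f()
            W[i][j] = old - eps
            down = f()
            W[i][j] = old
            grad[i][j] = (up - down) / (2 * eps)
    return grad

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical toy values; the first hidden unit is inactive for this input.
W1 = [[0.5, -0.6, 0.1], [-0.3, 0.8, 0.4]]
W2 = [[1.0, -0.5], [0.2, 0.7]]
x = [1.0, 2.0, 1.0]
target = [0.0, 1.0]

dW1, dW2 = counter_hebb_updates(W1, W2, x, target)
g1 = numeric_grad(W1, lambda: loss(W1, W2, x, target))
g2 = numeric_grad(W2, lambda: loss(W1, W2, x, target))
print(max_abs_diff(dW1, g1), max_abs_diff(dW2, g2))
```

Both printed differences should be at the level of floating-point noise. A gradient step would then subtract a learning rate times `dW1` and `dW2` from the BU weights. With asymmetric TD weights the two quantities would no longer coincide exactly, which is the setting the TL;DR describes as near-BP.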

Paper Structure (30 sections, 14 equations, 20 figures, 4 tables, 2 algorithms)

Figures (20)

  • Figure 1: The Counter-Hebb update rule in comparison with the classical Hebb rule. The classical Hebb rule (on the left) is shown with a focus on a single upstream synapse $W_{ij}$ (outlined by a circle), connecting a pre-synaptic neuron $a_j$ with a post-synaptic neuron $b_i$. The synapse $W_{ij}$ is updated based on the activity of both associated neurons $a_j$ and $b_i$. While neuron $a_j$ is directly associated with the synapse $W_{ij}$, neuron $b_i$ is assumed to transmit its information through propagation down the dendritic tree to synapse $W_{ij}$ (orange arrow). In contrast, the Counter-Hebb update rule (on the right) relies on a contribution from the counterpart downstream neuron (marked in orange), mediated via lateral connections. Compared with the Hebb rule, the signal from $a_j$ is combined with the signal from neuron $\bar{b}_i$ rather than neuron $b_i$. Notably, the resulting Counter-Hebb rule naturally applies an identical update to both $W_{ij}$ and its counter synapse $\bar{W}_{ji}$.
  • Figure 2: The instruction-based learning algorithm. The three columns represent three passes of our model (left to right): $TD \rightarrow BU \rightarrow TD$, where the first two passes provide a prediction output given an image and a task, and the last TD pass (in green frame) is used for learning. In inference, the BU visual process is guided by the TD network according to the given task. More specifically, the TD network propagates instruction signals downward, followed by a guided BU pass over the input image that computes predictions. By applying a ReLU non-linearity, the input task selectively activates a subset of neurons (i.e. those with non-zero values), composing a sub-network within the full network. The BU network then processes an input image using a composition of ReLU and GaLU. The GaLU function (dashed arrows) gates the BU computation to operate only on the selected sub-network that was activated by the task. For learning, the same TD network is then reused to propagate prediction error signals with GaLU exclusively (no ReLU). Finally, the 'Counter-Hebb' learning rule adjusts both networks' weights based on the activation values of their neurons. Therefore, in contrast with standard models, the entire computation, including the learning, is carried out by neurons in the network, and no additional computation (e.g. backpropagation) is used for learning.
  • Figure 3: The figure depicts the prediction and instruction heads of the BU-TD model. Each head consists of two parts: one for the BU network and the other for the TD network. These parts maintain the symmetric structure with lateral connectivity of the BU-TD model. The instruction head employs a 2-layer MLP, while the prediction head utilizes a single linear layer. Only one head can be active in each pass of the BU-TD model, enabling selection between the instruction head and the prediction head. These heads can be alternated, with a different head chosen in each pass. The prediction head is responsible for model predictions. In the BU stream, it generates predictions based on input data, while in the TD stream, it delivers prediction error information. On the other hand, the instruction head bridges the instructional space with visual concepts. The TD stream maps task representations into the model's hidden space, while the BU stream maps the visual space into the instructional space. Refer to Fig. \ref{fig: mtl algorithm} for an illustration of how the two heads are utilized in learning instruction-based models.
  • Figure 4: MNIST results: comparison of different weight decay values, showing the mean and standard deviation of performance per training epoch, averaged across 5 runs.
  • Figure 5: MNIST results: comparison of different weight decay values, showing the mean and standard deviation of performance per training epoch, averaged across 5 runs. This plot focuses on a smaller set of weight decay values and starts from the 4th iteration, to make the differences easier to see.
  • ...and 15 more figures
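
The guided inference described in the Figure 2 caption (a TD instruction pass that selects a sub-network, then a BU pass gated to that sub-network) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the model from the paper: it uses a single hidden layer, one-hot task vectors, and hand-picked hypothetical weights (`T`, `W1`, `W2`), and the function name `guided_forward` is invented for this example.

```python
def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def galu(v, gate):
    # Gated linear unit: keep v_i only where the counterpart neuron is active.
    return [a if g > 0 else 0.0 for a, g in zip(v, gate)]

def guided_forward(task, x, T, W1, W2):
    # TD pass: the task activates a subset of hidden units through ReLU.
    td_h = relu(matvec(T, task))
    # BU pass: ReLU composed with GaLU, so only the selected units contribute.
    bu_h = galu(relu(matvec(W1, x)), gate=td_h)
    selected = [i for i, g in enumerate(td_h) if g > 0]
    return matvec(W2, bu_h), selected

# Hypothetical weights: task 0 selects hidden units 0-1, task 1 selects 2-3.
T = [[1.0, -1.0], [0.5, -0.5], [-1.0, 1.0], [-0.5, 0.5]]
W1 = [[0.5, 0.25, 0.0], [-1.0, 0.0, 0.5], [0.0, 0.5, 1.0], [1.0, 1.0, -1.0]]
W2 = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.25]]
x = [1.0, 2.0, 1.0]

y_task0, units_task0 = guided_forward([1.0, 0.0], x, T, W1, W2)
y_task1, units_task1 = guided_forward([0.0, 1.0], x, T, W1, W2)
print(units_task0, y_task0)
print(units_task1, y_task1)
```

The same image `x` yields different outputs because each task routes the BU computation through a different subset of hidden units. In the caption's description, the learning pass would then reuse the TD weights with GaLU only (no ReLU) to carry the prediction error back through the same selected sub-network.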