Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing

Khubaib Ahmed, Ahsan Adeel, Mario Franco, Mohsin Raza

TL;DR

A deep network composed of context-sensitive two-point local processors is shown to maximise agreement between the active neurons, thus restricting the transmission of conflicting information to higher levels and reducing the neural activity required to process large amounts of heterogeneous real-world data.

Abstract

Deep learning (DL) has big-data processing capabilities that are as good as, or even better than, those of humans in many real-world domains, but at the cost of high energy requirements that may be unsustainable in some applications and of errors that, though infrequent, can be large. We hypothesise that a fundamental weakness of DL lies in its intrinsic dependence on integrate-and-fire point neurons that maximise information transmission irrespective of whether it is relevant in the current context. This leads to unnecessary neural firing and to the feedforward transmission of conflicting messages, which makes learning difficult and processing energy-inefficient. Here we show how to circumvent these limitations by mimicking the capabilities of context-sensitive neocortical neurons that receive input from diverse sources as context to amplify the transmission of relevant information and attenuate that of irrelevant information. We demonstrate that a deep network composed of such local processors seeks to maximise agreement between the active neurons, thus restricting the transmission of conflicting information to higher levels and reducing the neural activity required to process large amounts of heterogeneous real-world data. Shown to be far more effective and efficient than current forms of DL, this two-point neuron approach offers a possible step-change in transforming the cellular foundations of deep network architectures.
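
The amplify-and-attenuate idea in the abstract can be illustrated with a minimal sketch. It assumes the integrated context is a weighted sum of proximal, distal and universal context, with weights named as in the Figure 2 caption. The modulatory function `2 * r * tanh(c)` is a hypothetical stand-in chosen only for illustration; it is not the transfer function used in the paper, and the function names and input values are likewise assumptions.

```python
import numpy as np

def integrated_context(p, d, u, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of proximal (P), distal (D) and universal (U) context."""
    return alpha * p + beta * d + gamma * u

def modulate(r, c):
    """Hypothetical modulatory function followed by ReLU.

    For positive r, a high context c pushes the output above r (amplification),
    while a low or negative c pushes it to or below zero, where the ReLU
    discards it. This is an illustrative choice, not the paper's function.
    """
    return np.maximum(0.0, 2.0 * r * np.tanh(c))

# Four units receive the same feedforward input but different context.
r = np.array([1.0, 1.0, 1.0, 1.0])
p = np.array([1.0, 0.5, 0.0, -0.5])
d = np.array([0.5, 0.0, 0.0, -0.5])
u = np.array([0.5, 0.0, 0.0, 0.0])

c = integrated_context(p, d, u)            # strong, weak, absent, conflicting
out = modulate(r, c)
active_fraction = np.count_nonzero(out) / out.size

print("context:", c)
print("output:", out)
print("fraction of active units:", active_fraction)
```

In this toy setting the unit with strong supporting context is amplified above its feedforward input, and the units with absent or conflicting context are switched off, so only part of the population stays active.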

Paper Structure (11 equations, 15 figures, 3 tables)

Figures (15)

  • Figure 1: Context-sensitive neocortical neuron whose apical dendrites are in layer 1 (L1), with cell body and basal dendrites in deeper layers. The apical tuft receives input from diverse sources as context to amplify the transmission of coherent feedforward signals. However, for this mechanism to process large-scale, complex real-world data effectively and efficiently, it is crucial to understand the different kinds of information that arrive at the apical tuft and how they influence the cell’s response to the feedforward input.
  • Figure 2: Two-point neuron-inspired cooperative context-sensitive neural information processing applied to audiovisual speech denoising in a challenging multi-talker environment. (A) Schematic diagram of the two-point neuron-inspired cooperative context-sensitive auditory processor, which receives input from diverse sources as context to amplify the transmission of relevant feedforward (FF) information received at the basal and to attenuate that of irrelevant FF information. The processor receives three kinds of context, proximal (P), distal (D), and universal (U): P and D represent information from the neighbouring auditory processors and distal visual processors, respectively, and U represents cross-modal memory (Figure S1 (A-B)). U could, however, be extended explicitly to further sources of input, including prior experiences, emotional states, and cognitive load. The integrated context (C), via the modulatory function, amplifies relevant and suppresses irrelevant speech signals heard in noisy environments; $\alpha$, $\beta$, and $\gamma$ are the weights associated with P, D, and U, respectively. (B) The modulatory function uses C as a driving force to split the signal into relevant and irrelevant signals. It amplifies the output when C is high and suppresses it when C is low. The Rectified Linear Unit (ReLU) discards the suppressed information (below zero). The context-sensitive deep information processing architecture, composed of context-sensitive processors (Figure S1 (C)), turns off up to 99% of units carrying irrelevant information. As opposed to Infomax, which maximises the transmission of information irrespective of whether or not it is relevant in the current context, the proposed approach maximises the transmission of information that is relevant in the current context. This distinction is at the core of the proposed approach and is not just sparse coding.
  • Figure 3: Context-sensitive processors can efficiently process large amounts of heterogeneous real-world AV data. (A) Selective information processing: the blue line shows that context-sensitive processors quickly evolve to become highly sensitive to relevant information and become active only when the received information is important for the task at hand. In contrast, the point processor-driven baseline model and the $\beta$-variational autoencoder (VAE), with and without an energy term (E) in the cost function, exhibit significantly higher neural activity. (B) Mutual information (MI) estimation and maximisation between high-dimensional clean visual and noisy speech signals. Note that the context-sensitive processor-driven deep model converges quickly to the higher MI. The negative MI is due to untrained random weights at the start of neural network training. Solid and dashed lines indicate testing loss and training phases, respectively. (C) To test the system against the true MI, the network is used to estimate and maximise MI between multivariate Gaussian random variables. Context-sensitive processors converge to the true MI more quickly than other sophisticated point processor-driven methods, including MI neural estimation with f-divergence and Kullback–Leibler (KL) divergence (Belghazi et al., 2018). (D) Resilience test: when trained models were tested with 35% of processors randomly killed, the performance of context-sensitive processors degraded more gracefully than that of point processors. (E-F) AV speech reconstruction error and speech mask estimation: the context-sensitive processor-driven deep model achieves comparable results, with faster learning at the early training stage, despite using significantly fewer processors at any moment.
  • Figure 4: Context-sensitive local processors transmit only relevant information. (A) Feature maps: the Y-axis represents the input speech signal of 240 ms duration, where each small block is of 10 ms duration. The X-axis represents 32 convolutional filters. Context-sensitive processors effectively amplify the transmission of relevant signals and suppress that of irrelevant signals. For example, low-level layers here restrict the transmission of irrelevant information to higher levels, i.e., far fewer filters in Layer 1 and Layer 2 are active than in the point processor-driven deep feature maps. In addition, context-sensitive processors could construct a high-level representation of the output at low-level layers, requiring fewer processors to construct a good representation. (B) The data from 32 processors show that, compared to point processors, context-sensitive processors reduce the cross-correlation as the data passes through different layers.
  • Figure S1: Context-sensitive neural information processing: detailed information flow. (A) Two-compartment two-unit circuit. The receptive field (R) in blue arrives at the basal. The local context (LC) (distal (D) in orange and proximal (P) in grey) and the universal context (U) in maroon arrive via synapses at the apical. U could explicitly be extended to further sources of input, including prior knowledge (K), emotions (E), and semantic knowledge (S). (B) Individual context-sensitive processors cooperate moment-by-moment via local and universal forms of context to separate coherent from conflicting signals via asynchronous modulatory transfer functions with the conditional probability of Y: $\Pr(Y=1 \mid R=r, C=c)=p(T(r,c))$, where $p$ is the half-Gaussian filter and $T(r,c)$ is a continuous function on $\mathbb{R}^2$. The extracted coherent signals are recombined to extract synergistic memory signals. (C) Formation of contextual fields in a convolutional neural net. The convolutional block uses conventional point processors to generate R, P, D, and U, and the non-parametric modulation (NPM) block uses context-sensitive processors. Note that R in the NPM block is non-parametric.
  • ...and 10 more figures
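
The caption of Figure 3 (C) refers to a true MI between multivariate Gaussian random variables against which estimators are compared. As a minimal sketch of how such a ground truth can be obtained, the code below assumes two `dim`-dimensional standard Gaussian vectors whose matching components have correlation `rho`; this covariance structure, the parameter values and the function names are illustrative assumptions, since the caption does not specify them. For this setup the MI has the closed form $I(X;Y) = -\tfrac{d}{2}\log(1-\rho^2)$ nats, which should agree with the general log-determinant expression for jointly Gaussian variables.

```python
import numpy as np

def gaussian_mi_closed_form(rho, dim):
    """MI in nats for component-wise correlated standard Gaussians."""
    return -0.5 * dim * np.log(1.0 - rho ** 2)

def gaussian_mi_from_cov(cov, dim):
    """MI in nats from a joint covariance: 0.5 * log(|Sx| |Sy| / |Sxy|)."""
    _, logdet_x = np.linalg.slogdet(cov[:dim, :dim])
    _, logdet_y = np.linalg.slogdet(cov[dim:, dim:])
    _, logdet_xy = np.linalg.slogdet(cov)
    return 0.5 * (logdet_x + logdet_y - logdet_xy)

dim, rho = 4, 0.8  # assumed values, for illustration only
eye = np.eye(dim)
cov = np.block([[eye, rho * eye], [rho * eye, eye]])

true_mi = gaussian_mi_closed_form(rho, dim)
det_mi = gaussian_mi_from_cov(cov, dim)

# Plug-in estimate from samples, using the empirical joint covariance.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(2 * dim), cov, size=50_000)
sample_mi = gaussian_mi_from_cov(np.cov(samples, rowvar=False), dim)

print("closed form:", true_mi)
print("log-determinant form:", det_mi)
print("plug-in estimate from samples:", sample_mi)
```

The plug-in estimate only works because the data are known to be Gaussian; a neural MI estimator is evaluated on this kind of benchmark precisely because the target value is available analytically.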