Including local feature interactions in deep non-negative matrix factorization networks improves performance

Mahbod Nouri, David Rotermund, Alberto Garcia-Ortiz, Klaus R. Pawelzik

TL;DR

This paper argues that incorporating biologically inspired, non-negative factorization with local feature mixing enhances deep vision models. By embedding CNMF modules and 1×1 convolutions, the approach captures positive long-range interactions while modeling local inhibition, and it employs an approximate back-propagation scheme to manage the iterative NMF computations. Empirical results on CIFAR-10 show that CNMF combined with 1×1 mixing can outperform similarly sized CNNs, with peak performance achieved when CNN and NMF parameter counts are balanced. The work bridges biological and artificial neural computation, offering a plausible, more interpretable alternative that preserves performance while aligning with cortical processing principles.

Abstract

The brain uses positive signals as a means of signaling. Forward interactions in the early visual cortex are also positive, realized by excitatory synapses. Only local interactions also include inhibition. Non-negative matrix factorization (NMF) captures the biological constraint of positive long-range interactions and can be implemented with stochastic spikes. While NMF can serve as an abstract formalization of early neural processing in the visual system, the performance of deep convolutional networks with NMF modules does not match that of CNNs of similar size. However, when each local NMF module is followed by a module that mixes the NMF's positive activities, performance on the benchmark data exceeds that of vanilla deep convolutional networks of similar size. This setting can be considered a biologically more plausible emulation of the processing in cortical (hyper-)columns, with the potential to improve the performance of deep networks.
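
As a rough illustration of the idea in the abstract, the sketch below infers non-negative activities `h` for a single input patch with fixed non-negative weights `W`, then mixes them with a signed matrix `M`, which plays the role of a 1×1 convolution at one spatial position. The multiplicative update is the standard Lee-Seung rule for squared error, used here only as a stand-in: the authors' actual update rule, normalization, activation function, and layer sizes are not given in this summary, so every name and shape below is an assumption.

```python
import numpy as np

def nmf_infer(x, W, n_iter=50, eps=1e-12):
    """Infer non-negative activities h such that x ~ W @ h, with W held fixed.

    Uses the Lee-Seung multiplicative update for squared error
    (an assumed stand-in, not necessarily the paper's rule).
    """
    h = np.full(W.shape[1], 1.0 / W.shape[1])  # uniform positive start
    for _ in range(n_iter):
        # Ratio of non-negative terms keeps h non-negative at every step.
        h = h * (W.T @ x) / (W.T @ (W @ h) + eps)
    return h

def mix(h, M):
    """Signed linear mixing of the NMF activities, followed by a ReLU.

    M may contain negative entries, so this step can model local inhibition.
    The ReLU is an assumption; the summary only says activations exist.
    """
    return np.maximum(M @ h, 0.0)

rng = np.random.default_rng(0)
W = rng.random((27, 32))           # hypothetical: 3x3x3 patch -> 32 features
x = rng.random(27)                 # one non-negative input patch
M = rng.standard_normal((32, 32))  # hypothetical signed mixing weights

h = nmf_infer(x, W)                # positive "long-range" code
y = mix(h, M)                      # locally mixed output, again non-negative
```

The point of the split is that `W` and `h` stay non-negative throughout, while negative weights appear only in the local mixing step `M`.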

Paper Structure

This paper contains 17 sections, 16 equations, 6 figures.

Figures (6)

  • Figure 1: Performance and computational cost comparison during the back-propagation between CNN, NMF, and NMF with approximate back-propagation (ours). The comparison spans three metrics: back-propagation memory consumption (left), back-propagation computation time (middle), and classification error (right). Memory and time values are shown relative to the CNN baseline.
  • Figure 2: The difference between our approximate approach and naive back-propagation. Since NMF is an iterative algorithm, the output of each layer is computed after several iterations of the update rule. To apply vanilla back-propagation, all of these intermediate steps must be saved in memory during the forward pass, which is time- and memory-inefficient. Instead, our proposed approximate back-propagation computes the corresponding error of a lower layer in one step, using only the output of the layer.
  • Figure 3: Network architecture of the proposed method for the CNMF + $1\times 1$ Convolution. The network consists of four sequential blocks, each containing a CNMF module followed by a $1\times 1$ convolutional layer. The architecture progressively reduces spatial dimensions from $28\times 28$ in the input to $1\times 1$ while transforming feature channels ($32\rightarrow64\rightarrow96\rightarrow10\rightarrow$ output). The output of the last $1\times 1$ convolutional layer is used for the classification. For simplicity, activations and batch normalization layers are omitted from the figure.
  • Figure 4: Model architecture of all investigated networks. a) Overall model architecture. All three convolutional layers consist of one of the modules listed on the right. b) Module used in the baseline CNN model. c) Module used in the CNMF model. d) Module used in the CNN + $1\times 1$ Conv model. e) Module used in the CNMF + $1\times 1$ Conv model.
  • Figure 5: Classification performance of models on the CIFAR-10 dataset. Error bars represent variability across five models trained with different random initializations.
  • ...and 1 more figure
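
The memory argument in the Figure 2 caption can be made concrete with a small sketch. It does not implement the authors' approximate backward rule, which is not given in this summary. It only compares how much activation storage an unrolled forward pass accumulates against keeping the layer output alone. The update rule (Lee-Seung, squared error), the shapes, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def forward_store_all(x, W, n_iter, eps=1e-12):
    """Run an iterative NMF inference and keep every intermediate h,
    as back-propagation through the unrolled iterations would require."""
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    saved = [h]
    for _ in range(n_iter):
        # Assumed stand-in update; each step creates a new array to store.
        h = h * (W.T @ x) / (W.T @ (W @ h) + eps)
        saved.append(h)
    return h, saved

rng = np.random.default_rng(0)
W = rng.random((27, 32))   # hypothetical layer weights
x = rng.random(27)         # hypothetical input patch
n_iter = 20                # hypothetical number of NMF iterations

h_out, saved = forward_store_all(x, W, n_iter)

bytes_naive = sum(s.nbytes for s in saved)  # all intermediate states
bytes_output_only = h_out.nbytes            # what a one-step scheme keeps
ratio = bytes_naive // bytes_output_only    # grows linearly with n_iter
```

Under these assumptions the unrolled pass stores `n_iter + 1` copies of the activity vector per layer, whereas a scheme that needs only the layer output stores one.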