A developmental approach for training deep belief networks

Matteo Zambra, Alberto Testolin, Marco Zorzi

TL;DR

The paper tackles the challenge of biologically plausible unsupervised training for deep networks by replacing greedy layer-wise DBN learning with an iterative joint approach (iDBN) that updates the weights of all layers after each sensory input. It employs a CD-based learning variant and online feed-forward propagation on a fixed architecture with a visible layer of $784$ neurons, two hidden layers of $500$ neurons each, and a final hidden layer of $2000$ neurons. On MNIST and numerosity tasks, iDBN achieves final performance comparable to traditional greedy training while exposing the full trajectories of internal representations and network topology during learning, and it extends to continual learning via interleaved training. The work also demonstrates the utility of graph-theoretic analyses for studying developmental dynamics in hierarchical self-organizing networks, offering a framework for modeling neurocognitive development and its deviations.
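
The joint update summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`init_dbn`, `cd1_update`, `idbn_step`), the learning rate, the Gaussian weight initialization, the use of hidden probabilities as input to the next layer, and the random placeholder batch are all assumptions made for illustration. Only the layer sizes ($784$, $500$, $500$, $2000$) and the idea of a local CD step at every layer for each input come from the summary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_dbn(sizes, scale=0.01):
    """One RBM per pair of adjacent layers: weights, visible bias, hidden bias."""
    return [
        {"W": scale * rng.standard_normal((n_vis, n_hid)),
         "a": np.zeros(n_vis),
         "b": np.zeros(n_hid)}
        for n_vis, n_hid in zip(sizes[:-1], sizes[1:])
    ]

def cd1_update(rbm, v0, lr=0.01):
    """Apply one CD-1 step in place; return the data-driven hidden probabilities."""
    h0_prob = sigmoid(v0 @ rbm["W"] + rbm["b"])                # positive phase
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)   # sample hidden states
    v1_prob = sigmoid(h0 @ rbm["W"].T + rbm["a"])              # one-step reconstruction
    h1_prob = sigmoid(v1_prob @ rbm["W"] + rbm["b"])           # negative phase
    n = v0.shape[0]
    rbm["W"] += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    rbm["a"] += lr * (v0 - v1_prob).mean(axis=0)
    rbm["b"] += lr * (h0_prob - h1_prob).mean(axis=0)
    return h0_prob

def idbn_step(dbn, batch, lr=0.01):
    """Iterative scheme (assumed form): every layer gets a local CD-1 update per batch."""
    x = batch
    for rbm in dbn:
        x = cd1_update(rbm, x, lr)  # hidden activity feeds the next layer
    return x

dbn = init_dbn([784, 500, 500, 2000])
batch = (rng.random((16, 784)) < 0.5).astype(float)  # placeholder inputs, not MNIST
top = idbn_step(dbn, batch)
```

A greedy trainer would instead call `cd1_update` on the first RBM for all epochs before the second RBM ever receives input, which is why intermediate states of the whole network are not meaningful under that scheme.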

Abstract

Deep belief networks (DBNs) are stochastic neural networks that can extract rich internal representations of the environment from sensory data. DBNs had a catalytic effect in triggering the deep learning revolution, demonstrating for the very first time the feasibility of unsupervised learning in networks with many layers of hidden neurons. These hierarchical architectures incorporate plausible biological and cognitive properties, making them particularly appealing as computational models of human perception and cognition. However, learning in DBNs is usually carried out in a greedy, layer-wise fashion, which makes it impossible to simulate the holistic maturation of cortical circuits and prevents the modeling of cognitive development. Here we present iDBN, an iterative learning algorithm for DBNs that jointly updates the connection weights across all layers of the model. We evaluate the proposed iterative algorithm on two different sets of visual stimuli, measuring the generative capabilities of the learned model and its potential to support supervised downstream tasks. We also track network development in terms of graph-theoretical properties and investigate the potential extension of iDBN to continual learning scenarios. DBNs trained using our iterative approach achieve a final performance comparable to that of their greedy counterparts, while also making it possible to accurately analyze the gradual development of internal representations in the deep network and the progressive improvement in task performance. Our work paves the way for the use of iDBN in modeling neurocognitive development.

Paper Structure

This paper contains 28 sections, 8 equations, 12 figures, 1 table, and 1 algorithm.

Figures (12)

  • Figure 1: Graphical representation of the architecture of a 3-layer Deep Belief Network and the learning schemes implemented in the present work. Green arrows represent bottom-up recognition connections, while red arrows represent top-down generative processing. Yellow boxes enclose local computations. We consider the case of CD-1; CD-$k$ can be recovered by repeating the sampling steps $k$ times. $v \sim D$ denotes a data instance sampled from the training set and $\bm{h}_i$ represents the hidden activities of layer $i$. In the greedy scheme (a), hidden layers are trained sequentially, from bottom to top, and input signals are never projected into layer $l$ until learning at layer $l - 1$ is completed. In the iterative scheme (b), input signals are immediately propagated through the entire deep network, and top-down processing is performed locally at each layer to jointly learn all connection weights. In the full-stack scheme (c), both feed-forward propagation and top-down processing occur over the entire deep network.
  • Figure 1: Samples from the Numerosity data set, which contains 51200 images featuring a variable number of white rectangles drawn on a black background. Numerosity ranges from $1$ to $32$ and objects have variable position and dimension (see stoianov2012 for further details). The corresponding numerosity is reported on top of each image.
  • Figure 1: (a-b) Performance of the greedy vs. iterative schemes during learning, with Glorot weight initialization. (c-e) Generative accuracy of the greedy vs. iterative schemes at the end of learning, for combinations of Glorot initialization and dropout.
  • Figure 1: Experiments on the null models $G_{NN}(p)$, $G_{NM}(p)$ and $G_N(p)$. The probability $p$ is fixed at $0.01$, with $N = 1000$ for $G_{NN}(p)$ and $M = 2000$ for $G_{NM}(p)$, while $G_N(p)$ retains the structure of the DBN.
  • Figure 2: Performance of the greedy vs. iterative learning schemes during learning (top and middle panels) and at the end of the unsupervised learning phase (bottom panel). For the latter case we also report the generation capabilities of the alternative developmental scheme based on full-stack propagation.
  • ...and 7 more figures
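
The greedy and iterative schemes described in the first caption above differ mainly in the order in which layers receive updates. The sketch below makes that ordering explicit as a list of `(layer, epoch, batch)` triples. The function names and the assumption that both schemes perform the same total number of local updates are illustrative choices, not details taken from the paper.

```python
def greedy_schedule(n_layers, n_epochs, n_batches):
    """Layer l + 1 starts only after layer l has finished all of its epochs."""
    return [(layer, epoch, batch)
            for layer in range(n_layers)
            for epoch in range(n_epochs)
            for batch in range(n_batches)]

def iterative_schedule(n_layers, n_epochs, n_batches):
    """Every batch triggers one local update at each layer, bottom to top."""
    return [(layer, epoch, batch)
            for epoch in range(n_epochs)
            for batch in range(n_batches)
            for layer in range(n_layers)]

# Three weight layers, two epochs, two batches per epoch (toy sizes).
greedy = greedy_schedule(3, 2, 2)
iterative = iterative_schedule(3, 2, 2)
```

Under this toy accounting the two lists contain the same triples in a different order: in the greedy list the top layer is first touched only after all lower-layer updates are done, whereas in the iterative list it is updated right after the first batch.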