Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron

Christian Schmid, James M. Murray

TL;DR

Using a stochastic-process approach, the authors derive flow equations describing learning and apply them to a nonlinear perceptron performing binary classification. They find that input-data noise affects learning speed differently under supervised learning (SL) and reinforcement learning (RL), and that it determines how quickly a learned task is overwritten by subsequent learning.

Abstract

The ability of a brain or a neural network to learn efficiently depends crucially on both the task structure and the learning rule. Previous work has analyzed the dynamical equations describing learning in the relatively simplified context of the perceptron under assumptions of a student-teacher framework or a linearized output. While these assumptions have facilitated theoretical understanding, they have precluded a detailed understanding of the roles of the nonlinearity and input-data distribution in determining the learning dynamics, limiting the applicability of the theories to real biological or artificial neural networks. Here, we use a stochastic-process approach to derive flow equations describing learning, applying this framework to the case of a nonlinear perceptron performing binary classification. We characterize the effects of the learning rule (supervised or reinforcement learning, SL/RL) and input-data distribution on the perceptron's learning curve and the forgetting curve as subsequent tasks are learned. In particular, we find that the input-data noise affects the learning speed differently under SL and RL, and that it determines how quickly learning of a task is overwritten by subsequent learning. Additionally, we verify our approach with real data using the MNIST dataset. This approach points a way toward analyzing learning dynamics for more complex circuit architectures.
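To make the setting in the abstract concrete, the following is a minimal simulation sketch of a nonlinear perceptron trained with supervised learning on two Gaussian input clusters with labels $y=\pm1$. It is not the authors' code. The `tanh` nonlinearity, the squared-error gradient update, the learning rate `eta`, the input dimension, and the helper names `sample_inputs` and `train_sl` are all assumptions chosen for illustration, and the regularization parameter $\lambda$ is omitted.

```python
import numpy as np

def sample_inputs(rng, mu, sigma, n):
    """Draw n labelled inputs: y = +/-1 and x ~ N(y * mu, sigma^2 I)."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu + sigma * rng.standard_normal((n, mu.size))
    return x, y

def train_sl(mu, sigma, eta=0.05, steps=2000, seed=0):
    """Online supervised learning for y_hat = tanh(w . x) (assumed model)."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.standard_normal(mu.size)  # small random initial weights
    x, y = sample_inputs(rng, mu, sigma, steps)
    alignment = np.empty(steps)
    for t in range(steps):
        y_hat = np.tanh(w @ x[t])
        # Stochastic gradient step on the squared error 0.5 * (y - y_hat)^2;
        # (1 - y_hat^2) is the derivative of tanh.
        w += eta * (y[t] - y_hat) * (1.0 - y_hat**2) * x[t]
        # Cosine of the angle between w and the cluster mean mu.
        alignment[t] = (mu @ w) / (np.linalg.norm(mu) * np.linalg.norm(w))
    return w, alignment

mu = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # hypothetical cluster mean
w, alignment = train_sl(mu, sigma=0.5)
print("mean alignment over the last 200 steps:", alignment[-200:].mean())
```

Each run is one stochastic trajectory in weight space. Repeating it over many seeds gives an empirical distribution of weights, which is the kind of object the flow equations describe analytically. Rerunning with different values of `sigma` gives a rough feel for how input noise changes the time needed to reach a given alignment.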

Paper Structure (16 sections, 41 equations, 7 figures)

Figures (7)

  • Figure 1: Learning dynamics in the nonlinear perceptron. A: The perceptron, parametrized by weights $\mathbf{w}$, maps an input $\mathbf{x}$ to the output $\hat{y}$. B: The inputs are drawn from two multivariate normal distributions with labels $y=\pm1$. The weight vector $\mathbf{w}$ is orthogonal to the classification boundary. C: Due to the stochasticity inherent in the update equations, the weights are described by the flow of a probability distribution in weight space.
  • Figure 2: Learning dynamics in a perceptron classification task. A, B: Flow fields determining the weight dynamics with trajectories for different initial conditions for SL (A) and RL (B). C, D: Learning dynamics from simulations closely follow the analytical results for SL (C) and RL (D). E: Dependence of the asymptotic weight norm on the regularization parameter $\lambda$.
  • Figure 3: Relationship between input noise and time to learn the task. A: The time required for the alignment $\boldsymbol{\mu}\cdot\langle\mathbf{w}\rangle/|\langle\mathbf{w}\rangle|$ to reach 80% depends on the noise $\sigma$ of the isotropic input distributions. B: To characterize anisotropic input noise, the total input variance is split into a noise component $\sigma_\parallel^2$ parallel to and a component $\sigma_\bot^2$ orthogonal to the decoding direction. C: Shifting the input noise into the decoding direction slows down learning.
  • Figure 4: Dynamics of the total variance of $\mathbf{w}$ for isotropic input noise. Higher noise leads to a faster decay in $\mathrm{tr}\left(\mathrm{Cov}(\mathbf{w}) \right)$ for supervised learning (A) and for reinforcement learning (B).
  • Figure 5: Comparison of the theory with training on MNIST. A: A nonlinear perceptron is trained to classify the digits 0 and 1 in the MNIST dataset. B: Comparison of the empirical test classification accuracy with the theoretical prediction. C: Even after the task has been learned, the theory accurately captures non-trivial ongoing learning dynamics.
  • ...and 2 more figures
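The quantities named in the Figure 3 caption can be sketched numerically. The snippet below computes the alignment $\boldsymbol{\mu}\cdot\langle\mathbf{w}\rangle/|\langle\mathbf{w}\rangle|$ and splits the total input variance into a part parallel to the decoding direction and a part orthogonal to it. Taking the orthogonal part as the remainder of the trace is an assumption, since the paper's exact normalization is not given here. The function names and the example covariance are hypothetical.

```python
import numpy as np

def split_variance(cov, direction):
    """Split tr(cov) into variance along `direction` and the orthogonal remainder."""
    u = direction / np.linalg.norm(direction)  # unit decoding direction
    var_parallel = float(u @ cov @ u)
    var_orthogonal = float(np.trace(cov) - var_parallel)
    return var_parallel, var_orthogonal

def alignment(mu, w_mean):
    """mu . <w> / |<w>|, as written in the Figure 3 caption (mu taken as unit norm)."""
    return float(mu @ w_mean / np.linalg.norm(w_mean))

mu = np.array([1.0, 0.0, 0.0])   # hypothetical decoding direction
cov = np.diag([0.5, 0.2, 0.3])   # hypothetical anisotropic input covariance
var_par, var_orth = split_variance(cov, mu)
print(var_par, var_orth, alignment(mu, np.array([2.0, 0.0, 0.0])))
```

Holding `np.trace(cov)` fixed while moving variance from the orthogonal part into `var_parallel` corresponds to the manipulation described for panel C.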