Leveraging chaotic transients in the training of artificial neural networks

Pedro Jiménez-González; Miguel C. Soriano; Lucas Lacasa

TL;DR

The dynamics of a neural network's trajectory during training with unconventionally large learning rates are explored. For a range of learning-rate values, gradient descent (GD) optimization shifts away from a purely exploitation-like algorithm into a regime that balances exploration and exploitation: the neural network is still capable of learning, but its trajectory shows sensitive dependence on initial conditions.

Abstract

Traditional algorithms for optimizing artificial neural networks on supervised learning tasks are usually exploitation-type relaxational dynamics such as gradient descent (GD). Here, we explore the dynamics of the neural network trajectory during training with unconventionally large learning rates. We show that, for a range of learning-rate values, GD optimization shifts away from a purely exploitation-like algorithm into a regime that balances exploration and exploitation: the neural network is still capable of learning, but its trajectory shows sensitive dependence on initial conditions, as characterized by a positive maximum Lyapunov exponent of the network. Interestingly, the characteristic training time required to reach an acceptable test-set accuracy reaches a minimum precisely in this learning-rate region, further suggesting that the training of artificial neural networks can be accelerated by operating at the onset of chaos. Our results, initially illustrated for the MNIST classification task, qualitatively hold for a range of supervised learning tasks, learning architectures (including both shallow and deep multilayer perceptrons and convolutional neural networks) and other hyperparameters (different activation functions and weight regularisation), and showcase the emergent, constructive role of transient chaotic dynamics in the training of artificial neural networks.
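A positive maximum Lyapunov exponent of a GD trajectory can be estimated numerically by following two nearby trajectories and renormalizing their separation after every step. This is the standard Benettin-style two-trajectory scheme; the paper's exact estimator may differ. The sketch below rests on loudly stated assumptions: instead of a neural network it uses a hypothetical one-dimensional double-well loss L(w) = 1/4 * sum((w^2 - 1)^2), and the helper names `double_well_grad` and `max_lyapunov_gd` are illustrative, not taken from the paper. For this toy loss the GD map is w -> (1 + lr) w - lr w^3. At the minima w = +/-1 its derivative is 1 - 2 lr, so for lr = 0.1 the exponent should approach ln 0.8 (about -0.22). For lr = 2 the map 3w - 2w^3 is conjugate (via w = sqrt(2) cos(theta)) to angle tripling, so the exponent should come out close to ln 3 (about 1.10).

```python
import numpy as np


def double_well_grad(w):
    """Gradient of the hypothetical toy loss L(w) = 0.25 * sum((w**2 - 1)**2)."""
    return w**3 - w


def max_lyapunov_gd(grad, w0, lr, n_steps=2000, d0=1e-8, seed=0):
    """Benettin-style estimate of the maximum Lyapunov exponent of the
    plain GD map w -> w - lr * grad(w) (illustrative helper, not the paper's code).

    A reference trajectory and a perturbed one are advanced together; after
    each step the log growth of their separation is accumulated and the
    perturbed point is pulled back to distance d0 along the same direction.
    """
    rng = np.random.default_rng(seed)
    w = np.atleast_1d(np.asarray(w0, dtype=float))
    u = rng.standard_normal(w.shape)
    w_pert = w + d0 * u / np.linalg.norm(u)  # start exactly d0 away

    log_growth = 0.0
    for _ in range(n_steps):
        # One plain gradient-descent step on both trajectories
        w = w - lr * grad(w)
        w_pert = w_pert - lr * grad(w_pert)
        d = np.linalg.norm(w_pert - w)
        log_growth += np.log(d / d0)
        # Renormalize the separation back to d0, keeping its direction
        w_pert = w + (d0 / d) * (w_pert - w)
    return log_growth / n_steps


if __name__ == "__main__":
    for lr in (0.1, 2.0):
        lam = max_lyapunov_gd(double_well_grad, w0=0.3, lr=lr)
        print(f"lr={lr}: estimated max Lyapunov exponent {lam:.3f}")
```

Unlike the networks described in the abstract, this toy model does not keep learning in its chaotic regime (at lr = 2 the loss no longer decreases). It only illustrates how the sign of the exponent separates contracting GD dynamics from dynamics with sensitive dependence on initial conditions. For an actual network, `w` would be the flattened weight vector and `grad` the gradient of the training loss.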

Paper Structure

Figures (14)