Self-Organization Towards $1/f$ Noise in Deep Neural Networks

Nicholas Chong Jia Le, Ling Feng

TL;DR

The study investigates whether deep neural networks exhibit the $1/f$ pink-noise signature observed in biological brains. Using LSTM networks trained on IMDb sentiment data, the authors analyze activation time series with unbiased DFA and show robust $1/f$-like spectra in activations (with $\beta \approx 0.83$ on test data) but not in inputs (white noise). They demonstrate that increasing network capacity shifts the spectrum toward white noise due to many underutilized neurons, and they reveal a consistent internal/external activation distinction analogous to brain measurements. These findings suggest $1/f$ noise may signal efficient learning in artificial systems and provide a controllable platform to explore the origins of such scale-invariant fluctuations across neural architectures.

Abstract

The presence of $1/f$ noise, also known as pink noise, is a well-established phenomenon in biological neural networks, and is thought to play an important role in information processing in the brain. In this study, we find that such $1/f$ noise also appears in deep neural networks trained on natural language, resembling that of their biological counterparts. Specifically, we trained Long Short-Term Memory (LSTM) networks on the "IMDb" AI benchmark dataset, then measured the neuron activations. Detrended fluctuation analysis (DFA) of the time series of the different neurons demonstrates clear $1/f$ patterns, which are absent in the time series of the inputs to the LSTM. Interestingly, when the neural network is at overcapacity, having more than enough neurons to achieve the learning task, the activation patterns deviate from $1/f$ noise and shift towards white noise. This is because many of the neurons are not effectively used, showing little fluctuation when fed with input data. We further examine the exponent values of the $1/f$ noise in the "internal" and "external" activations in the LSTM cell, finding some resemblance to the variations of the exponents in fMRI signals of the human brain. Our findings further support the hypothesis that $1/f$ noise is a signature of optimal learning. With deep learning models approaching or surpassing humans in certain tasks, and being more "experimentable" than their biological counterparts, our study suggests that they are good candidates for understanding the fundamental origins of $1/f$ noise.
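
To make the DFA step described in the abstract concrete, here is a minimal sketch of plain first-order DFA in standard-library Python. It is not the authors' code: the summary mentions an unbiased DFA variant, which this sketch does not implement. The function names (`linear_fit`, `dfa_exponent`, `spectral_exponent`), the window sizes, and the synthetic white-noise input are assumptions made for illustration. The conversion from the DFA exponent $\alpha$ to the spectral exponent $\beta$ uses the commonly quoted relation $\beta = 2\alpha - 1$, under which white noise gives $\beta \approx 0$ and $1/f$ noise gives $\beta \approx 1$.

```python
import math
import random


def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_exponent(series, window_sizes):
    """Estimate the DFA exponent alpha of a 1-D series (first-order DFA)."""
    mean = sum(series) / len(series)
    # Step 1: integrate the mean-subtracted series into a "profile".
    profile = []
    total = 0.0
    for value in series:
        total += value - mean
        profile.append(total)

    log_sizes, log_flucts = [], []
    for size in window_sizes:
        n_windows = len(profile) // size
        if n_windows < 2:
            continue  # too few windows for a stable estimate
        xs = list(range(size))
        squared_residuals = 0.0
        # Step 2: remove a linear trend from each non-overlapping window.
        for w in range(n_windows):
            segment = profile[w * size:(w + 1) * size]
            slope, intercept = linear_fit(xs, segment)
            squared_residuals += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment)
            )
        # Step 3: root-mean-square fluctuation F(n) at this window size.
        fluct = math.sqrt(squared_residuals / (n_windows * size))
        log_sizes.append(math.log(size))
        log_flucts.append(math.log(fluct))

    # Step 4: alpha is the slope of log F(n) against log n.
    alpha, _ = linear_fit(log_sizes, log_flucts)
    return alpha


def spectral_exponent(alpha):
    """Convert alpha to beta, assuming the relation beta = 2 * alpha - 1."""
    return 2.0 * alpha - 1.0


# Synthetic white noise stands in for an input time series (an assumption).
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
alpha = dfa_exponent(white_noise, [8, 16, 32, 64, 128, 256])
beta = spectral_exponent(alpha)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

For white noise the estimated $\alpha$ should land near 0.5, so $\beta$ should be close to 0. Swapping in a recorded activation time series for `white_noise` would give the kind of per-series exponent that the abstract compares against the $1/f$ reference.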

Paper Structure (14 sections, 1 equation, 19 figures, 2 tables)


Figures (19)

  • Figure 1: A (many-to-one) recurrent neural network visualised in its temporally unrolled representation. A time series (in this case a movie review with $n$ words) is input into the network sequentially. For each time step $t$, the $t$-th word passes into the embedding layer, which converts the word into a vector using a learned representation of a continuous vector space. The vector $\boldsymbol{x}_t$ then passes into the recurrent layer, which accepts both $\boldsymbol{x}_t$ and the output of itself from the previous time step, $\boldsymbol{h}_{t-1}$. The recurrent layer then passes its output, $\boldsymbol{h}_t$, into itself for the next time step. At the final time step $n$, the recurrent layer passes its output $\boldsymbol{h}_n$ to the output layer which converts it to the output $\boldsymbol{y}$.
  • Figure 2: The LSTM cell (dotted circle) with its internal structure shown. The red lines represent the "internal" activations while the green lines represent the "external" activations. $\sigma_t$ and $\sigma_s$ represent the tanh activation and sigmoid activation respectively.
  • Figure 3: (a) Value of the activation $\boldsymbol{h}_t$ for the entire LSTM layer for a single 901-word review (truncated to 500 words). (b) Log-log plot of the fluctuation $F(t)$ (solid, red) of the activation shown in (a), obtained with DFA, with a reference line for $1/f$ noise (dashed, black). (c) Histogram of the exponents $\beta$ obtained from the DFA exponents $\alpha$ across all the test reviews with length $\ge 500$. Exponents for the input embedding activations (grey) and $\boldsymbol{h}_t$ (blue) are both plotted. Dotted lines corresponding to $\beta=0$ and $\beta=1$ are plotted for reference.
  • Figure 4: Scatter plot of the exponents of $\boldsymbol{h}_t$ vs the exponents of the input $\boldsymbol{x}_t$ for test reviews with length $\ge 500$. The model used is the same as in Figure 3(a). The input exponents here are not correlated with the activation exponents, showing that the $1/f$ phenomenon in the activation values does not arise from a similar pattern in the inputs.
  • Figure 5: Aggregate values of $\beta$ for the activation $\boldsymbol{h}_t$ vs the number of cells in the LSTM layer. $1/f$ noise (dashed, pink) and white noise (dashed, black) are shown for reference.
  • ...and 14 more figures
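
The recurrence described in the captions for Figures 1 and 2 can be sketched as follows: each input vector $\boldsymbol{x}_t$ is combined with the previous output $\boldsymbol{h}_{t-1}$, and the output $\boldsymbol{h}_t$ is recorded at every time step to form an activation time series. This is a minimal sketch using the standard LSTM cell equations in standard-library Python, not the authors' model. The excerpt does not say which signals count as "internal" or "external", so the gate dictionary returned below is only illustrative. The random weights, the layer sizes, and the helper names (`affine`, `lstm_step`, `random_params`, `run_sequence`) are all hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, vector):
    """Return weights @ vector + bias for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(params, x_t, h_prev, c_prev):
    """One step of a standard LSTM cell; returns (h_t, c_t, gates)."""
    z = x_t + h_prev  # list concatenation of the input and previous output
    i = [sigmoid(v) for v in affine(*params["input"], z)]
    f = [sigmoid(v) for v in affine(*params["forget"], z)]
    o = [sigmoid(v) for v in affine(*params["output"], z)]
    g = [math.tanh(v) for v in affine(*params["candidate"], z)]
    # New cell state: keep part of the old state, add gated candidate values.
    c_t = [f_k * c_k + i_k * g_k
           for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Output passed to the next time step (and, at the end, to the output layer).
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t, {"i": i, "f": f, "o": o, "g": g}


def random_params(n_in, n_hidden, rng):
    """Hypothetical random weights standing in for a trained model."""
    def block():
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden)]
                   for _ in range(n_hidden)]
        return weights, [0.0] * n_hidden
    return {name: block()
            for name in ("input", "forget", "output", "candidate")}


def run_sequence(params, inputs, n_hidden):
    """Feed a sequence step by step and record h_t at every time step."""
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    history = []
    for x_t in inputs:
        h, c, _ = lstm_step(params, x_t, h, c)
        history.append(h)
    return history


# Random vectors stand in for embedded words (an assumption).
rng = random.Random(0)
n_in, n_hidden, n_steps = 4, 3, 10
params = random_params(n_in, n_hidden, rng)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_steps)]
history = run_sequence(params, inputs, n_hidden)
print(len(history), len(history[0]))
```

Here `history[t][k]` is the output of cell `k` at step `t`, so each column of `history` is one neuron's activation time series. The gates returned by `lstm_step` could be recorded in the same way if the signals inside the cell are of interest.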