Activation Bottleneck: Sigmoidal Neural Networks Cannot Forecast a Straight Line
Maximilian Toller, Hussain Hussain, Bernhard C Geiger
TL;DR
The paper investigates activation bottlenecks, defined as hidden layers with bounded image, and shows that sigmoidal networks, including typical LSTM/GRU configurations, cannot forecast unbounded sequences such as straight lines or trends. It formalizes the concept with definitions and the maximum approximation error $\varepsilon^\star_{f,g}$, and proves that a bottleneck followed by Lipschitz layers yields a bounded $g$; for surjective targets with unbounded domains this leads to $\varepsilon^\star_{f,g}=\infty$. Empirically, networks with bottlenecks fail to fit simple unbounded sequences while architectures without bottlenecks succeed. The paper also proposes mitigation strategies (skip-connections, switching to ReLU/linear activations, or adding non-Lipschitz layers) and provides practical guidance on handling unbounded data in real-world tasks.
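The boundedness argument summarized above can be sketched in a few lines. This is an illustrative derivation, not the paper's own proof: the decomposition $g=\varphi\circ h$, the constants $M$, $L$, $C$, the reference point $x_0$, and the reading of $\varepsilon^\star_{f,g}$ as a supremum of the pointwise error are all assumptions made here for the sake of the sketch.

$$
\begin{aligned}
&\text{Assume } g=\varphi\circ h,\quad \|h(x)\|\le M \text{ for all } x,\quad \varphi \text{ is } L\text{-Lipschitz}.\\
&\|g(x)-g(x_0)\| \le L\,\|h(x)-h(x_0)\| \le 2LM
\;\Longrightarrow\; \|g(x)\| \le \|g(x_0)\| + 2LM =: C.\\
&\text{Assuming } \varepsilon^\star_{f,g}=\sup_x \|f(x)-g(x)\|:\quad
\varepsilon^\star_{f,g} \ge \sup_x \bigl(\|f(x)\|-C\bigr) = \infty \text{ whenever } f \text{ is unbounded}.
\end{aligned}
$$

The constant $C$ depends on the weights, so training can move the bound but cannot remove it.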
Abstract
A neural network has an activation bottleneck if one of its hidden layers has a bounded image. We show that networks with an activation bottleneck cannot forecast unbounded sequences such as straight lines, random walks, or any sequence with a trend: the difference between prediction and ground truth becomes arbitrarily large, regardless of the training procedure. Widely used neural network architectures such as LSTM and GRU suffer from this limitation. In our analysis, we characterize activation bottlenecks and explain why they prevent sigmoidal networks from learning unbounded sequences. We experimentally validate our findings and discuss modifications to network architectures which mitigate the effects of activation bottlenecks.
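The claim can be checked numerically on a toy case. The following is a minimal sketch, not the paper's experimental setup: it assumes a hypothetical one-hidden-layer network with a tanh layer (the bottleneck) followed by a linear output layer, with random rather than trained weights, and takes the straight line $f(x)=x$ as the target. The names `W1`, `b1`, `w2`, `b2`, `g`, `f`, and `output_bound` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network: tanh hidden layer (bounded image, i.e. a bottleneck)
# followed by a linear, hence Lipschitz, output layer.
W1 = rng.normal(size=(8, 1))   # input-to-hidden weights
b1 = rng.normal(size=8)        # hidden biases
w2 = rng.normal(size=8)        # hidden-to-output weights
b2 = rng.normal()              # output bias


def g(x):
    """Evaluate the toy network on an array of scalar inputs."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    hidden = np.tanh(x @ W1.T + b1)  # every entry lies in [-1, 1]
    return hidden @ w2 + b2


def f(x):
    """Target: the straight line f(x) = x, which is unbounded."""
    return np.asarray(x, dtype=float)


# Since |tanh| <= 1, the output satisfies |g(x)| <= sum_i |w2_i| + |b2|.
output_bound = np.abs(w2).sum() + abs(b2)

xs = np.array([1e0, 1e2, 1e4, 1e6])
errors = np.abs(f(xs) - g(xs))

for x, e in zip(xs, errors):
    print(f"x = {x:>9.0f}   |f(x) - g(x)| = {e:.3f}")
print(f"bound on |g(x)|: {output_bound:.3f}")
```

Whatever values the weights take, `output_bound` is finite, so the error grows without limit as `x` increases. Replacing `np.tanh` with an unbounded activation such as ReLU removes this particular bound, which is in line with the mitigation strategies the paper discusses.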
