Dataset-learning duality and emergent criticality
Ekaterina Kukleva, Vitaly Vanchurin
TL;DR
This work introduces a dataset-learning duality, a bulk-boundary mapping between non-trainable boundary states and the tangent space of trainable variables in neural networks, developed through activation and learning passes. By analyzing a local learning equilibrium and employing a probabilistic framework, it provides a Jacobian-based description of how boundary data induce fluctuations in trainable parameters, enabling a microscopic view of emergent criticality. The authors show that specific compositions of activation and loss functions can generate power-law fluctuations in the trainable variables, even when the dataset is non-critical, with analytical forms and supporting numerical experiments on a two-neuron toy model and a two-class dataset. This mechanism for scale-invariant learning dynamics suggests tunable routes to control criticality via activation or loss design and offers potential insights into critical phenomena in physical and biological systems.
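The claim that power-law fluctuations can arise from a non-critical dataset can be illustrated with a generic change-of-variables sketch. This is not the authors' toy model: the Gaussian "boundary" samples and the map `1 / |x|` are hypothetical choices made only for illustration. The point is that a nonlinear map can turn a distribution with no power-law tail into one whose tail is a power law.

```python
import math
import random

def survival_fraction(samples, threshold):
    """Fraction of samples strictly above the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)

rng = random.Random(0)  # fixed seed so the run is reproducible
n_samples = 200_000

# Non-critical "dataset": standard Gaussian samples (no power-law tail).
boundary = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Hypothetical nonlinear map from boundary values to fluctuations.
# It is an assumption for illustration, not a map taken from the paper.
fluctuations = [1.0 / abs(x) for x in boundary if x != 0.0]

# Estimate the tail exponent of the survival function P(f > t) ~ t^(-alpha)
# from two thresholds one decade apart.
s10 = survival_fraction(fluctuations, 10.0)
s100 = survival_fraction(fluctuations, 100.0)
tail_exponent = math.log(s10 / s100) / math.log(100.0 / 10.0)

print("estimated survival-function exponent:", tail_exponent)
```

For this particular map the Gaussian density is finite and nonzero at the origin, so the survival function falls off roughly as 1/t and the estimate should come out near 1. A different map would give a different exponent, in line with the statement that the power law depends on the functions involved.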
Abstract
In artificial neural networks, the activation dynamics of non-trainable variables is strongly coupled to the learning dynamics of trainable variables. During the activation pass, the boundary neurons (e.g., input neurons) are mapped to the bulk neurons (e.g., hidden neurons), and during the learning pass, both bulk and boundary neurons are mapped to changes in trainable variables (e.g., weights and biases). For example, in feed-forward neural networks, forward propagation is the activation pass and backward propagation is the learning pass. We show that the composition of the two maps establishes a duality map between a subspace of non-trainable boundary variables (e.g., the dataset) and a tangent subspace of trainable variables (i.e., learning). In general, the dataset-learning duality is a complex non-linear map between high-dimensional spaces. We use the duality to study the emergence of criticality, that is, a power-law distribution of fluctuations of the trainable variables, using a toy model at learning equilibrium. In particular, we show that criticality can emerge in the learning system even when the dataset is in a non-critical state, and that the power-law distribution can be modified by changing either the activation function or the loss function.
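The composition of the two passes can be sketched for a single neuron. This is a minimal illustration under stated assumptions, not the authors' model: the `tanh` activation, the squared loss, the learning rate `lr`, and all function names are hypothetical choices. The activation pass maps a boundary input to a bulk state, the learning pass maps both to a parameter update, and their composition sends a dataset point to a vector in the tangent space of the trainable variables.

```python
import math

def activation_pass(x, w, b):
    """Boundary input x -> bulk (hidden) state h. Assumes a tanh activation."""
    return math.tanh(w * x + b)

def learning_pass(x, h, y, lr=0.1):
    """Boundary x and bulk h -> update (dw, db) of the trainable variables.

    Assumes one gradient-descent step on the squared loss 0.5 * (h - y)**2.
    """
    delta = (h - y) * (1.0 - h * h)  # dLoss / d(pre-activation)
    return (-lr * delta * x, -lr * delta)

def duality_map(x, y, w, b, lr=0.1):
    """Composition of the two passes: dataset point (x, y) -> (dw, db)."""
    h = activation_pass(x, w, b)
    return learning_pass(x, h, y, lr)

def jacobian_wrt_input(x, y, w, b, eps=1e-6):
    """Central finite-difference estimate of d(dw, db) / dx."""
    up = duality_map(x + eps, y, w, b)
    down = duality_map(x - eps, y, w, b)
    return tuple((u - d) / (2.0 * eps) for u, d in zip(up, down))

dw, db = duality_map(x=0.5, y=1.0, w=0.3, b=0.0)
jac_dw, jac_db = jacobian_wrt_input(x=0.5, y=1.0, w=0.3, b=0.0)
print("update:", (dw, db))
print("Jacobian of the update with respect to x:", (jac_dw, jac_db))
```

The Jacobian describes how a small change in the boundary input changes the update of the trainable variables, which is the local, linearized form of the map. When the output already matches the target the update vanishes, a simple one-point analogue of the learning equilibrium mentioned above.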
