Dynamical loss functions shape landscape topography and improve learning in artificial neural networks
Eduardo Lavin Pallero, Miguel Ruiz-Garcia
TL;DR
The paper examines how oscillatory, class-wise weights in the loss function reshape the loss landscape and improve learning in neural networks. It introduces a dynamical cross-entropy $\mathcal{F}_{\mathrm{DCE}}$ with per-class oscillations $\Gamma_i(t)$ and two dynamical MSE variants, $\mathcal{F}_{\mathrm{DMSE1}}$ and $\mathcal{F}_{\mathrm{DMSE2}}$, each governed by an amplitude $A$ and a period $T$; the oscillations preserve the global minima while altering optimization trajectories. Empirically, these dynamical losses yield higher validation accuracy than their static counterparts, especially for small networks, and they interact with curvature to produce period-doubling instabilities that enhance exploration of the loss landscape. The work links dynamical loss design to edge-of-stability minimization and suggests avenues for curriculum-free learning strategies and future integration with SGD-based training. It also has practical implications for reducing overparameterization and computational cost.
Abstract
Dynamical loss functions are derived from standard loss functions used in supervised classification tasks, but are modified so that the contribution from each class periodically increases and decreases. These oscillations globally alter the loss landscape without affecting the global minima. In this paper, we demonstrate how to transform cross-entropy and mean squared error into dynamical loss functions. We begin by discussing how increasing the size of the neural network or the learning rate affects the depth and sharpness of the minima that the system explores. Building on this intuition, we propose several versions of dynamical loss functions and show, on a simple classification problem, that they significantly improve validation accuracy for networks of varying sizes. Finally, we explore how the landscape of these dynamical loss functions evolves during training, highlighting the emergence of instabilities that may be linked to edge-of-instability minimization.
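To make the idea of a class-weighted, time-dependent cross-entropy concrete, the following is a minimal NumPy sketch. It is not the authors' exact definition: the summary does not give the form of $\Gamma_i(t)$, so the triangular wave, the per-class phase offsets, the normalization to a fixed sum, and the function names `class_weights` and `dynamical_cross_entropy` are all assumptions made for illustration. The sketch only preserves the two properties stated above: the weights oscillate with amplitude $A$ and period $T$, and they stay strictly positive, so a configuration that classifies every sample perfectly remains a minimizer.

```python
import numpy as np


def class_weights(t, num_classes, amplitude, period):
    """Hypothetical Gamma_i(t): positive per-class weights oscillating in time.

    Assumed form (not taken from the paper): a triangular wave between 1 and
    `amplitude`, phase-shifted per class, then rescaled so the weights sum to
    `num_classes`.
    """
    phases = np.arange(num_classes) / num_classes   # stagger the class peaks
    x = (t / period + phases) % 1.0                 # position within one period
    tri = 1.0 - np.abs(2.0 * x - 1.0)               # triangular wave in [0, 1]
    gamma = 1.0 + (amplitude - 1.0) * tri           # values in [1, amplitude]
    return gamma * num_classes / gamma.sum()


def dynamical_cross_entropy(logits, labels, t, amplitude, period):
    """Cross-entropy in which each sample is weighted by Gamma of its class."""
    num_classes = logits.shape[1]
    gamma = class_weights(t, num_classes, amplitude, period)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float(np.mean(gamma[labels] * nll))


if __name__ == "__main__":
    logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
    labels = np.array([0, 1])
    for step in (0, 5, 10):
        print(step, dynamical_cross_entropy(logits, labels, step, 5.0, 20))
```

With `amplitude=1.0` every weight equals 1 and the function reduces to the ordinary (static) cross-entropy, which is a convenient sanity check when experimenting with other choices of $A$ and $T$.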
