Meta-Learning Loss Functions for Deep Neural Networks
Christian Raymond
TL;DR
This work investigates meta-learning of loss functions for deep neural networks. It proposes EvoMAL to learn symbolic, model-agnostic losses that outperform handcrafted losses while remaining interpretable. Analysis of the learned losses reveals connections to label smoothing and leads to SparseLSR, a fast, memory-efficient variant of label smoothing regularization. The thesis further introduces AdaLFL for online adaptive loss learning and Neural Procedural Bias Meta-Learning (NPBML), which jointly meta-learns the initialization, optimizer, and loss function with task-adaptive FiLM conditioning. Together, these contributions show that meta-learned losses and procedural biases can markedly improve convergence, sample efficiency, and generalization across diverse supervised and few-shot tasks, highlighting practical routes to rethink loss design in AI systems.
Abstract
Humans can often quickly and efficiently solve complex new learning tasks given only a small set of examples. In contrast, modern artificially intelligent systems often require thousands or millions of observations in order to solve even the most basic tasks. Meta-learning aims to resolve this issue by leveraging past experiences from similar learning tasks to embed the appropriate inductive biases into the learning system. Historically, methods for meta-learning components such as optimizers and parameter initializations have led to significant performance increases. This thesis explores meta-learning as a means of improving performance through an often-overlooked component: the loss function. The loss function is a vital component of a learning system because it represents the primary learning objective; success is determined and quantified by the system's ability to optimize for that objective.
