Meta-Learning Loss Functions for Deep Neural Networks

Christian Raymond

TL;DR

This work investigates meta-learning for loss functions in deep neural networks, proposing EvoMAL to learn symbolic, model-agnostic losses that outperform handcrafted losses while remaining interpretable. It analyzes the learned losses to reveal connections to label smoothing and introduces SparseLSR, a fast, memory-efficient variant. The thesis further introduces AdaLFL for online adaptive loss learning and Neural Procedural Bias Meta-Learning (NPBML) to jointly meta-learn initialization, optimizers, and losses with task-adaptive FiLM conditioning. Together, these contributions show that meta-learned losses and procedural biases can markedly improve convergence, sample efficiency, and generalization across diverse supervised and few-shot tasks, highlighting practical routes to rethink loss design in AI systems.

Abstract

Humans can often quickly and efficiently solve complex new learning tasks given only a small set of examples. In contrast, modern artificially intelligent systems often require thousands or millions of observations to solve even the most basic tasks. Meta-learning aims to resolve this issue by leveraging past experiences from similar learning tasks to embed the appropriate inductive biases into the learning system. Historically, methods for meta-learning components such as optimizers and parameter initializations have led to significant performance increases. This thesis explores meta-learning as a way to improve performance through an often-overlooked component: the loss function. The loss function is a vital component of a learning system because it represents the primary learning objective; success is determined and quantified by the system's ability to optimize for that objective.

Paper Structure (190 sections, 102 equations, 49 figures, 16 tables, 10 algorithms)

Figures (49)

  • Figure 1: Conventional supervised learning problem setup.
  • Figure 2: Each learning algorithm $\mathcal{A}$ covers a region of efficiency $\mathcal{R}_{\mathcal{A}}$ based on its inductive biases $\omega$. In this example, $\mathcal{A}_{1}$ can efficiently learn task $\mathcal{T}_1$, $\mathcal{A}_{2}$ can efficiently learn tasks $\mathcal{T}_2$ and $\mathcal{T}_6$, and $\mathcal{A}_3$ can efficiently learn task $\mathcal{T}_3$.
  • Figure 3: Visualizing the error rate and cross-entropy loss function (left), and comparing their loss landscapes when using a simple logistic regression model trained on a synthetic classification task (right).
  • Figure 4: The conventional approach to designing and selecting a learning algorithm for a set of learning tasks in machine learning. Researchers manually design the learning algorithm, which is then deployed to solve new learning tasks.
  • Figure 5: In meta-learning, the learning algorithm is automatically designed (the meta-training phase) by learning to learn over a task distribution $p(\mathcal{T})$, i.e., a set of related learning tasks, before being deployed (the meta-testing phase) to solve unseen tasks sampled from the same task distribution.
  • ...and 44 more figures