Fine-Tuning Language Models via Epistemic Neural Networks

Ian Osband, Seyed Mohammad Asghari, Benjamin Van Roy, Nat McAleese, John Aslanides, Geoffrey Irving

TL;DR

The paper addresses data inefficiency in fine-tuning language models by introducing epistemic neural networks (ENNs) augmented with an epinet to estimate and leverage model uncertainty. By prioritizing informative labeled examples through epistemic uncertainty, the approach achieves comparable or better performance with substantially fewer labels on synthetic benchmarks and GLUE tasks, notably MNLI, when fine-tuning BERT. The key contributions include a formal ENN framework, novel priority functions for active learning, and comprehensive experiments showing data-efficiency gains and favorable comparison with ensembles and dropout at lower computational cost. This work demonstrates a scalable path to uncertainty-aware data selection in large pretrained models, with open-source code to facilitate replication and extension to larger foundation models.

Abstract

Language models often pre-train on large unsupervised text corpora, then fine-tune on additional task-specific data. However, typical fine-tuning schemes do not prioritize the examples that they tune on. We show that, if you can prioritize informative training data, you can achieve better performance while using fewer labels. To do this we augment a language model with an epinet: a small additional network that helps to estimate model uncertainty and forms an epistemic neural network (ENN). ENNs are neural networks that can know what they don't know. Using an epinet to prioritize uncertain data, we can fine-tune BERT on GLUE tasks to the same performance while using 2x less data than training without prioritization. We also investigate performance in synthetic neural network generative models designed to build understanding. In each setting, using an epinet outperforms heuristic active learning schemes.
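
To make the idea of prioritizing uncertain data concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that, for each unlabeled candidate, class probabilities are available for several sampled epistemic indices (here hard-coded as hypothetical numbers), and it scores candidates with two epistemic priorities of the kind named in the paper's figures, BALD and variance. The function names and the exact form of the variance score are assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mean_probs(samples):
    """Average the class probabilities over the sampled epistemic indices."""
    n = len(samples)
    return [sum(s[c] for s in samples) / n for c in range(len(samples[0]))]


def bald_priority(samples):
    """BALD score: entropy of the mean prediction minus the mean entropy."""
    mean_entropy = sum(entropy(s) for s in samples) / len(samples)
    return entropy(mean_probs(samples)) - mean_entropy


def variance_priority(samples):
    """Variance of class probabilities across index samples, summed over classes.

    Assumption: this is one simple form of a variance score; the paper's
    exact definition may differ.
    """
    m = mean_probs(samples)
    n = len(samples)
    return sum(
        sum((s[c] - m[c]) ** 2 for s in samples) / n for c in range(len(m))
    )


def select_top_k(candidates, priority_fn, k):
    """Return the indices of the k candidates with the highest priority."""
    scores = [priority_fn(samples) for samples in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical predictions: candidates[i][j] holds the class probabilities
# for candidate i under the j-th sampled epistemic index.
candidates = [
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],        # samples agree on a coin flip
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],        # samples disagree
    [[0.99, 0.01], [0.98, 0.02], [0.99, 0.01]],  # confident and consistent
]

chosen_by_bald = select_top_k(candidates, bald_priority, k=1)
chosen_by_variance = select_top_k(candidates, variance_priority, k=1)
```

In this toy data the first candidate has a high-entropy prediction but no disagreement across index samples, so both epistemic scores treat it as uninformative. A purely marginal heuristic such as predictive entropy would rank it as highly as the second candidate, even though only the second shows disagreement between samples.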

Paper Structure

This paper contains 29 sections, 6 equations, 13 figures, 2 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Fine-tuning a BERT model on the MNLI dataset (Williams et al., 2017). The baseline BERT is tuned over learning rate, batch size, and number of SGD steps for each random subset of training labels. The baseline does not prioritize training data. Using an epinet to prioritize, we are able to match baseline performance on the entire training set (the dashed line) with 2x less data (Section \ref{sec:language}).
  • Figure 2: Learning with an epinet is able to match the baseline with fewer training labels.
  • Figure 3: Active learning with an epinet is able to provide larger improvements in learning speed than competing approaches on the Neural Testbed (Osband et al., 2022).
  • Figure 4: Active learning with an epinet is able to provide larger improvements with fewer labels than competing approaches for BERT on MNLI (Williams et al., 2017).
  • Figure 5: Fine-tuning BERT models across GLUE tasks. The baseline BERT is tuned over learning rate, batch size, and SGD steps for each random subset of training data. Using an epinet with epistemic prioritization consistently learns with fewer labels and reaches better final performance. Using an epinet with uniform data selection does not produce the same benefit. The choice of epistemic priority (BALD vs. variance) is relatively unimportant.
  • ...and 8 more figures