Fine-Tuning Language Models via Epistemic Neural Networks
Ian Osband, Seyed Mohammad Asghari, Benjamin Van Roy, Nat McAleese, John Aslanides, Geoffrey Irving
TL;DR
The paper addresses data inefficiency in fine-tuning language models by augmenting the model with an epinet, a small additional network that estimates model uncertainty; the combined model is an epistemic neural network (ENN). By prioritizing informative examples for labeling according to epistemic uncertainty, the approach achieves comparable or better performance with substantially fewer labels on synthetic benchmarks and GLUE tasks, notably MNLI, when fine-tuning BERT. The key contributions include a formal ENN framework, novel priority functions for active learning, and comprehensive experiments showing data-efficiency gains and favorable comparison with ensembles and dropout at lower computational cost. This work demonstrates a scalable path to uncertainty-aware data selection in large pretrained models, with open-source code to facilitate replication and extension to larger foundation models.
Abstract
Language models often pre-train on large unsupervised text corpora, then fine-tune on additional task-specific data. However, typical fine-tuning schemes do not prioritize the examples that they tune on. We show that, if you can prioritize informative training data, you can achieve better performance while using fewer labels. To do this, we augment a language model with an epinet: a small additional network that helps to estimate model uncertainty and forms an epistemic neural network (ENN). ENNs are neural networks that can know what they don't know. Using an epinet to prioritize uncertain data, we can fine-tune BERT on GLUE tasks to the same performance while using half as much data as training without prioritization. We also investigate performance in synthetic neural network generative models designed to build understanding. In each setting, using an epinet outperforms heuristic active learning schemes.
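The idea of prioritizing uncertain data can be illustrated with a minimal sketch. It assumes that an ENN produces class probabilities as a function of both the input and a randomly sampled epistemic index z, and that disagreement across sampled indices signals epistemic uncertainty. The names `variance_priority`, `select_batch`, and `toy_predict` are hypothetical, and the variance-based score is one plausible priority function, not necessarily the exact one used in the paper. The toy predictor merely stands in for a fine-tuned language model with an epinet.

```python
import math
import random
from statistics import pvariance


def variance_priority(probs_per_index):
    """Sum over classes of the variance of class probabilities across indices.

    probs_per_index[k][c] is the probability of class c under the k-th
    sampled epistemic index. A score of zero means every index agrees.
    """
    num_classes = len(probs_per_index[0])
    return sum(
        pvariance([probs[c] for probs in probs_per_index])
        for c in range(num_classes)
    )


def select_batch(candidates, predict, num_indices=10, batch_size=2, seed=0):
    """Return positions of the batch_size candidates with the highest priority."""
    rng = random.Random(seed)
    # Sample the epistemic indices once and reuse them for every candidate.
    indices = [rng.gauss(0.0, 1.0) for _ in range(num_indices)]
    scored = []
    for position, x in enumerate(candidates):
        probs = [predict(x, z) for z in indices]
        scored.append((variance_priority(probs), position))
    scored.sort(reverse=True)
    return [position for _, position in scored[:batch_size]]


def toy_predict(x, z):
    """Hypothetical stand-in for an ENN: x = (base_logit, index_sensitivity)."""
    logit = x[0] + x[1] * z
    p = 1.0 / (1.0 + math.exp(-logit))
    return [p, 1.0 - p]


# Four unlabeled candidates; the second and fourth react strongly to the index.
candidates = [(2.0, 0.0), (0.0, 1.5), (0.5, 0.1), (-1.0, 1.0)]
selected = select_batch(candidates, toy_predict)
```

In this toy setup, the candidates whose predictions change most with the index are the ones sent for labeling, while a candidate whose prediction ignores the index receives zero priority. In an active-learning loop, the selected examples would be labeled, added to the training set, and the model updated before the next round of selection.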
