Conditional Language Learning with Context

Xiao Zhang; Miao Li; Ji Wu

TL;DR

Domain finetuning often leads to over-adaptation and forgetting because the model learns corpus statistics such as topic priors. The authors introduce conditional finetuning, which prepends a context $a$ to each document and optimizes $p(x|a)$, so that the model learns a conditional prior $p(\text{topic}|a)$ and achieves selective learning. They show that this approach preserves knowledge learning and reduces forgetting in transfer and continual learning, because it modifies the model less and avoids excessive adaptation to topic priors. The method offers a path toward more robust, lifelong language models and potential bias mitigation, with code and data released for reproducibility.
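
As a sketch of the difference between the two objectives, write the document tokens as $x = (x_1, \dots, x_T)$ and the prepended context as $a$. The symbols $\theta$ (model parameters) and $T$ (document length) are notation introduced here for illustration, and the usual autoregressive factorization of a causal language model is assumed:

$$\mathcal{L}_{\text{standard}}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t}), \qquad \mathcal{L}_{\text{conditional}}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid a, x_{<t})$$

Under this reading, the only change is that every document token is predicted with the context $a$ in its history, so statistics that the context already accounts for contribute less to the loss.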

Abstract

Language models can learn sophisticated language understanding skills from fitting raw text. They also unselectively learn useless corpus statistics and biases, especially during finetuning on domain-specific corpora. In this paper, we propose a simple modification to causal language modeling called conditional finetuning, which performs language modeling conditioned on a context. We show that a context can "explain away" certain corpus statistics and make the model avoid learning them. In this fashion, conditional finetuning achieves selective learning from a corpus, learning knowledge useful for downstream tasks while avoiding learning useless corpus statistics like topic biases. This selective learning effect leads to less forgetting and a better stability-plasticity tradeoff in domain finetuning, potentially benefiting lifelong learning with language models.
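
The data preparation implied by "language modeling conditioned on a context" might look like the following minimal sketch. It assumes that the context tokens are excluded from the loss, so that only $p(x|a)$ is optimized; this masking choice, the helper names `build_conditional_example` and `toy_tokenize`, the whitespace tokenizer, and the example context string are all illustrative assumptions, not the authors' released implementation. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_conditional_example(context_ids, document_ids, ignore_index=IGNORE_INDEX):
    """Prepend context tokens to a document and mask them out of the loss.

    Assumption: the loss is computed on document tokens only, so the
    context conditions the predictions but is not itself learned.
    """
    input_ids = list(context_ids) + list(document_ids)
    labels = [ignore_index] * len(context_ids) + list(document_ids)
    return input_ids, labels


def toy_tokenize(text, vocab):
    """Hypothetical whitespace tokenizer that assigns ids on first sight."""
    return [vocab.setdefault(token, len(vocab)) for token in text.split()]


vocab = {}
# Hypothetical domain-hint context and a one-sentence stand-in document.
context_ids = toy_tokenize("The following is a medical text .", vocab)
document_ids = toy_tokenize("The femur is the longest bone .", vocab)

input_ids, labels = build_conditional_example(context_ids, document_ids)

# Standard finetuning, for comparison: no context, loss on every token.
standard_input_ids, standard_labels = list(document_ids), list(document_ids)
```

In a real pipeline the ids would come from the model's own tokenizer, and `input_ids` and `labels` would be passed to a causal language model whose loss skips positions labeled `-100`.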

Paper Structure (19 sections, 8 equations, 7 figures, 5 tables)

Introduction
Related Work
Conditional Learning
Conditional Language Modeling with Context
Conditional Finetuning Reduces Learning of the Topic Prior $p(\text{topic})$
Conditional Finetuning Learns a Conditional Topic Prior $p(\text{topic}|a)$ regardless of Context $a$
Conditional Finetuning does not Affect Knowledge Learning
Less Forgetting through Selective Learning
Conditional Finetuning Modifies Model Less
Conditional Finetuning Reduces Forgetting and Maintains Knowledge Learning in Transfer Learning
Conditional Finetuning Reduces Forgetting and Improves Knowledge Learning in Continual Learning
Discussion
Data
Medical domain
Corpus
...and 4 more sections

Figures (7)

Figure 1: Illustration of conditionally finetuning a language model on a domain corpus. Compared to standard finetuning, conditional finetuning prepends a context to each document and only learns information conditioned on the context.
Figure 2: Topic likelihood changes during finetuning. Unlike standard finetuning, conditional finetuning does not significantly change topic likelihoods.
Figure 3: Comparing language modeling loss at different token positions, for models finetuned with standard finetuning and conditional finetuning with three types of context. Unlike standard finetuning, conditional finetuning barely increases loss on general text (C4), regardless of the type of context used in training. (model: LLaMA-2 7B)
Figure 4: Examples showing the loss change $\log p(x) - \log p(x|a)$ caused by the context. Before finetuning, the domain hint makes the model favor the medical term "bone", while the random UUID string favors the technological term "CPU". After finetuning, both contexts have a similar effect of favoring medical terms.
Figure 5: Performance-forgetting tradeoff curves of standard finetuning and conditional finetuning on Anatomy and SQuAD (closed-book). Conditional finetuning shows less forgetting at similar levels of downstream performance, achieving a significantly better tradeoff than standard finetuning.
...and 2 more figures
