Conditional Language Learning with Context
Xiao Zhang, Miao Li, Ji Wu
TL;DR
Domain finetuning often leads to over-adaptation and forgetting due to learning corpus statistics. The authors introduce conditional finetuning, which prepends a context and optimizes $p(x|c)$ while learning a conditional prior $p(\text{topic}|a)$, to achieve selective learning. They show that this approach preserves knowledge learning and reduces forgetting in transfer and continual learning, by modifying the model less and avoiding excessive adaptation to topic priors. The method offers a path toward more robust, lifelong language models and potential bias mitigation, with code and data released for reproducibility.
Abstract
Language models can learn sophisticated language understanding skills from fitting raw text. They also unselectively learn useless corpus statistics and biases, especially during finetuning on domain-specific corpora. In this paper, we propose a simple modification to causal language modeling called conditional finetuning, which performs language modeling conditioned on a context. We show that a context can "explain away" certain corpus statistics and make the model avoid learning them. In this fashion, conditional finetuning achieves selective learning from a corpus, learning knowledge useful for downstream tasks while avoiding learning useless corpus statistics like topic biases. This selective learning effect leads to less forgetting and better stability-plasticity tradeoff in domain finetuning, potentially benefitting lifelong learning with language models.
