Time is Encoded in the Weights of Finetuned Language Models
Kai Nylund, Suchin Gururangan, Noah A. Smith
TL;DR
The paper addresses temporal misalignment in language models by introducing time vectors, computed as $\tau_t = \theta_t - \theta_{\text{pre}}$, where $\theta_{\text{pre}}$ denotes the pretrained weights and $\theta_t$ the weights after finetuning on text from a single time period $t$; each vector captures how that finetuning shifts the weights. These vectors enable weight-space interpolation to handle intervening and future time periods, and they reveal that time is organized as a manifold in weight space, with closer times yielding more similar vectors. The authors show that performance degrades roughly linearly as the gap between training and test years grows, that degradation at the monthly scale follows seasonal patterns, and that time-vector similarity strongly tracks temporal degradation across tasks and model sizes. They further show that interpolating between time vectors improves performance on unseen times and that task analogies can update models to future times using only unlabeled data, though multi-year model soups do not outperform a model trained on all years of data. Overall, the work provides practical, scalable tools for temporally aware language modeling and offers a new perspective on how time is represented in the weight space of neural networks.
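To make the definition $\tau_t = \theta_t - \theta_{\text{pre}}$ and the "closer times yield more similar vectors" idea concrete, here is a minimal sketch in plain Python. It assumes each model is a dict mapping parameter names to flat lists of floats (real checkpoints would hold tensors), and it uses cosine similarity over all flattened parameters as the similarity measure, which is an assumption for illustration. The checkpoints are hypothetical toy numbers chosen so that adjacent years point in similar directions; they show only the arithmetic and are not evidence for the paper's claim.

```python
import math
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def time_vector(theta_t: Weights, theta_pre: Weights) -> Weights:
    """tau_t = theta_t - theta_pre, computed parameter by parameter."""
    return {
        name: [a - b for a, b in zip(theta_t[name], theta_pre[name])]
        for name in theta_pre
    }


def cosine_similarity(u: Weights, v: Weights) -> float:
    """Cosine similarity between two weight-space vectors, flattened over all parameters."""
    dot = sum(a * b for name in u for a, b in zip(u[name], v[name]))
    norm_u = math.sqrt(sum(a * a for name in u for a in u[name]))
    norm_v = math.sqrt(sum(b * b for name in v for b in v[name]))
    return dot / (norm_u * norm_v)


# Toy checkpoints invented for illustration only (not results from the paper).
theta_pre = {"w": [1.0, 2.0, 3.0]}
theta_2015 = {"w": [2.0, 2.0, 3.0]}
theta_2016 = {"w": [2.0, 2.5, 3.0]}
theta_2020 = {"w": [1.0, 2.5, 4.0]}

tau = {
    year: time_vector(theta, theta_pre)
    for year, theta in [(2015, theta_2015), (2016, theta_2016), (2020, theta_2020)]
}

# Adjacent years are more similar than distant ones in this toy setup.
print(round(cosine_similarity(tau[2015], tau[2016]), 3))
print(round(cosine_similarity(tau[2015], tau[2020]), 3))
```

With real models, the same subtraction would be applied to every tensor in the finetuned and pretrained checkpoints, which must share an architecture and parameter names.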
Abstract
We present time vectors, a simple tool to customize language models to new time periods. Time vectors are created by finetuning a language model on data from a single time (e.g., a year or month), and then subtracting the weights of the original pretrained model. This vector specifies a direction in weight space that, as our experiments show, improves performance on text from that time period. Time vectors specialized to adjacent time periods appear to be positioned closer together in a manifold. Using this structure, we interpolate between time vectors to induce new models that perform better on intervening and future time periods, without any additional training. We demonstrate the consistency of our findings across different tasks, domains, model sizes, and time scales. Our results suggest that time is encoded in the weight space of finetuned models.
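The interpolation step described above can be sketched as adding a weighted mix of two time vectors back onto the pretrained weights, assuming the form $\theta = \theta_{\text{pre}} + \alpha\,\tau_i + (1 - \alpha)\,\tau_j$ with $\alpha \in [0, 1]$. This is a minimal illustration, not the authors' code: the dict-of-lists weight format, the helper names, and the toy checkpoints are all hypothetical, and in practice $\alpha$ for a target time would presumably be chosen on held-out data.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def time_vector(theta_t: Weights, theta_pre: Weights) -> Weights:
    """tau_t = theta_t - theta_pre, parameter by parameter."""
    return {n: [a - b for a, b in zip(theta_t[n], theta_pre[n])] for n in theta_pre}


def interpolate(theta_pre: Weights, tau_i: Weights, tau_j: Weights, alpha: float) -> Weights:
    """Return theta_pre + alpha * tau_i + (1 - alpha) * tau_j, with no extra training."""
    return {
        n: [p + alpha * a + (1.0 - alpha) * b
            for p, a, b in zip(theta_pre[n], tau_i[n], tau_j[n])]
        for n in theta_pre
    }


# Hypothetical toy checkpoints: pretrained, finetuned on an early year, finetuned on a late year.
theta_pre = {"w": [1.0, 2.0]}
theta_early = {"w": [2.0, 2.0]}
theta_late = {"w": [1.0, 4.0]}

tau_early = time_vector(theta_early, theta_pre)
tau_late = time_vector(theta_late, theta_pre)

# Sweep alpha to produce candidate models for the periods in between.
for alpha in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(alpha, interpolate(theta_pre, tau_early, tau_late, alpha)["w"])
```

Setting $\alpha = 1$ or $\alpha = 0$ recovers the early or late finetuned model exactly; intermediate values yield the candidate models for intervening periods, each obtained by weight arithmetic alone.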
