TARDIS: Mitigating Temporal Misalignment via Representation Steering
Changho Shin, Xinya Yan, Suenggwan Jo, Sungjun Cho, Shourjo Aditya Chaudhuri, Frederic Sala
TL;DR
Temporal misalignment, the shift between training and test distributions over time, causes language models to underperform. TARDIS is an unsupervised representation-editing approach that computes steering vectors $v^l_{s\rightarrow t}$ from mean representations across time periods and shifts the hidden states at key layers via $\tilde{h}^l = h^l + \alpha v^l_{s\rightarrow t}$. Steering vectors can be interpolated or extrapolated, and the steering vectors for multiple time periods can be combined dynamically when the target time is unknown. The method achieves accuracy gains of up to $19.2\%$ without fine-tuning, across three datasets and various model sizes. This lightweight, inference-time adaptation offers practical temporal robustness for evolving language tasks and opens avenues for dynamic temporal reasoning with representation-level edits.
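The update rule above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the steering vector is the difference between the mean layer-$l$ representation of target-period data and that of source-period data, and that interpolation is a linear blend of two steering vectors. The function names and the synthetic hidden states are hypothetical and serve only as illustration.

```python
import numpy as np


def mean_representation(hidden_states: np.ndarray) -> np.ndarray:
    """Average hidden states over examples: shape (n, d) -> (d,)."""
    return hidden_states.mean(axis=0)


def steering_vector(source_states: np.ndarray, target_states: np.ndarray) -> np.ndarray:
    """Assumed form of v_{s->t}: target mean minus source mean at one layer."""
    return mean_representation(target_states) - mean_representation(source_states)


def steer(h: np.ndarray, v: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Apply the shift h_tilde = h + alpha * v (v broadcasts over examples)."""
    return h + alpha * v


def blend(v_a: np.ndarray, v_b: np.ndarray, weight: float) -> np.ndarray:
    """Assumed linear blend: weight in [0, 1] interpolates between v_a and v_b,
    a weight outside that range extrapolates."""
    return (1.0 - weight) * v_a + weight * v_b


# Synthetic stand-ins for layer-l hidden states from two time periods.
rng = np.random.default_rng(0)
d = 8
source_states = rng.normal(loc=0.0, size=(200, d))
target_states = rng.normal(loc=1.0, size=(200, d))

v = steering_vector(source_states, target_states)
steered = steer(source_states, v, alpha=1.0)
```

Under these assumptions, setting $\alpha = 1$ moves the mean of the source-period representations exactly onto the target-period mean, while smaller values of $\alpha$ apply a partial shift.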
Abstract
Language models often struggle with temporal misalignment: performance degradation caused by shifts in the temporal distribution of data. Continuously updating models to avoid this degradation is expensive. Can models be adapted without updating their weights? We present TARDIS, an unsupervised representation editing method that addresses this challenge. TARDIS extracts steering vectors from unlabeled data and adjusts the model's representations to better align with the target time period's distribution. Our experiments reveal that TARDIS enhances downstream task performance without fine-tuning, can mitigate temporal misalignment even when data from the exact target time period is unavailable, and remains efficient even when the temporal information of the target data points is unknown at inference time.
