TARDIS: Mitigating Temporal Misalignment via Representation Steering

Changho Shin, Xinya Yan, Suenggwan Jo, Sungjun Cho, Shourjo Aditya Chaudhuri, Frederic Sala

TL;DR

Temporal misalignment, a shift between the training and test distributions over time, causes language models to underperform. TARDIS is an unsupervised representation-editing approach: it computes a steering vector $v^l_{s\rightarrow t}$ from mean representations of a source period $s$ and a target period $t$ at layer $l$, then shifts the hidden state $h^l$ at selected key layers via $\tilde{h}^l = h^l + \alpha v^l_{s\rightarrow t}$, where $\alpha$ controls the strength of the shift. Steering vectors can be interpolated or extrapolated when data from the exact target period is unavailable, and vectors for multiple time periods can be combined dynamically when the target time is unknown. Without any fine-tuning, the method achieves accuracy gains of up to $19.2\%$ across three datasets and various model sizes. This lightweight, inference-time adaptation offers practical temporal robustness for evolving language tasks and opens avenues for dynamic temporal reasoning with representation-level edits.
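
The update rule above can be illustrated with a minimal sketch. It assumes the steering vector is the difference between the mean target-period and mean source-period representations at a single layer; the summary only says the vectors come from "mean representations across time", so this direction and the helper names (`mean_representation`, `steering_vector`, `steer`) are illustrative assumptions rather than the authors' implementation. Hidden states are plain Python lists here, not real model activations.

```python
from statistics import fmean


def mean_representation(hidden_states):
    """Average equal-length hidden-state vectors, dimension by dimension."""
    return [fmean(dim) for dim in zip(*hidden_states)]


def steering_vector(source_states, target_states):
    """Assumed form: v_{s->t} = mean(target states) - mean(source states)."""
    mu_s = mean_representation(source_states)
    mu_t = mean_representation(target_states)
    return [t - s for s, t in zip(mu_s, mu_t)]


def steer(h, v, alpha=1.0):
    """Apply the shift h~ = h + alpha * v to one hidden state."""
    return [h_i + alpha * v_i for h_i, v_i in zip(h, v)]


# Toy 2-dimensional hidden states from unlabeled text of two periods
# (made-up numbers, for illustration only).
source_states = [[0.0, 1.0], [2.0, 3.0]]
target_states = [[2.0, 2.0], [4.0, 6.0]]

v = steering_vector(source_states, target_states)
steered = steer([1.0, 1.0], v, alpha=0.5)
```

In a real model the same arithmetic would run on layer activations gathered during a forward pass, and $\alpha$ would be tuned per task; the figure captions indicate that both positive and negative values of $\alpha$ can be useful.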

Abstract

Language models often struggle with temporal misalignment, performance degradation caused by shifts in the temporal distribution of data. Continuously updating models to avoid degradation is expensive. Can models be adapted without updating model weights? We present TARDIS, an unsupervised representation editing method that addresses this challenge. TARDIS extracts steering vectors from unlabeled data and adjusts the model's representations to better align with the target time period's distribution. Our experiments reveal that TARDIS enhances downstream task performance without the need for fine-tuning, can mitigate temporal misalignment even when exact target time period data is unavailable, and remains efficient even when the temporal information of the target data points is unknown at inference time.
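
As a rough illustration of how steering could work when data from the exact target period is unavailable, the sketch below linearly interpolates or extrapolates between steering vectors from two periods that do have data. The linear scheme, the function name `interpolate_steering`, and the use of years as time stamps are assumptions made for this example; the abstract does not specify how the paper combines vectors.

```python
def interpolate_steering(v_a, v_b, t_a, t_b, t_target):
    """Linearly interpolate (or extrapolate) two steering vectors in time.

    v_a and v_b are steering vectors for periods t_a and t_b.
    A t_target between them interpolates; one outside extrapolates.
    """
    if t_a == t_b:
        raise ValueError("t_a and t_b must be different time periods")
    w = (t_target - t_a) / (t_b - t_a)
    return [(1 - w) * a + w * b for a, b in zip(v_a, v_b)]


# Hypothetical steering vectors for 2015 and 2017 (made-up numbers).
v_2015 = [1.0, 0.0]
v_2017 = [3.0, 2.0]

v_2016 = interpolate_steering(v_2015, v_2017, 2015, 2017, 2016)  # interpolation
v_2018 = interpolate_steering(v_2015, v_2017, 2015, 2017, 2018)  # extrapolation
```

The resulting vector would then be added to hidden states in the same way as a vector computed directly from target-period data.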

Paper Structure

This paper contains 54 sections, 4 equations, 32 figures, 2 tables.

Figures (32)

  • Figure 1: TARDIS steers model representations to mitigate degradation caused by temporal misalignment.
  • Figure 2: Performance gains when using $\textsc{TARDIS}$. We observe that $\textsc{TARDIS}$ can improve accuracy by up to 19.2% without any fine-tuning.
  • Figure 3: Results of the semi-synthetic label shift experiment on the NewsCls task (a) and the semi-synthetic vocabulary shift experiment on the PoliAff task (b). $\textsc{TARDIS}$ with $\alpha > 0$ yields larger accuracy improvements as the level of label shift increases, while $\textsc{TARDIS}$ with $\alpha < 0$ is effective at mitigating degradation due to vocabulary shift.
  • Figure 4: Interpolation/extrapolation of steering vectors. $\textsc{TARDIS}$ uses steering vectors taken at the target time period. We see that interpolated/extrapolated steering vectors can mitigate degradation similarly to the exact time steering vectors.
  • Figure A1: Label distribution of datasets
  • ...and 27 more figures