Temporal Alignment of Time Sensitive Facts with Activation Engineering
Sanjay Govindan, Maurice Pagnucco, Yang Song
TL;DR
The paper tackles time sensitivity in factual recall by temporally aligning LLMs through activation engineering (AE) at inference time, avoiding retraining. By injecting alignment vectors $ae$, constructed from activation vectors $h$ at layers $l$ and weighted by coefficients $c$, into the residual stream of the prompt $p_u$, the authors steer outputs toward a chosen year without pre-aligned data. Across LLaMA2-7b, 13b, and 70b, AE improves $F_1$ scores by up to $44\%$ with relative prompting and $16\%$ with explicit prompting. This is comparable to the fine-tuning approach of Zhao et al. (2024) while requiring far less computation and data. Injecting at multiple layers and varying the prompts further boosts performance, with early layers being the most effective. These results highlight AE as a practical, scalable method for real-time, time-appropriate factual recall in large language models and suggest opportunities to combine AE with knowledge-editing or retrieval-augmented approaches for broader applicability.
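To make the injection mechanism concrete, here is a minimal sketch in plain Python on a toy residual stream. It assumes, as in common activation-addition recipes, that an alignment vector is a scaled difference between the layer-$l$ activations of a year-anchored prompt and a neutral prompt, $ae = c\,(h_l^{\text{target}} - h_l^{\text{base}})$; the authors' exact construction may differ. The toy layers, the prompt vectors, and the helper names `run_residual_stream` and `alignment_vector` are all hypothetical, and each prompt is reduced to a single vector rather than per-token activations.

```python
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def vec_add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def vec_sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


def vec_scale(c: float, a: Vector) -> Vector:
    return [c * x for x in a]


def run_residual_stream(
    x0: Vector,
    layers: Sequence[Layer],
    injections: Optional[Dict[int, Vector]] = None,
) -> List[Vector]:
    """Return the residual-stream state after every layer (hypothetical helper).

    Each layer adds its output to the stream (h <- h + f_l(h)). If
    `injections` has an entry for layer l, that vector is added to the
    stream immediately after layer l, so later layers see the steered state.
    """
    injections = injections or {}
    states: List[Vector] = []
    h = list(x0)
    for l, layer in enumerate(layers):
        h = vec_add(h, layer(h))
        if l in injections:
            h = vec_add(h, injections[l])
        states.append(h)
    return states


def alignment_vector(h_target: Vector, h_base: Vector, c: float) -> Vector:
    """Assumed contrastive construction: ae = c * (h_target - h_base)."""
    return vec_scale(c, vec_sub(h_target, h_base))


# Toy 3-layer "model" acting on 2-d residual vectors (purely illustrative).
toy_layers: List[Layer] = [
    lambda h: [0.5 * x for x in h],
    lambda h: [0.0 for _ in h],
    lambda h: [0.25 * h[1], 0.0],
]

# Hypothetical stand-ins for a year-anchored prompt and a neutral prompt.
target_states = run_residual_stream([1.0, 2.0], toy_layers)
base_states = run_residual_stream([1.0, 0.0], toy_layers)

layer, c = 0, 2.0  # inject after the first (earliest) layer with coefficient c
ae = alignment_vector(target_states[layer], base_states[layer], c)

user_prompt = [0.0, 1.0]  # hypothetical stand-in for the user prompt p_u
plain = run_residual_stream(user_prompt, toy_layers)
steered = run_residual_stream(user_prompt, toy_layers, injections={layer: ae})
print(plain[-1], steered[-1])
```

Passing several entries in `injections` adds vectors at multiple layers, mirroring the multi-layer injections described above. In a real transformer the same effect is typically obtained by hooking the outputs of the chosen decoder blocks rather than by re-implementing the forward pass.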
Abstract
Large Language Models (LLMs) are trained on diverse and often conflicting knowledge spanning multiple domains and time periods. Some of this knowledge is valid only within specific temporal contexts, as when answering the question "Who is the President of the United States in 2022?" Ensuring that LLMs generate time-appropriate responses is crucial for maintaining relevance and accuracy. In this work, we explore activation engineering as a method for temporally aligning LLMs to improve factual recall without any training or dataset creation. We use an activation engineering technique to ground three versions of LLaMA 2 to specific points in time and examine the effects of varying injection layers and prompting strategies. Our experiments demonstrate improvements of up to 44% and 16% with relative and explicit prompting, respectively, achieving performance comparable to the fine-tuning method proposed by Zhao et al. (2024) while being significantly more computationally efficient and requiring no pre-aligned datasets.
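The TL;DR reports these improvements as $F_1$ scores. As an illustration only, assuming the widely used SQuAD-style token-overlap $F_1$ (lower-case, strip punctuation and the articles "a", "an", "the", then compare bags of tokens), a generated answer can be scored against a reference answer as follows. The function names are hypothetical, and the paper's exact scoring procedure may differ.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lower-case, drop punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred = normalize(prediction)
    ref = normalize(reference)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("President Joe Biden", "Joe Biden"))
```

A partially correct answer such as "President Joe Biden" against the reference "Joe Biden" scores about 0.8 (precision 2/3, recall 1), whereas an answer from the wrong year scores 0.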
