
Temporal Alignment of Time Sensitive Facts with Activation Engineering

Sanjay Govindan, Maurice Pagnucco, Yang Song

TL;DR

The paper tackles time sensitivity in factual recall by temporally aligning LLMs at inference time through activation engineering (AE), avoiding retraining. The authors build an alignment vector $ae$ from activation vectors $h$ extracted at a layer $l$ and scaled by coefficients $c$, then inject it into the residual stream of the prompt $p_u$ to steer outputs toward a chosen year without pre-aligned data. Across LLaMA2-7b, 13b, and 70b, AE improves $F1$ scores by up to $44\%$ under relative prompting and $16\%$ under explicit prompting. This is comparable to the fine-tuning approach of Zhao et al. (2024) while requiring far less compute and data. Injecting into multiple layers and varying the prompts further boosts performance, with early layers being the most effective. These results position AE as a practical, scalable method for real-time, time-appropriate factual recall in large language models, and suggest combining AE with knowledge-editing or retrieval-augmented approaches for broader applicability.
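The construction and injection of $ae$ can be illustrated with a minimal toy sketch in plain Python. It assumes, following the Figure 2 caption, that $ae$ is a coefficient-weighted sum of activation vectors $h$ taken from a single layer $l$, and it further assumes that $ae$ is added to every token position of the prompt's layer-$l$ residual stream. The function names `build_alignment_vector` and `inject`, the 3-dimensional vectors, and the coefficient values are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def build_alignment_vector(activations: Sequence[Vector], coeffs: Sequence[float]) -> Vector:
    """Return ae = sum_i c_i * h_i, a weighted sum of layer-l activation vectors."""
    if not activations or len(activations) != len(coeffs):
        raise ValueError("need a non-empty list with one coefficient per activation vector")
    dim = len(activations[0])
    ae = [0.0] * dim
    for h, c in zip(activations, coeffs):
        if len(h) != dim:
            raise ValueError("all activation vectors must share the hidden size")
        for j, value in enumerate(h):
            ae[j] += c * value
    return ae


def inject(residual: Sequence[Vector], ae: Vector) -> List[Vector]:
    """Add ae to every token position of the layer-l residual stream (assumed placement)."""
    out = []
    for token in residual:
        if len(token) != len(ae):
            raise ValueError("residual stream and ae must share the hidden size")
        out.append([x + a for x, a in zip(token, ae)])
    return out


if __name__ == "__main__":
    # Hypothetical layer-l activations for two year-related text snippets.
    h_a = [1.0, 0.0, 2.0]
    h_b = [0.0, 1.0, 1.0]
    ae = build_alignment_vector([h_a, h_b], [2.0, -1.0])
    print(ae)  # [2.0, -1.0, 3.0]
    # Toy residual stream for a two-token prompt p_u (hidden size 3).
    residual = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]
    print(inject(residual, ae))  # [[2.5, -0.5, 3.5], [3.0, 0.0, 4.0]]
```

In a real LLaMA 2 model the addition would typically happen inside a forward hook on the chosen decoder layer (for example via PyTorch's `nn.Module.register_forward_hook`). Which token positions receive the vector and how the coefficients are chosen are details this sketch does not model.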

Abstract

Large Language Models (LLMs) are trained on diverse and often conflicting knowledge spanning multiple domains and time periods. Some of this knowledge is only valid within specific temporal contexts, such as the answer to the question, "Who is the President of the United States in 2022?" Ensuring that LLMs generate time-appropriate responses is crucial for maintaining relevance and accuracy. In this work we explore activation engineering as a method for temporally aligning LLMs to improve factual recall without any training or dataset creation. We apply an activation engineering technique to ground three versions of LLaMA 2 to specific points in time and examine the effects of varying injection layers and prompting strategies. Our experiments demonstrate up to a 44% and 16% improvement under relative and explicit prompting, respectively, achieving performance comparable to the fine-tuning method proposed by Zhao et al. (2024). Notably, our approach matches the fine-tuning baseline while being significantly more computationally efficient and requiring no pre-aligned datasets.
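The percentage improvements reported above are measured on F1 scores. A common way to score a generated answer against a gold answer is SQuAD-style token-overlap F1; the sketch below assumes that definition with simplified normalization (lowercasing and whitespace splitting only), which may differ from the exact scorer used in the paper or by Zhao et al. (2024). The function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer (simplified normalization)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token min(count_pred, count_gold) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("Fumio Kishida", "Fumio Kishida"))   # 1.0
    print(token_f1("Yoshihide Suga", "Fumio Kishida"))  # 0.0
```

Token-level F1 gives partial credit when a generation contains the gold answer plus extra words, which an exact-match metric would score as zero.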

Paper Structure

This paper contains 21 sections, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: When asked "Who is the current Prime Minister of Japan?", LLaMA2-7b outputs Yoshihide Suga. Applying activation engineering to temporally align the model to the year 2022 produces the correct set of facts for LLaMA2-7b.
  • Figure 2: Activation Engineering in LLMs. A set of vectors is extracted from layer $l$; each is multiplied by a coefficient and the results are added together. Finally, this vector is added to a temporal question to temporally align the model.
  • Figure 3: Left (HOG dataset), right (Taqa-9000): benchmark F1 scores for LLaMA2-7b under both relative (checked line) and explicit prompts.
  • Figure 4: Left (LLaMA2-7b), right (LLaMA2-70b): single-layer AE effect on the HOG dataset, using "year only" prompting to align to the year 2015. Layers 4-29 are shown. Lighter colours denote lower layers (4-11), and darker colours denote higher layers (12-29). The labels mark the best result and its layer.
  • Figure 5: Left (single layer), right (multi-layer): alignment to 2022 with AE applied to different layers of LLaMA2-70b. The Y-axis is the difference in F1 score between our AE method and explicit prompting. Darker colours denote higher layers. For the multi-layer approach, layer 4 is the first layer, and the colour denotes the last layer included. The maximum layer count in both graphs is 26.
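The single-layer versus multi-layer comparison in Figure 5 can be sketched with a toy residual stack. This assumes that multi-layer injection means adding a layer-specific alignment vector after every layer from a first layer (layer 4 in Figure 5) through a chosen last layer, and that single-layer injection is the special case where the first and last layers coincide. The toy layers and the functions `multi_layer_schedule` and `forward_with_injection` are hypothetical stand-ins for LLaMA 2 decoder layers, not the authors' code.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]  # toy stand-in for a decoder layer's residual update


def multi_layer_schedule(per_layer_ae: Dict[int, Vector], first: int, last: int) -> Dict[int, Vector]:
    """Select alignment vectors for layers first..last inclusive (first == last is single-layer AE)."""
    if first > last:
        raise ValueError("first layer must not exceed last layer")
    return {l: per_layer_ae[l] for l in range(first, last + 1)}


def forward_with_injection(x: Vector, layers: Sequence[Layer], schedule: Dict[int, Vector]) -> Vector:
    """Run a toy residual stack, adding schedule[idx] to the stream after layer idx."""
    for idx, layer in enumerate(layers):
        update = layer(x)
        x = [xi + ui for xi, ui in zip(x, update)]  # residual connection
        if idx in schedule:
            x = [xi + ai for xi, ai in zip(x, schedule[idx])]  # AE injection
    return x


if __name__ == "__main__":
    def zero_layer(v: Vector) -> Vector:
        # Layer whose update is zero, so only the injections change the stream.
        return [0.0] * len(v)

    layers = [zero_layer] * 8
    per_layer_ae = {l: [1.0, 0.0] for l in range(8)}  # hypothetical per-layer vectors
    single = multi_layer_schedule(per_layer_ae, 4, 4)
    multi = multi_layer_schedule(per_layer_ae, 4, 6)
    print(forward_with_injection([0.0, 0.0], layers, single))  # [1.0, 0.0]
    print(forward_with_injection([0.0, 0.0], layers, multi))   # [3.0, 0.0]
```

With non-trivial layers, a vector injected early is also transformed by every later layer, which is one plausible reason the injection layer matters so much in Figures 4 and 5.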