STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models

Geunyeong Jeong, Juoh Sun, Seonghee Lee, Harksoo Kim

TL;DR

STEAM tackles the problem of integrating updated facts into a language model's internal knowledge rather than merely changing output likelihood. It introduces Latent Positioning and Latent-Level Alignment to create semantic anchors from reference knowledge and steer edited representations toward these anchors via a latent alignment loss, formalized as $\mathcal{L}(\delta)=\mathcal{L}_\text{NLL}(\delta)+\mathcal{L}_\text{KL}(\delta)+\lambda\mathcal{L}_\text{LA}(\delta)$, with $\mathcal{L}_\text{LA}$ based on cosine distance to $\varphi^\ell$ across mid-layers. Empirically, STEAM reduces the semantic isolation of edits, yielding improved Portability and consistent reasoning across GPT-J, Qwen2, and Llama3, both in single edits and batch editing scenarios (e.g., batch gains up to +2.2 in Portability). The approach is validated by latent-space visualizations showing edited residual streams more aligned with reference knowledge and by layer-wise cosine analyses that corroborate semantic integration. This semantic-level editing framework improves reliability and coherence of updated knowledge, enabling more robust long-term knowledge management in LLMs.
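
To make the shape of the objective in the TL;DR concrete, the following is a minimal sketch of how a cosine-distance alignment term could be combined with the NLL and KL terms. It is not the authors' implementation: the function names, the use of plain Python lists for vectors, and the choice to average the cosine distance over layers are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cos(u, v): 0 for the same direction, 2 for opposite directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def latent_alignment_loss(hidden: Dict[int, Vector],
                          anchors: Dict[int, Vector]) -> float:
    """Cosine distance between hidden states and anchors phi^l.

    Assumption: the distance is averaged over the layers present in both
    dictionaries; the source only says it is taken "across mid-layers".
    """
    layers = sorted(set(hidden) & set(anchors))
    if not layers:
        raise ValueError("no layers shared between hidden states and anchors")
    total = sum(cosine_distance(hidden[l], anchors[l]) for l in layers)
    return total / len(layers)


def total_loss(nll: float, kl: float, la: float, lam: float) -> float:
    """L = L_NLL + L_KL + lambda * L_LA, as stated in the TL;DR."""
    return nll + kl + lam * la
```

In an actual editing loop the three terms would be differentiable tensors optimized with respect to $\delta$; the scalar version here only shows how the terms combine and how $\lambda$ weights the alignment term.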

Abstract

Large Language Models store extensive factual knowledge acquired during large-scale pre-training. However, this knowledge is inherently static, reflecting only the state of the world at the time of training. Knowledge editing has emerged as a promising solution for updating outdated or incorrect facts without full retraining. However, most existing locate-and-edit methods focus primarily on token-level likelihood optimization without addressing semantic coherence. Our analysis reveals that such edited knowledge is often encoded as isolated residual streams in the model's latent space, distinct from pre-existing knowledge and bypassing the model's natural reasoning process. To address this, we propose Steam, a semantic-level knowledge editing framework that enhances the integration of updated knowledge into the model's knowledge structure. Steam first identifies target representations as semantic anchors for the updated factual association, then guides the internal representation of the edited fact towards these anchors through an alignment loss during optimization. Experimental results demonstrate that Steam improves the model's ability to reason with edited knowledge and enhances semantic coherence, underscoring the importance of latent-space alignment for reliable and coherent knowledge editing. The code is available at https://github.com/GY-Jeong/STEAM.
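
The abstract describes identifying target representations as semantic anchors from reference knowledge. One plausible way to do this, shown here only as a sketch, is to discard references the model fails to verify and then mean-pool the remaining hidden states per layer. The mean pooling, the `build_semantic_anchors` name, and the boolean `is_valid` flags are assumptions for illustration; the source does not specify how the anchor is computed.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def build_semantic_anchors(reference_states: Sequence[Dict[int, Vector]],
                           is_valid: Sequence[bool]) -> Dict[int, Vector]:
    """Build one anchor vector per layer from verified reference facts.

    reference_states[i] maps a layer index to the hidden state obtained for
    the i-th reference fact; is_valid[i] says whether the model verified it.
    Assumption: the anchor is the element-wise mean over the kept references.
    """
    kept = [s for s, ok in zip(reference_states, is_valid) if ok]
    if not kept:
        raise ValueError("no valid reference facts to build an anchor from")

    # Only use layers for which every kept reference has a hidden state.
    layers = set(kept[0])
    for states in kept[1:]:
        layers &= set(states)

    anchors: Dict[int, Vector] = {}
    for layer in sorted(layers):
        dim = len(kept[0][layer])
        anchors[layer] = [
            sum(states[layer][i] for states in kept) / len(kept)
            for i in range(dim)
        ]
    return anchors
```

For example, two valid references with layer-5 states `[1.0, 0.0]` and `[0.0, 1.0]`, plus one filtered-out reference, would give the layer-5 anchor `[0.5, 0.5]`.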

Paper Structure

This paper contains 25 sections, 10 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: 3D visualization of residual stream across layers. Each subplot shows PCA-projected hidden states for a sample from the CounterFact dataset (index in brackets). Red diamonds represent the residual stream of the edited knowledge, while green circles denote that of the reference knowledge. (a) shows the result after applying ROME, and (b) shows the result with Steam$_\text{ROME}$.
  • Figure 2: Layer-wise probability of generating the updated object $o^*$ in the edited model $\mathcal{F}'$, as computed by LogitLens. The $x$-axis indicates the transformer layer; the $y$-axis shows the predicted probability for $o^*$. Results are averaged over all samples in CounterFact.
  • Figure 3: Overview of the Steam framework. (a) Relevant reference knowledge about the new target object $o^*$ is retrieved. (b) The model is used to verify and filter these facts; valid references are then used to construct the semantic anchor $\varphi$ that approximates the latent representation of $o^*$. (c) During editing, Steam introduces a latent-level alignment loss $\mathcal{L}_\text{LA}$, which guides the edited value vector $v^*$ to align with the semantic anchor across mid-layers, encouraging coherent integration of the new knowledge into the model’s latent space.
  • Figure 4: Layer-wise cosine similarity between model representations and semantic anchors in GPT-J. Each plot shows the average cosine similarity between anchor vectors $\varphi^\ell$ and hidden states from the edited model $h_{\varepsilon}^\ell$ (red) and the unedited model $h_\theta^\ell$ (blue), with shaded areas indicating standard deviation. The vertical dashed line marks the edit layer. (a) Result with ROME. (b) Result with Steam$_\text{ROME}$.
  • Figure 5: 3D visualization of residual stream representations across layers for GPT-J. Each subplot shows PCA-projected hidden states of an edited fact (red diamonds) and its corresponding reference facts (green circles), with sample indices indicated in brackets. (a) shows results under ROME, and (b) under Steam$_\text{ROME}$.
  • ...and 1 more figures
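
The layer-wise probability described in the Figure 2 caption can be illustrated with a simplified LogitLens-style computation: each layer's hidden state is projected through the unembedding matrix and a softmax gives the probability of the target token. This is a sketch under stated assumptions, not the authors' analysis code: the function names are hypothetical, vectors are plain Python lists, and the final layer normalization that LogitLens implementations typically apply before unembedding is omitted for brevity.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens_probability(hidden: Sequence[float],
                           unembedding: Sequence[Sequence[float]],
                           target_id: int) -> float:
    """Probability of token `target_id` when `hidden` is decoded directly.

    `unembedding` holds one row per vocabulary token (assumed layout).
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembedding]
    return softmax(logits)[target_id]


def layerwise_target_probability(hidden_by_layer: Sequence[Sequence[float]],
                                 unembedding: Sequence[Sequence[float]],
                                 target_id: int) -> List[float]:
    """Apply the projection to every layer's hidden state, in layer order."""
    return [logit_lens_probability(h, unembedding, target_id)
            for h in hidden_by_layer]
```

Plotting the returned list against the layer index, averaged over samples, would give a curve of the kind the caption describes.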