STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models
Geunyeong Jeong, Juoh Sun, Seonghee Lee, Harksoo Kim
TL;DR
STEAM tackles the problem of integrating updated facts into a language model's internal knowledge rather than merely changing output likelihood. It introduces Latent Positioning and Latent-Level Alignment to create semantic anchors from reference knowledge and steer edited representations toward these anchors via a latent alignment loss, formalized as $\mathcal{L}(\delta)=\mathcal{L}_{NLL}(\delta)+\mathcal{L}_{KL}(\delta)+\lambda\mathcal{L}_{LA}(\delta)$ with $\mathcal{L}_{LA}$ based on cosine distance to $\varphi^{\ell}$ across mid-layers. Empirically, STEAM reduces the semantic isolation of edits, yielding improved Portability and consistent reasoning across GPT-J, Qwen2, and Llama3, both in single edits and batch editing scenarios (e.g., batch gains up to +2.2 in Portability). The approach is validated by latent-space visualizations showing edited residual streams more aligned with reference knowledge and by layer-wise cosine analyses that corroborate semantic integration. This semantic-level editing framework improves reliability and coherence of updated knowledge, enabling more robust long-term knowledge management in LLMs.
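As a rough illustration of how the combined objective could be assembled, the sketch below computes a cosine-distance alignment term over a set of layers and adds it to the NLL and KL terms with weight $\lambda$. It is not the authors' implementation: the function names, the reading of $\varphi^{\ell}$ as the layer-$\ell$ semantic anchor, the definition of cosine distance as one minus cosine similarity, and the averaging over layers are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def latent_alignment_loss(hidden_by_layer, anchor_by_layer, layers):
    """Hypothetical L_LA: mean cosine distance between the edited hidden
    state and the anchor at each selected layer (averaging is an assumption;
    the source only says the loss is based on cosine distance)."""
    distances = [
        cosine_distance(hidden_by_layer[layer], anchor_by_layer[layer])
        for layer in layers
    ]
    return sum(distances) / len(distances)


def total_loss(nll, kl, la, lam):
    """Combine the three terms as L = L_NLL + L_KL + lambda * L_LA."""
    return nll + kl + lam * la


# Toy example with two hypothetical mid-layers (5 and 6) and 2-d states.
hidden = {5: [1.0, 0.0], 6: [0.0, 2.0]}
anchors = {5: [2.0, 0.0], 6: [3.0, 0.0]}
la = latent_alignment_loss(hidden, anchors, layers=[5, 6])
loss = total_loss(nll=1.0, kl=0.5, la=la, lam=0.2)
```

In this toy case the layer-5 state already points in the anchor's direction and contributes no penalty, while the layer-6 state is orthogonal to its anchor and contributes the full distance of 1. An alignment term of this shape penalizes direction mismatch only, not differences in vector magnitude.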
Abstract
Large Language Models store extensive factual knowledge acquired during large-scale pre-training. However, this knowledge is inherently static, reflecting only the state of the world at the time of training. Knowledge editing has emerged as a promising solution for updating outdated or incorrect facts without full retraining. Yet most existing locate-and-edit methods focus primarily on token-level likelihood optimization without addressing semantic coherence. Our analysis reveals that such edited knowledge is often encoded as isolated residual streams in the model's latent space, distinct from pre-existing knowledge and bypassing the model's natural reasoning process. To address this, we propose \textsc{Steam}, a semantic-level knowledge editing framework that enhances the integration of updated knowledge into the model's knowledge structure. \textsc{Steam} first identifies target representations as semantic anchors for the updated factual association, then guides the internal representation of the edited fact towards these anchors through an alignment loss during optimization. Experimental results demonstrate that \textsc{Steam} improves the model's ability to reason with edited knowledge and enhances semantic coherence, underscoring the importance of latent-space alignment for reliable and coherent knowledge editing. The code is available at https://github.com/GY-Jeong/STEAM.
