CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing

Manit Baser; Alperen Yildiz; Dinil Mon Divakaran; Mohan Gurusamy

Abstract

The static knowledge representations of large language models (LLMs) inevitably become outdated or incorrect over time. While model-editing techniques offer a promising solution by modifying a model's factual associations, they often produce unpredictable ripple effects: unintended behavioral changes that propagate even to the hidden space. In this work, we introduce CLaRE, a lightweight representation-level technique for identifying where these ripple effects may occur. Unlike prior gradient-based methods, CLaRE quantifies entanglement between facts using forward activations from a single intermediate layer, avoiding costly backward passes. To enable systematic study, we prepare and analyse a corpus of 11,427 facts drawn from three existing datasets. Using CLaRE, we compute large-scale entanglement graphs of this corpus for multiple models, capturing how local edits propagate through representational space. These graphs enable stronger preservation sets for model editing, audit trails, efficient red-teaming, and scalable post-edit evaluation. Compared with baselines, CLaRE achieves an average 62.2% improvement in Spearman correlation with ripple effects while being $2.74\times$ faster and using $2.85\times$ less peak GPU memory. In addition, CLaRE requires only a fraction of the storage that the baselines need to compute and preserve fact representations. Our entanglement graphs and corpus are available at https://anonymous.4open.science/r/CLaRE-488E.

Paper Structure (31 sections, 9 equations, 38 figures, 11 tables)

This paper contains 31 sections, 9 equations, 38 figures, 11 tables.

Introduction
Related Work
Preliminary
Ripple Effects in Model Editing
CLaRE: A lightweight and scalable technique for identifying ripple effects
Experiments
Performance on entanglement estimation and computational efficiency (RQ ①)
(1) $\ell_2$ logit shift
(2) Original answer log-probability shift
Layer-wise Correlation Analysis (RQ ②)
Performance on scalability and downstream applications support (RQ ③)
Conclusion
Limitations
Ethical considerations
Computational complexity analysis
...and 16 more sections

Figures (38)

Figure 1: A targeted update to a political fact may inadvertently alter the model's prediction for an unrelated musical fact, despite no semantic connection. This demonstrates how edits can trigger ripple effects far beyond the intended factual neighborhood.
Figure 2: For each fact, GradSim computes the entire gradient, while CLaRE uses a single forward pass up to the last critical layer, enabling faster and more scalable entanglement mapping.
Figure 3: Correlation patterns for AlphaEdit: entanglement vs. $\ell_2$ logit shift (left) and $|\Delta \log P(y)|$ (right).
Figure 4: Performance comparison between CLaRE and GradSim in terms of Spearman correlation ($\rho_s$). The left panel shows $\rho_s$ between entanglement values and $\ell_2$ logit shift, and the right panel shows $\rho_s$ between entanglement values and $|\Delta \log P(y)|$. CLaRE (wider, transparent bars) consistently achieves higher $\rho_s$ than GradSim (narrower, solid bars).
Figure 5: Computational efficiency comparison. Values closer to the center indicate better performance.
...and 33 more figures
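
The two ripple-effect measures named in the outline and in Figures 3 and 4 ($\ell_2$ logit shift and $|\Delta \log P(y)|$), together with the Spearman correlation $\rho_s$ used to score an entanglement estimate, can be illustrated with a minimal sketch. It assumes the metric definitions implied by their names (Euclidean distance between pre-edit and post-edit logits, and absolute change in the original answer's log-probability); the function names are hypothetical, and the entanglement scores themselves are taken as given inputs, not computed by CLaRE here.

```python
import math

def l2_logit_shift(pre_logits, post_logits):
    """Euclidean distance between pre-edit and post-edit logit vectors (assumed definition)."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(pre_logits, post_logits)))

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def abs_logprob_shift(pre_logits, post_logits, y):
    """|Delta log P(y)|: absolute change in log-probability of the original answer index y."""
    return abs(log_softmax(post_logits)[y] - log_softmax(pre_logits)[y])

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank vectors; NaN if either is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)
```

Under these assumptions, one would collect an entanglement score and a measured shift for each (edited fact, probed fact) pair, then call `spearman(entanglement_scores, shifts)`; a higher $\rho_s$ means the entanglement estimate ranks ripple effects more faithfully.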
