
Composable Interventions for Language Models

Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen

TL;DR

This work investigates how post-training interventions for language models interact when composed, addressing a practical need as multiple updates accumulate. It introduces composable interventions and two metrics, the Order-free Error $E_{OF}$ and Order Sensitivity $E_{OS}$, within a unified framework and codebase to evaluate cross-method interactions across Knowledge Editing, Machine Unlearning, and Model Compression. Empirically, it uncovers meaningful interactions: compression often degrades other interventions, the sequence of application matters, and general-purpose utility metrics fail to capture composability. The authors advocate for developing multi-objective composable interventions and provide a public resource to accelerate future research in online LM updates.

Abstract

Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining. But despite a flood of new methods, different types of interventions are largely being developed independently. In practice, multiple interventions must be applied sequentially to the same model, yet we lack standardized ways to study how interventions interact. We fill this gap by introducing composable interventions, a framework to study the effects of using multiple interventions on the same language model, featuring new metrics and a unified codebase. Using our framework, we conduct extensive experiments and compose popular methods from three emerging intervention categories: Knowledge Editing, Model Compression, and Machine Unlearning. Our results from 310 different compositions uncover meaningful interactions: compression hinders editing and unlearning, composing interventions hinges on their order of application, and popular general-purpose metrics are inadequate for assessing composability. Taken together, our findings showcase clear gaps in composability, suggesting a need for new multi-objective interventions. All of our code is public: https://github.com/hartvigsen-group/composable-interventions.
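
To make "composition" and "order of application" concrete, here is a hypothetical toy sketch, not code from the authors' repository. It assumes a "model" is just a list of weights, a "knowledge edit" overwrites one weight, and "compression" rounds every weight to a coarse grid. The names `compose`, `edit`, and `quantize` are illustrative inventions. The sketch applies the same two interventions in both orders and compares the results.

```python
from typing import Callable, List

# Hypothetical toy "model": a flat list of weights (real LMs are vastly larger).
Weights = List[float]
Intervention = Callable[[Weights], Weights]


def compose(*interventions: Intervention) -> Intervention:
    """Apply interventions left to right: compose(a, b)(w) == b(a(w))."""
    def composed(w: Weights) -> Weights:
        for intervention in interventions:
            w = intervention(w)
        return w
    return composed


def edit(index: int, value: float) -> Intervention:
    """Hypothetical 'knowledge edit': overwrite one weight with a target value."""
    def apply(w: Weights) -> Weights:
        out = list(w)
        out[index] = value
        return out
    return apply


def quantize(step: float) -> Intervention:
    """Hypothetical 'compression': round every weight to a multiple of `step`."""
    def apply(w: Weights) -> Weights:
        return [round(x / step) * step for x in w]
    return apply


w0 = [0.12, -0.5, 0.33]
e = edit(index=0, value=0.07)
q = quantize(step=0.25)

edit_then_compress = compose(e, q)(w0)  # [0.0, -0.5, 0.25]
compress_then_edit = compose(q, e)(w0)  # [0.07, -0.5, 0.25]
```

In this toy, the edited weight (0.07) is rounded away when compression runs after editing, but it survives when editing runs after compression. This is a loose analogue of the order effects the abstract reports. Real editing and compression methods act on full networks and are evaluated on benchmarks, not on single weights.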

Paper Structure (41 sections, 4 equations, 4 figures, 21 tables)

Figures (4)

  • Figure 1: Interventions aim to update targeted properties of language models without impacting unrelated behaviors or adding excessive compute. We introduce and extensively experiment with the composability of different interventions applied to the same model.
  • Figure 2: Composing Knowledge Editing with Model Compression at varying degrees of compression. Higher values are better for all metrics. Key Takeaways: Editing after compression generally outperforms the reverse order because compression tends to degrade edit performance; all editors exhibit order sensitivity.
  • Figure 3: Composing Machine Unlearning with Model Compression at different levels of compression. Key Takeaways: Unlearning should generally be applied before compression, and performance varies significantly by composition.
  • Figure 4: Comparison of composition metrics across different models. Each model's performance is represented by a bar: the bottom of the bar marks the Order-free Error and the height of the bar marks the Order Sensitivity. Key Takeaways: Composability results largely generalize across models, with Edit Generalization showing an expected model-dependent effect.
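
The page does not give formulas for the Order-free Error ($E_{OF}$) or the Order Sensitivity ($E_{OS}$). Figure 4's caption does describe each bar's bottom as the Order-free Error and its size as the Order Sensitivity. That layout is consistent with the following reading, which is treated here as an assumption rather than the paper's exact definition. For one pair of interventions and one error metric, $E_{OF}$ is the lower of the two per-order errors, and $E_{OS}$ is the absolute gap between them. The names `composition_metrics` and `CompositionMetrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompositionMetrics:
    order_free_error: float   # assumed: lowest error over the two orders
    order_sensitivity: float  # assumed: absolute gap between the two orders


def composition_metrics(error_a_then_b: float, error_b_then_a: float) -> CompositionMetrics:
    """Hypothetical sketch of E_OF and E_OS for one intervention pair.

    Assumes errors are rates in [0, 1] and that, as in Figure 4, the bar
    bottom is the lower per-order error and the bar height is the gap.
    """
    for err in (error_a_then_b, error_b_then_a):
        if not 0.0 <= err <= 1.0:
            raise ValueError("errors are assumed to be rates in [0, 1]")
    return CompositionMetrics(
        order_free_error=min(error_a_then_b, error_b_then_a),
        order_sensitivity=abs(error_a_then_b - error_b_then_a),
    )


# Hypothetical per-order error rates for one pair (not results from the paper).
m = composition_metrics(error_a_then_b=0.5, error_b_then_a=0.25)
print(m.order_free_error, m.order_sensitivity)  # 0.25 0.25
```

With these made-up inputs, a Figure 4-style bar would start at 0.25 and span 0.25. If the paper instead normalizes or aggregates these quantities across metrics or datasets, the function would need to change accordingly.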