When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li, Yihua Zhang, Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen
TL;DR
This paper introduces a theoretical framework for task vector-based model editing in nonlinear Transformers, establishing when arithmetic on task vectors yields reliable generalization in multi-task learning, unlearning, and out-of-domain settings. It proves that careful selection of linear coefficients, guided by task correlations, enables multi-task learning and unlearning, and that task vectors can generalize to new tasks under mild conditions. The authors also show that practical techniques like low-rank approximation and sparsity pruning preserve these guarantees, and validate the theory with experiments on Colored-MNIST and language generation tasks using Phi-1.5. The work advances understanding of why task vectors work in nonlinear models and provides a foundation for efficient, robust parameter-efficient fine-tuning (PEFT)-style edits with provable guarantees.
Abstract
Task arithmetic refers to editing a pre-trained model by adding a weighted sum of task vectors, each of which is the weight update from the pre-trained model to a model fine-tuned on a particular task. This approach has recently gained attention as a computationally efficient inference method for model editing, with applications such as multi-task learning, forgetting, and out-of-domain generalization. However, the theoretical understanding of why task vectors can execute various conceptual operations remains limited, due to the high non-convexity of training Transformer-based models. To the best of our knowledge, this paper provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear Transformers. We consider a conceptual learning setting, where each task is a binary classification problem based on a discriminative pattern. We theoretically prove the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, as well as the success of task negation in unlearning one task from irrelevant or contradictory tasks. Moreover, we prove that a proper selection of linear coefficients for task arithmetic achieves guaranteed generalization to out-of-domain tasks. All of our theoretical results hold for both dense-weight parameters and their low-rank approximations. Although established in a conceptual setting, our theoretical findings are validated on a practical machine unlearning task using the large language model Phi-1.5 (1.3B).
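
As a rough illustration of the operations named in the abstract, the following NumPy sketch forms task vectors, adds or negates them, and takes a low-rank approximation of one. It is a toy example under stated assumptions, not the authors' implementation: the helper names (`task_vector`, `apply_task_arithmetic`, `low_rank`), the single 4x4 weight matrix, the random "fine-tuned" weights, and the coefficients 1.0 and -1.0 are all hypothetical. The coefficient choices that the paper proves to be sufficient depend on task correlations and are not stated in this section.

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights, per parameter."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def apply_task_arithmetic(pretrained, task_vectors, coeffs):
    """Return pretrained + sum_i coeffs[i] * task_vectors[i] (inputs are not modified)."""
    edited = {name: w.copy() for name, w in pretrained.items()}
    for lam, tv in zip(coeffs, task_vectors):
        for name in edited:
            edited[name] += lam * tv[name]
    return edited

def low_rank(matrix, rank):
    """Best rank-`rank` approximation of a 2-D array via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

# Toy stand-ins for model weights (assumption: one 4x4 matrix named "W").
rng = np.random.default_rng(0)
pretrained = {"W": rng.normal(size=(4, 4))}
ft_a = {"W": pretrained["W"] + 0.1 * rng.normal(size=(4, 4))}
ft_b = {"W": pretrained["W"] + 0.1 * rng.normal(size=(4, 4))}

tv_a = task_vector(pretrained, ft_a)
tv_b = task_vector(pretrained, ft_b)

# Task addition: combine both task vectors (illustrative coefficients).
multi_task = apply_task_arithmetic(pretrained, [tv_a, tv_b], [1.0, 1.0])

# Task negation: subtract a task vector to "unlearn" task A.
unlearned = apply_task_arithmetic(pretrained, [tv_a], [-1.0])

# Low-rank approximation of a task vector (rank 2 chosen arbitrarily).
tv_a_lr = {"W": low_rank(tv_a["W"], rank=2)}
```

With a single task vector and coefficient 1.0, the edit recovers the fine-tuned weights exactly, which is a quick sanity check when adapting the sketch to real checkpoints.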
