Tangent Transformers for Composition, Privacy and Removal
Tian Yu Liu, Aditya Golatkar, Stefano Soatto
TL;DR
The paper introduces Tangent Transformers, obtained by a first-order Taylor expansion of a Transformer around its pre-trained initialization, and Tangent Attention Fine-Tuning (TAFT), the method used to fine-tune them. The resulting model is linear in the weight update $\Delta w$: $f^{lin}_w(\cdot)=f_w(\cdot)+\nabla_w f_w(\cdot)\cdot\Delta w$. The Jacobian-vector product in this expression can be computed efficiently in a single forward pass, which keeps training and inference costs on the same order of magnitude as those of the original non-linear Transformer, with the same parameter count. The paper demonstrates practical benefits in parallel training, model composition, zero-cost forgetting, and differential privacy, with TAFT achieving near-parity with non-linear fine-tuning (NLFT) on many downstream tasks and offering large speedups in shard-based workflows. Overall, Tangent Transformers provide a scalable, private, and composable alternative to full fine-tuning for large Vision Transformer models: because the fine-tuning loss is convex in the new weights, they enable new forms of model management.
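To make the linearization concrete, here is a minimal sketch in plain Python of how forward-mode differentiation yields $f_w(x)+\nabla_w f_w(x)\cdot\Delta w$ in one forward pass. The `Dual` class, `toy_model`, and `tangent_model` are hypothetical names chosen for illustration, and the scalar two-layer network is an assumption standing in for a Transformer; this is not the paper's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its directional derivative (the tangent)."""
    val: float
    tan: float

    def __add__(self, other):
        return Dual(self.val + other.val, self.tan + other.tan)

    def __mul__(self, other):
        # The product rule carries the tangent alongside the value.
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)


def dual_tanh(a):
    """tanh on dual numbers: d/dz tanh(z) = 1 - tanh(z)^2."""
    t = math.tanh(a.val)
    return Dual(t, (1.0 - t * t) * a.tan)


def toy_model(w, x):
    """Hypothetical scalar network f_w(x) = w2 * tanh(w1 * x + b1)."""
    w1, b1, w2 = w
    return w2 * dual_tanh(w1 * x + b1)


def tangent_model(w0, dw, x):
    """Evaluate f_{w0}(x) + grad_w f_{w0}(x) . dw with one forward pass.

    Each pre-trained weight is seeded with its update as the tangent, so the
    output's tangent is exactly the Jacobian-vector product.
    """
    w = [Dual(wi, dwi) for wi, dwi in zip(w0, dw)]
    out = toy_model(w, Dual(x, 0.0))  # the input carries no tangent
    return out.val + out.tan
```

With `dw` set to zero, `tangent_model` returns the pre-trained output unchanged, and its deviation from that output scales linearly with `dw`, which is the property the later composition and privacy arguments rely on.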
Abstract
We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning linearized transformers obtained by computing a first-order Taylor expansion around a pre-trained initialization. We show that the Jacobian-vector product resulting from linearization can be computed efficiently in a single forward pass, reducing training and inference cost to the same order of magnitude as that of the original non-linear counterpart, while using the same number of parameters. Furthermore, we show that, when applied to various downstream visual classification tasks, the resulting Tangent Transformer fine-tuned with TAFT performs comparably to fine-tuning the original non-linear network. Since Tangent Transformers are linear with respect to the new set of weights, and the resulting fine-tuning loss is convex, we show that TAFT enjoys several advantages over non-linear fine-tuning when it comes to model composition, parallel training, machine unlearning, and differential privacy. Our code is available at: https://github.com/tianyu139/tangent-model-composition
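The link between linearity and composition can be illustrated with a small sketch. Because the output is linear in the weight offset, averaging the outputs of several component models gives the same prediction as a single model with the averaged offset, and dropping a component only requires re-averaging the rest. The sharding scheme, the helper names (`tangent_output`, `compose`, `forget`), and all numeric values below are assumptions made for illustration, not details taken from the paper.

```python
def tangent_output(f0, jac, dw):
    """Linearized prediction f0 + J . dw for one input (scalar output)."""
    return f0 + sum(j * d for j, d in zip(jac, dw))


def compose(deltas):
    """Average the weight offsets of component models, coordinate-wise."""
    k = len(deltas)
    return [sum(col) / k for col in zip(*deltas)]


def forget(deltas, idx):
    """Drop one component and re-average the rest; nothing is retrained."""
    return compose([d for i, d in enumerate(deltas) if i != idx])


# Hypothetical pre-trained output and Jacobian row for a single input.
f0 = 0.3
jac = [1.0, -2.0, 0.5]

# Hypothetical weight offsets, one per model fine-tuned on a disjoint shard.
shard_deltas = [
    [0.1, 0.0, 0.2],
    [-0.3, 0.4, 0.0],
    [0.2, 0.2, -0.4],
]

# Averaging the predictions of the components ...
ensemble = sum(tangent_output(f0, jac, d) for d in shard_deltas) / len(shard_deltas)
# ... matches one model evaluated at the averaged offset.
merged = tangent_output(f0, jac, compose(shard_deltas))

# Removing shard 1 amounts to averaging the remaining offsets.
after_forgetting = forget(shard_deltas, 1)
```

Under these assumptions, `ensemble` and `merged` agree up to floating-point error. The same identity does not hold for a non-linear network, where averaging weights and averaging outputs generally differ.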
