Transformer-Squared: Self-adaptive LLMs
Qi Sun, Edoardo Cetin, Yujin Tang
TL;DR
The paper tackles the static nature and high cost of traditional fine-tuning by proposing Transformer^2, a self-adaptive LLM framework that builds a bank of domain-specific expert vectors through Singular Value Fine-tuning (SVF). Given the singular value decomposition $W = U \Sigma V^\top$ of a weight matrix, SVF learns a vector $z$ that rescales its singular values, giving $W' = U \Sigma' V^\top$ with $\Sigma' = \Sigma \otimes \text{diag}(z)$. The resulting adaptations are compact and composable, and they are trained with RL and regularized by KL penalties. At inference, Transformer^2 uses a two-pass process and three adaptation strategies to compose experts for unseen prompts; it outperforms LoRA with far fewer parameters and shows cross-model transfer and applicability to vision-language tasks. The reported results span several LLMs and tasks, and the authors present the approach as a scalable path toward dynamic, self-organizing AI systems, with practical implications for deployment efficiency and continual learning.
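To make the SVF formula concrete, here is a minimal NumPy sketch of the weight modulation only. It assumes that $\Sigma \otimes \text{diag}(z)$ means scaling the $i$-th singular value by $z_i$; the function name `svf_adapt`, the matrix shape, and the values of `z` are illustrative assumptions, and the RL training of $z$ is not shown.

```python
import numpy as np

def svf_adapt(W, z):
    """Return W' = U diag(sigma * z) V^T, i.e. W with rescaled singular values.

    Assumption: z has one entry per singular value of W.
    """
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    # Multiplying the columns of U by (sigma * z) equals U @ diag(sigma * z).
    return (U * (sigma * z)) @ Vt

# Hypothetical 6x4 weight matrix, so there are 4 singular values.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))

# z = 1 everywhere leaves the matrix unchanged.
W_same = svf_adapt(W, np.ones(4))

# An illustrative z that amplifies, keeps, damps, and removes components.
z = np.array([1.5, 1.0, 0.5, 0.0])
W_adapted = svf_adapt(W, z)
```

In this toy case the adaptation stores 4 numbers for a matrix with 24 entries, which illustrates why one vector per weight matrix is a compact parameterization.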
Abstract
Self-adaptive large language models (LLMs) aim to solve the challenges posed by traditional fine-tuning methods, which are often computationally intensive and static in their ability to handle diverse tasks. We introduce Transformer-Squared, a novel self-adaptation framework that adapts LLMs for unseen tasks in real time by selectively adjusting only the singular components of their weight matrices. During inference, Transformer-Squared employs a two-pass mechanism: first, a dispatch system identifies the task properties, and then task-specific 'expert' vectors, trained using reinforcement learning, are dynamically mixed to obtain targeted behavior for the incoming prompt. Our method consistently outperforms ubiquitous approaches such as LoRA, with fewer parameters and greater efficiency. Furthermore, Transformer-Squared demonstrates versatility across different LLM architectures and modalities, including vision-language tasks. Transformer-Squared represents a significant leap forward, offering a scalable, efficient solution for enhancing the adaptability and task-specific performance of LLMs, paving the way for truly dynamic, self-organizing AI systems.
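The mixing step of the two-pass mechanism can be sketched as a weighted sum of expert vectors. This is only an illustration under stated assumptions: the abstract does not say how the dispatch system turns task properties into mixing weights, so the weights below are hard-coded, and the expert names, vector values, and the helper `mix_expert_vectors` are hypothetical.

```python
import numpy as np

def mix_expert_vectors(experts, weights):
    """Return the weighted sum of expert vectors as one mixed vector.

    experts: dict mapping expert name -> 1-D array (one entry per singular value)
    weights: dict mapping the same names -> scalar mixing weight
    """
    names = sorted(experts)
    Z = np.stack([experts[name] for name in names])      # shape (K, r)
    alpha = np.array([weights[name] for name in names])  # shape (K,)
    return alpha @ Z                                      # shape (r,)

# Hypothetical expert bank for one layer with r = 3 singular values.
experts = {
    "math": np.array([1.2, 1.0, 0.8]),
    "code": np.array([0.9, 1.1, 1.0]),
}

# Pass 1 (stubbed): assume the dispatch step judged the prompt to be mostly math.
weights = {"math": 0.75, "code": 0.25}

# Pass 2: the mixed vector would then rescale the singular values of the layer.
z_mixed = mix_expert_vectors(experts, weights)
```

A one-hot weight vector recovers a single expert, so selecting one expert is a special case of this mixing.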
