MetaTT: A Global Tensor-Train Adapter for Parameter-Efficient Fine-Tuning
Javier Lopez-Piqueres, Pranav Deshpande, Archan Ray, Mattia J. Villani, Marco Pistoia, Niraj Kumar
TL;DR
MetaTT introduces a global transformer adapter built from a Tensor Train (TT) decomposition to perform parameter-efficient fine-tuning across layers, matrix types, heads, and tasks. By factorizing all linear sub-modules into a single TT, MetaTT achieves substantial parameter compression: its parameter count scales with the sum of the TT mode sizes rather than their product. It extends to multi-task learning by adding a dedicated task-mode core. The approach also includes a DMRG-inspired rank-adaptive optimizer that progressively reduces TT ranks during training, improving optimization and generalization. Empirical results on single-task and multi-task benchmarks show that MetaTT is competitive with LoRA while reducing the number of trainable parameters by factors of 2–30×, and that it benefits from rank-adaptive training, particularly for larger models. These findings suggest that TT-based global adapters offer scalable, efficient fine-tuning of large language models in resource-constrained settings, with potential for extension to broader tensor-network architectures and training regimes.
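To make the sum-versus-product scaling concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a four-core TT whose modes are (input dimension, layer, matrix type, output dimension) with a single uniform internal rank, and compares its parameter count with one LoRA pair per adapted matrix. The function names, mode ordering, and sizes are all illustrative assumptions.

```python
import numpy as np

def tt_param_count(modes, rank):
    """Entries in a TT with the given mode sizes and a uniform internal rank."""
    ranks = [1] + [rank] * (len(modes) - 1) + [1]
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(modes))

def lora_param_count(d_in, d_out, n_layers, n_matrices, rank):
    """Entries in independent LoRA factors, one pair per adapted matrix."""
    return n_layers * n_matrices * rank * (d_in + d_out)

def random_tt(modes, rank, seed=0):
    """Random TT cores of shape (r_left, mode_size, r_right); illustration only."""
    rng = np.random.default_rng(seed)
    ranks = [1] + [rank] * (len(modes) - 1) + [1]
    return [rng.standard_normal((ranks[k], n, ranks[k + 1]))
            for k, n in enumerate(modes)]

def tt_slice(cores, layer, matrix):
    """Contract the four cores into the d_in x d_out update for one (layer, matrix)."""
    g_in, g_layer, g_mat, g_out = cores
    left = g_in[0]                                    # (d_in, r)
    mid = g_layer[:, layer, :] @ g_mat[:, matrix, :]  # (r, r)
    right = g_out[:, :, 0]                            # (r, d_out)
    return left @ mid @ right

# Hypothetical sizes: hidden dimension 768, 12 layers, 2 adapted matrices per layer.
d, n_layers, n_matrices, r = 768, 12, 2, 8
tt_total = tt_param_count([d, n_layers, n_matrices, d], r)
lora_total = lora_param_count(d, d, n_layers, n_matrices, r)
print(tt_total, lora_total)
```

In this sketch the layer and matrix-type modes contribute only `r * r` entries per index, so adding layers or matrix types grows the TT far more slowly than adding one LoRA pair per matrix would.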
Abstract
We present MetaTT, a Tensor Train (TT) adapter framework for fine-tuning pre-trained transformers. MetaTT enables flexible and parameter-efficient model adaptation by using a single shared TT to factorize transformer sub-modules. This factorization indexes key structural dimensions, including layer and matrix type, and can optionally incorporate heads and tasks. This design allows MetaTT's parameter count to scale with the sum, rather than the product, of the mode sizes, resulting in a substantially more compact adapter. Our benchmarks compare MetaTT with LoRA and with recent state-of-the-art fine-tuning methods based on matrix and tensor decompositions. On standard single-task language modeling benchmarks, MetaTT achieves a competitive trade-off between parameter efficiency and accuracy. We further demonstrate that MetaTT performs competitively with state-of-the-art methods on multi-task learning. Finally, we leverage the TT ansatz to design a rank-adaptive optimizer inspired by the DMRG method from many-body physics. Our results demonstrate that integrating this approach with AdamW enhances optimization performance for a specified target rank.
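The abstract does not spell out the rank-adaptive optimizer, so the sketch below only illustrates the generic two-site truncation step used in DMRG-style sweeps: two neighboring TT cores are merged, factorized with an SVD, and split again at a smaller bond rank. It is an assumption about the kind of operation involved, not the paper's algorithm, and `truncate_bond` is a hypothetical helper name.

```python
import numpy as np

def truncate_bond(core_a, core_b, new_rank):
    """Merge two adjacent TT cores and re-split them at a reduced bond rank.

    core_a has shape (r0, n1, r1) and core_b has shape (r1, n2, r2).
    Returns cores of shape (r0, n1, k) and (k, n2, r2) with k <= new_rank.
    """
    r0, n1, r1 = core_a.shape
    r1_b, n2, r2 = core_b.shape
    assert r1 == r1_b, "bond dimensions of neighboring cores must match"

    # Merge the pair into a single matrix across the shared bond.
    theta = core_a.reshape(r0 * n1, r1) @ core_b.reshape(r1, n2 * r2)

    # A truncated SVD gives the best rank-k approximation of the merged pair.
    u, s, vt = np.linalg.svd(theta, full_matrices=False)
    k = min(new_rank, s.size)

    new_a = u[:, :k].reshape(r0, n1, k)
    new_b = (s[:k, None] * vt[:k]).reshape(k, n2, r2)  # singular values absorbed to the right
    return new_a, new_b

rng = np.random.default_rng(0)
a = rng.standard_normal((1, 6, 4))
b = rng.standard_normal((4, 5, 1))
a2, b2 = truncate_bond(a, b, new_rank=2)
print(a2.shape, b2.shape)
```

In a full sweep this step would be applied bond by bond along the TT, interleaved with gradient updates such as AdamW steps; how the two are scheduled is left open here.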
