TensLoRA: Tensor Alternatives for Low-Rank Adaptation
Axel Marmoret, Reda Bensaid, Jonathan Lys, Vincent Gripon, François Leduc-Primeau
TL;DR
TensLoRA generalizes Low-Rank Adaptation (LoRA) by aggregating all trainable updates into higher-order tensors and applying Tucker factorization, enabling mode-specific compression across dimensions such as attention heads, depth, and QKV components. The method introduces seven tensor extensions (e.g., Att, QKV, Depth and their combinations) to capture cross-dimension redundancy, with per-mode ranks allowing flexible budget allocation. Experimental results on vision and language benchmarks show that certain tensor constructions (notably QKV_Depth and Att_QKV_Depth) can outperform standard LoRA at similar parameter counts, though gains are not uniform at very high compression, highlighting non-uniform redundancy across model dimensions. Overall, TensLoRA offers a scalable framework for exploring tensor-based adapters and opens avenues for improved interpretability and future extensions with alternative factorizations.
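To make the Tucker-factorized update concrete, here is a minimal NumPy sketch of how a fourth-order update tensor (input dimension x output dimension x QKV x depth) could be rebuilt from a small core and one factor matrix per mode. The sizes, ranks, random initialization, and the helper name `tucker_reconstruct` are illustrative assumptions, not the authors' implementation or settings.

```python
import numpy as np

def tucker_reconstruct(core, factors):
    """Rebuild a full tensor from a Tucker core and one factor matrix per mode."""
    out = core
    for mode, factor in enumerate(factors):
        # Mode-n product: contract the factor's rank axis with axis `mode`,
        # then move the resulting axis back to position `mode`.
        out = np.moveaxis(np.tensordot(factor, out, axes=(1, mode)), 0, mode)
    return out

# Hypothetical sizes: square d x d projections, 3 projections (Q, K, V), 4 layers.
d, n_qkv, n_layers = 16, 3, 4
dims = (d, d, n_qkv, n_layers)
ranks = (4, 4, 2, 3)  # one rank per mode (assumed values)

rng = np.random.default_rng(0)
core = rng.standard_normal(ranks)
factors = [rng.standard_normal((dim, r)) for dim, r in zip(dims, ranks)]

# Full update tensor holding every projection of every layer.
delta = tucker_reconstruct(core, factors)

# Slice out the d x d update for one projection (index 2) in layer 0.
delta_w = delta[:, :, 2, 0]
```

Because each mode has its own rank, the budget spent on the QKV or depth mode can be changed without touching the ranks used for the weight dimensions.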
Abstract
Low-Rank Adaptation (LoRA) is widely used to efficiently adapt Transformers by adding trainable low-rank matrices to attention projections. While effective, these matrices are considered independent for each attention projection (Query, Key, and Value) and each layer. Recent extensions have considered joint, tensor-based adaptations, but only in limited forms and without a systematic framework. We introduce TensLoRA, a unified framework that aggregates LoRA updates into higher-order tensors and models a broad family of tensor-based low-rank adaptations. Our formulation generalizes existing tensor-based methods and enables mode-specific compression rates, allowing parameter budgets to be tailored according to the modality and task. Experiments on vision and language benchmarks reveal that the tensor construction directly impacts performance, and that some constructions outperform standard LoRA at similar parameter counts.
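The parameter-budget argument can be illustrated with simple counting. The sketch below compares independent LoRA adapters with a single Tucker factorization of the aggregated tensor, assuming square d x d projections; the function names and the example sizes and ranks are hypothetical and are not taken from the paper's experiments.

```python
def lora_param_count(d, rank, n_proj=3, n_layers=12):
    """Independent LoRA: two d x rank matrices per projection and per layer."""
    return n_proj * n_layers * 2 * d * rank

def tucker_param_count(dims, ranks):
    """Tucker: one core of size prod(ranks) plus one dim x rank factor per mode."""
    core = 1
    for r in ranks:
        core *= r
    return core + sum(dim * r for dim, r in zip(dims, ranks))

# Assumed setting: d = 768, Q/K/V projections, 12 layers.
lora_total = lora_param_count(d=768, rank=4)

# Same updates viewed as one 768 x 768 x 3 x 12 tensor with per-mode ranks.
tucker_total = tucker_param_count(dims=(768, 768, 3, 12), ranks=(16, 16, 3, 12))

print(lora_total, tucker_total)
```

The factor matrices for the two weight modes are shared across all projections and layers, which is why the Tucker count grows with the sum of the mode sizes rather than with their product.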
