QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation

Zhuo Chen, Rumen Dangovski, Charlotte Loh, Owen Dugan, Di Luo, Marin Soljačić

TL;DR

QuanTA addresses both the cost of full fine-tuning and the limited expressivity of low-rank methods by introducing quantum-inspired tensor adaptations that produce high-rank updates. Grounded in a universality theorem and a rank representation theorem, QuanTA expresses the weight update as a sequence of tensors, each acting on two axes of the reshaped input; together these tensors can represent arbitrary updates with far fewer trainable parameters and without adding inference cost. Empirically, QuanTA achieves competitive or superior performance on tasks demanding complex reasoning (e.g., DROP, commonsense reasoning, arithmetic reasoning) while remaining parameter-efficient and compatible with existing PEFT techniques. The approach offers a scalable, efficient path to fine-tuning extremely large language models and highlights a productive intersection between quantum-inspired methods and practical NLP fine-tuning.

Abstract

We propose Quantum-informed Tensor Adaptation (QuanTA), a novel, easy-to-implement fine-tuning method with no inference overhead for large-scale pre-trained language models. By leveraging quantum-inspired methods derived from quantum circuit structures, QuanTA enables efficient high-rank fine-tuning, surpassing the limitations of Low-Rank Adaptation (LoRA), whose low-rank approximation may fail for complicated downstream tasks. Our approach is theoretically supported by the universality theorem and the rank representation theorem to achieve efficient high-rank adaptations. Experiments demonstrate that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning, and scalability compared to traditional methods. Furthermore, QuanTA shows superior performance with fewer trainable parameters compared to other approaches and can be designed to integrate with existing fine-tuning algorithms for further improvement, providing a scalable and efficient solution for fine-tuning large language models and advancing the state of the art in natural language processing.
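
The contrast between a low-rank update and a chain of tensors acting on pairs of axes can be made concrete with a small numerical sketch. This is an illustration under stated assumptions, not the authors' implementation: the factorization `dims = (4, 4, 4)`, the choice of axis pairs, the random initialization, and the helper names `apply_two_axis_tensors` and `dense_operator` are all hypothetical. It builds the dense matrix equivalent to the tensor chain and compares its rank with that of a LoRA-style product `B @ A` that uses the same number of parameters.

```python
import numpy as np

def apply_two_axis_tensors(x, dims, tensors):
    """Apply each (axes, tensor) pair in order to x, viewed as an array of shape dims."""
    h = x.reshape(dims)
    for (m, n), t in tensors:
        # t is indexed as (out_m, out_n, in_m, in_n); contract its input
        # indices with axes m and n of the reshaped activation.
        h = np.tensordot(t, h, axes=([2, 3], [m, n]))
        # tensordot puts the two output axes first; move them back to m and n.
        h = np.moveaxis(h, [0, 1], [m, n])
    return h.reshape(-1)

def dense_operator(dims, tensors):
    """Materialize the equivalent d x d matrix by applying the chain to each basis vector."""
    d = int(np.prod(dims))
    columns = [apply_two_axis_tensors(e, dims, tensors) for e in np.eye(d)]
    return np.stack(columns, axis=1)

rng = np.random.default_rng(0)
dims = (4, 4, 4)                     # hypothetical factorization of d = 64
d = int(np.prod(dims))
pairs = [(0, 1), (0, 2), (1, 2)]     # hypothetical choice of axis pairs
tensors = [
    ((m, n), rng.standard_normal((dims[m], dims[n], dims[m], dims[n])))
    for m, n in pairs
]

quanta_matrix = dense_operator(dims, tensors)
quanta_params = sum(t.size for _, t in tensors)

r = 6                                # LoRA rank chosen to match the parameter count
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, d))
lora_matrix = B @ A
lora_params = A.size + B.size

print("trainable parameters:", quanta_params, "vs", lora_params)
print("matrix rank:", np.linalg.matrix_rank(quanta_matrix),
      "vs", np.linalg.matrix_rank(lora_matrix))
```

With a matched parameter budget, the LoRA-style product cannot exceed rank `r`, whereas the tensor chain is not limited in this way. Since the chain is a linear map, it can in principle be folded into a single dense matrix after training, which is consistent with the abstract's claim of no inference overhead.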

Paper Structure (5 sections, 7 equations, 3 figures, 1 table)

This paper contains 5 sections, 7 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Conceptual comparison of the QuanTA and LoRA methods. LoRA parameterizes the weight matrix update as an outer product of two low-rank matrices, limiting its capacity. QuanTA, inspired by quantum circuits, uses tensors that operate on specific axes of the (reshaped) input, enabling high-rank parameterization. Supported by the universality theorem and the rank representation theorem, QuanTA can represent arbitrary matrices effectively, allowing it to achieve performance comparable to, or sometimes even better than, full fine-tuning with only a fraction of the parameters. Note: the performance graph is a conceptual illustration.
  • Figure 2: Subspace similarities between two LoRA experiments of different ranks (64 and 128) for two datasets. Each point $(i, j)$ represents the subspace similarity between the first $i$ right singular vectors of the $r=64$ experiment and the first $j$ right singular vectors of the $r=128$ experiment. Only points with $i \le j$ are plotted. The DROP dataset has a significantly higher "intrinsic rank" than the RTE dataset.
  • Figure 3: Any unitary matrix can be decomposed into a quantum circuit using one- and two-qubit gates.
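
The subspace similarity plotted in Figure 2 can be sketched as follows. The caption does not state the formula, so this assumes the normalized measure commonly used for LoRA subspace comparisons, $\|U_1^{(i)\top} U_2^{(j)}\|_F^2 / \min(i, j)$, where the columns of $U_1^{(i)}$ and $U_2^{(j)}$ are the first $i$ and $j$ right singular vectors of the two matrices. The function names and the small random stand-in matrices are hypothetical; no values from the paper are reproduced.

```python
import numpy as np

def subspace_similarity(a1, a2, i, j):
    """Similarity in [0, 1] between the top-i and top-j right singular subspaces."""
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt1 = np.linalg.svd(a1, full_matrices=False)
    _, _, vt2 = np.linalg.svd(a2, full_matrices=False)
    u1 = vt1[:i].T                    # shape (d, i)
    u2 = vt2[:j].T                    # shape (d, j)
    return np.linalg.norm(u1.T @ u2, "fro") ** 2 / min(i, j)

def similarity_grid(a1, a2):
    """Fill the (i, j) grid for i <= j, mirroring the points plotted in Figure 2."""
    r1, r2 = a1.shape[0], a2.shape[0]
    grid = np.full((r1, r2), np.nan)  # entries with i > j stay NaN (not plotted)
    for i in range(1, r1 + 1):
        for j in range(i, r2 + 1):
            grid[i - 1, j - 1] = subspace_similarity(a1, a2, i, j)
    return grid

# Small random stand-ins for two LoRA "A" matrices of different ranks.
rng = np.random.default_rng(0)
a_small = rng.standard_normal((4, 32))   # hypothetical rank-4 run
a_large = rng.standard_normal((8, 32))   # hypothetical rank-8 run
grid = similarity_grid(a_small, a_large)
print(grid.shape)
```

A value near 1 means the two subspaces largely coincide and a value near 0 means they are nearly orthogonal. Under this reading, a dataset whose similarity stays high as $i$ and $j$ grow would indicate that the higher-rank directions carry shared, task-relevant structure.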