QuIC: Quantum-Inspired Compound Adapters for Parameter Efficient Fine-Tuning

Snehal Raj, Brian Coyle

TL;DR

QuIC Adapters are a parameter-efficient finetuning approach that combines orthogonal constraints with quantum-inspired compound matrices to achieve extreme memory compression ($< 0.02\%$ of the base model) while maintaining competitive performance across language, vision, math, and reasoning tasks. By structuring adapters as block-diagonal compounds up to a maximum Hamming-weight $K$ and enforcing orthogonality, QuIC unifies and extends orthogonal finetuning with combinatorial compression: the first-order configuration recovers OFT, and higher-order configurations provide substantial parameter reductions with modest performance trade-offs. Ablation studies reveal the essential roles of orthogonality and combinatorial determinants in preserving expressiveness under compression. The results demonstrate strong Pareto-efficient performance on GLUE, VTAB, and math/reasoning benchmarks, suggesting practical applicability in highly resource-constrained environments and potential quantum-native deployment pathways.

Abstract

Scaling full finetuning of large foundation models strains GPU memory and training time. Parameter Efficient Fine-Tuning (PEFT) methods address this issue via adapter modules which update only a small subset of model parameters. In this work, we introduce Quantum-Inspired Compound Adapters (QuIC Adapters), a PEFT approach inspired by Hamming-weight preserving quantum circuits that can effectively finetune a model using less than 0.02\% of the memory footprint of the base model. QuIC adapters preserve pretrained representations by enforcing orthogonality in weight parameters, and have native deployment mechanisms on quantum computers. We test QuIC adapters by finetuning large language models like LLaMA and vision transformers on language, math, reasoning and vision benchmarks. In its first-order configuration, QuIC recovers the performance of existing orthogonal methods, while higher-order configurations enable substantial parameter compression (over 40x smaller than LoRA) for a modest performance trade-off, unlocking applications in highly resource-constrained environments. Through ablation studies, we determine that combining multiple Hamming-weight orders with orthogonality and matrix compounding is essential for performant finetuning. Our findings suggest that QuIC adapters offer a promising direction for efficient finetuning of foundation models in resource-constrained environments.

Paper Structure

This paper contains 48 sections, 5 theorems, 15 equations, 12 figures, 9 tables.

Key Result

Lemma 1

If a base matrix $A \in \mathbb{R}^{n \times n}$ is orthogonal, then all compound matrices $A^{(k)}$ with $k \in [n]$ are orthogonal, and hence so are all QuIC Adapters. Furthermore, this orthogonality is preserved during finetuning when the adapter is constructed with Hamming-weight preserving operations.
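The first claim of Lemma 1 can be checked numerically. The sketch below assumes the standard definition of the $k$-th compound matrix (the matrix of all $k \times k$ minors of $A$, with rows and columns indexed by $k$-subsets in lexicographic order). The helper names `compound_matrix` and `random_orthogonal` are hypothetical and not taken from the paper, and the brute-force determinant loop is for illustration only, not an efficient implementation.

```python
from itertools import combinations

import numpy as np


def compound_matrix(A, k):
    """Return the k-th compound of a square matrix A.

    Entry (i, j) is the determinant of the k x k submatrix of A whose rows
    are the i-th k-subset and whose columns are the j-th k-subset, with
    subsets taken in lexicographic order (an assumed ordering convention).
    """
    n = A.shape[0]
    subsets = list(combinations(range(n), k))
    C = np.empty((len(subsets), len(subsets)))
    for i, rows in enumerate(subsets):
        for j, cols in enumerate(subsets):
            C[i, j] = np.linalg.det(A[np.ix_(rows, cols)])
    return C


def random_orthogonal(n, seed=0):
    """Sample an orthogonal matrix as the Q factor of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q


A = random_orthogonal(4)
for k in range(1, 5):
    C = compound_matrix(A, k)
    # Orthogonality of A carries over to every compound (Cauchy-Binet).
    assert np.allclose(C.T @ C, np.eye(C.shape[0]))
```

The check relies on the Cauchy-Binet identity, which makes compounding multiplicative, so the compound of $A^\top A = I$ is again an identity matrix. It does not illustrate the second claim about preservation during finetuning.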

Figures (12)

  • Figure 1: Comparison of different adapter methods. Trainable parameters for each model shown in dark green. a) Full finetuning b) Low-rank adaptation (LoRA) c) Orthogonal finetuning (OFT) d) Quantum-Inspired Compound adapter (QuIC adapter). For QuIC adapters, the zeroth order compound (top left of each block) is the only trainable part. Higher order compounds are completely determined by this base matrix.
  • Figure 2: Hamming-weight preserving quantum computation. Quantum circuits are read left to right and each vertical line corresponds to a Reconfigurable/Fermionic Beam Splitter (RBS/FBS) quantum gate with parameter $\theta$. a) A unary (parallel) data loader [Landman2022quantummethods] to load a vector, $\mathbf{x}$, into Hamming-weight (HW) $k=1$ states. Generalizations of such loaders to higher HW can be found in [farias2024quantum] and are discussed in the Appendix. b) A 'pyramid' trainable quantum circuit layer, which is HW preserving [Landman2022quantummethods]. c) The generalization into HW up to $K=3$ states. The action of a HW preserving layer composed of FBS gates is represented by a unitary, $U$, composed of compound matrices, $\{\mathcal{C}_k := A^{(k)}\}$, acting on a data encoded state, $\ket{\psi}$. The elements of the vector representation of $\ket{\psi}$ are ordered according to Hamming-weight, and the compound matrices, $\mathcal{C}_k$, act separately on each set of HW grouped basis states. The matrices, $U$, themselves will serve as the inspiration for our QuIC Adapters.
  • Figure 3: Different possible QuIC Adapter configurations. The adapter decomposition is determined by the number of blocks ($b$, or equivalently the 'rank' $r := d/b$), and the number of compounds within each block. Trailing dimensions are padded with an identity matrix, and are not trainable. The figure shows a) $\mathcal{C}_1$, $b=4$ blocks, b) $\mathcal{C}_1\oplus\mathcal{C}_2$, $b=4$ blocks and c) $\mathcal{C}_1\oplus \mathcal{C}_2\oplus\mathcal{C}_3$, $b=2$ blocks. Note, if the base matrix, $A$, is orthogonal then configuration (a) recovers OFT exactly.
  • Figure 4: Performance analysis of QuIC and baseline PEFT methods on GLUE benchmark
  • Figure 5: A unary loader. Vertical lines denote parameterized RBS gates. Figure from [Cherrat2023quantumdeephedging]. The input is $\ket{0}^{\otimes n}$ and the output is the loaded state in unary, $\ket{\boldsymbol{x}} = \frac{1}{||\boldsymbol{x}||} \sum_{i} x_i \ket{e_i}$, when read from left to right.
  • ...and 7 more figures
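The adapter configurations described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each of the $b$ blocks has size $r = d/b$, holds the direct sum $\mathcal{C}_1 \oplus \dots \oplus \mathcal{C}_K$ of its own base matrix, and pads the trailing dimensions with an identity. The function names (`quic_block`, `quic_adapter`), the use of one independent base matrix per block, and the choice of base size are all assumptions made for the example.

```python
from itertools import combinations
from math import comb

import numpy as np


def compound_matrix(A, k):
    """k-th compound of A: all k x k minors, subsets in lexicographic order."""
    subsets = list(combinations(range(A.shape[0]), k))
    C = np.empty((len(subsets), len(subsets)))
    for i, rows in enumerate(subsets):
        for j, cols in enumerate(subsets):
            C[i, j] = np.linalg.det(A[np.ix_(rows, cols)])
    return C


def quic_block(A, K, r):
    """Build one r x r block: direct sum of compounds 1..K, identity-padded."""
    n = A.shape[0]
    used = sum(comb(n, k) for k in range(1, K + 1))
    if used > r:
        raise ValueError("compounds do not fit in a block of size r")
    block = np.eye(r)  # trailing (untrainable) dimensions stay identity
    start = 0
    for k in range(1, K + 1):
        C = compound_matrix(A, k)
        m = C.shape[0]
        block[start:start + m, start:start + m] = C
        start += m
    return block


def quic_adapter(bases, K, d):
    """Assemble a d x d block-diagonal matrix from one base matrix per block."""
    b = len(bases)
    if d % b != 0:
        raise ValueError("d must be divisible by the number of blocks")
    r = d // b
    R = np.zeros((d, d))
    for i, A in enumerate(bases):
        R[i * r:(i + 1) * r, i * r:(i + 1) * r] = quic_block(A, K, r)
    return R


# Example: d = 16, b = 2 blocks (r = 8), 3 x 3 orthogonal bases, K = 2.
# Each block uses C(3,1) + C(3,2) = 6 dimensions and pads the remaining 2.
rng = np.random.default_rng(0)
bases = [np.linalg.qr(rng.standard_normal((3, 3)))[0] for _ in range(2)]
R = quic_adapter(bases, K=2, d=16)
assert np.allclose(R.T @ R, np.eye(16))
```

With `K=1` each block reduces to the base matrix itself plus padding, which mirrors the caption's remark that configuration (a) recovers OFT when the base matrix is orthogonal. In this toy example only the 3 x 3 base matrices would carry trainable parameters; every other entry is determined by them or fixed.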

Theorems & Definitions (7)

  • Lemma 1: Orthogonality preservation of compound matrices
  • Lemma 2: Parameter Count of QuIC Adapters
  • Lemma 3: Computational Complexity of QuIC Adapters
  • Lemma 1: Orthogonality preservation of compound matrices (restated)
  • proof
  • Lemma 3: Computational Complexity of QuIC Adapters (restated)
  • proof