QuIC: Quantum-Inspired Compound Adapters for Parameter Efficient Fine-Tuning
Snehal Raj, Brian Coyle
TL;DR
QuIC Adapters are a parameter-efficient finetuning method that combines orthogonality constraints with quantum-inspired compound matrices to achieve extreme memory compression ($<0.02\%$ of the base model) while maintaining competitive performance across language, vision, math, and reasoning tasks. By structuring adapters as block-diagonal compounds up to a maximum Hamming-weight $K$ and enforcing orthogonality, QuIC unifies and extends orthogonal finetuning (OFT) with combinatorial compression: the first-order configuration recovers OFT, while higher-order configurations provide substantial parameter reductions with modest performance trade-offs. Ablation studies indicate that both orthogonality and combinatorial determinants are essential for preserving expressiveness under compression. The results show Pareto-efficient performance on GLUE, VTAB, and math/reasoning benchmarks, suggesting practical applicability in highly resource-constrained environments and a potential pathway to quantum-native deployment.
Abstract
Full finetuning of large foundation models strains GPU memory and training time as models scale. Parameter Efficient Fine-Tuning (PEFT) methods address this issue via adapter modules which update only a small subset of model parameters. In this work, we introduce Quantum-Inspired Compound Adapters (QuIC Adapters), a PEFT approach inspired by Hamming-weight preserving quantum circuits that can effectively finetune a model with a memory footprint of less than 0.02\% of the base model. QuIC adapters preserve pretrained representations by enforcing orthogonality in weight parameters, and can be deployed natively on quantum computers. We test QuIC adapters by finetuning large language models such as LLaMA and vision transformers on language, math, reasoning and vision benchmarks. In its first-order configuration, QuIC recovers the performance of existing orthogonal methods, while higher-order configurations enable substantial parameter compression (over 40x smaller than LoRA) for a modest performance trade-off, unlocking applications in highly resource-constrained environments. Through ablation studies, we determine that combining multiple Hamming-weight orders with orthogonality and matrix compounding is essential for performant finetuning. Our findings suggest that QuIC adapters offer a promising direction for efficient finetuning of foundation models in resource-constrained environments.
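To make the role of matrix compounding concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes that an order-$k$ block is the $k$-th compound matrix (the matrix of all $k \times k$ minors) of a small orthogonal matrix, which is here built from three Givens rotations. The helper names (`givens`, `compound`, `orthogonality_error`) and the chosen angles are hypothetical. The sketch checks two standard linear-algebra facts: the first-order compound is the matrix itself, and every compound of an orthogonal matrix is again orthogonal, with dimension $\binom{n}{k}$.

```python
import itertools
import math


def givens(n, i, j, theta):
    """n x n rotation by theta in the (i, j) plane; orthogonal by construction."""
    g = [[float(r == c) for c in range(n)] for r in range(n)]
    g[i][i] = g[j][j] = math.cos(theta)
    g[i][j] = -math.sin(theta)
    g[j][i] = math.sin(theta)
    return g


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def det(m):
    """Determinant by Laplace expansion along the first row (fine for small k)."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
        for c in range(len(m))
    )


def compound(m, k):
    """k-th compound matrix: all k x k minors, indexed by sorted k-subsets."""
    subsets = list(itertools.combinations(range(len(m)), k))
    return [
        [det([[m[r][c] for c in cols] for r in rows]) for cols in subsets]
        for rows in subsets
    ]


def orthogonality_error(m):
    """Largest absolute entry of M^T M - I."""
    mt = [list(col) for col in zip(*m)]
    p = matmul(mt, m)
    size = len(m)
    return max(abs(p[r][c] - (r == c)) for r in range(size) for c in range(size))


# Assumption: a 4 x 4 orthogonal matrix parameterised by three rotation angles.
n = 4
angles = [0.3, -1.1, 0.7]
q = givens(n, 0, 1, angles[0])
for (i, j), theta in zip([(1, 2), (2, 3)], angles[1:]):
    q = matmul(q, givens(n, i, j, theta))

orders = (1, 2, 3)
blocks = {k: compound(q, k) for k in orders}
sizes = {k: len(blocks[k]) for k in orders}
errors = {k: orthogonality_error(blocks[k]) for k in orders}
```

In this toy setting the same three angles define every block, so the order-2 block is a $6 \times 6$ orthogonal matrix obtained without additional trainable parameters. This is one way to read the compression claim above, under the stated assumptions.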
