CoSA: Compressed Sensing-Based Adaptation of Large Language Models

Songtao Wei, Yi Li, Bohan Zhang, Zhichun Guo, Ying Huang, Yuede Ji, Miao Yin, Guanpeng Li, Bingzhe Li

TL;DR

CoSA reframes parameter-efficient fine-tuning (PEFT) as a compressed sensing synthesis problem: each weight update is generated from a small trainable core Y through fixed random projections L and R, ΔW = L Y R, or equivalently vec(ΔW) = (R^T ⊗ L) vec(Y). The theoretical analysis establishes the restricted isometry property (RIP) for the Kronecker dictionary R^T ⊗ L, which supports stable, near-isometric optimization while allowing expressive, multi-directional adaptation with far fewer trainable parameters than traditional low-rank schemes. Empirically, CoSA matches or surpasses state-of-the-art PEFT methods across NLP tasks on RoBERTa, Llama, and Qwen models, with substantial memory savings. The result is a scalable, principled approach to LLM adaptation that reduces resource demands without sacrificing performance, and it shows that fixed random projections combined with a compact learnable core can capture task-specific updates across diverse downstream tasks.
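
The vectorized form in the TL;DR can be checked numerically. The sketch below is only an illustration, not the authors' code: the matrix sizes are arbitrary, the projections are plain Gaussian draws, and the helper `vec` is a hypothetical name. It assumes the usual column-major convention for vec, under which vec(L Y R) = (R^T ⊗ L) vec(Y) holds.

```python
import numpy as np

def vec(M):
    # Column-major (Fortran-order) vectorization: stack the columns of M.
    # This is the convention under which vec(L Y R) = (R^T kron L) vec(Y).
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
m, n, a, b = 6, 5, 3, 2          # illustrative sizes, not from the paper

L = rng.standard_normal((m, a))  # fixed left projection
Y = rng.standard_normal((a, b))  # compact trainable core
R = rng.standard_normal((b, n))  # fixed right projection

delta_W = L @ Y @ R              # synthesized update, shape (m, n)
dictionary = np.kron(R.T, L)     # Kronecker dictionary, shape (m*n, a*b)

# Both sides of the identity agree up to floating-point error.
identity_holds = np.allclose(vec(delta_W), dictionary @ vec(Y))
```

With row-major flattening (NumPy's default) the identity would instead read (L ⊗ R^T) vec(Y), so the `order="F"` argument matters here.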

Abstract

Parameter-Efficient Fine-Tuning (PEFT) has emerged as a practical paradigm for adapting large language models (LLMs) without updating all parameters. Most existing approaches, such as LoRA and PiSSA, rely on low-rank decompositions of weight updates. However, the low-rank assumption may restrict expressivity, particularly in task-specific adaptation scenarios where singular values are distributed relatively uniformly. To address this limitation, we propose CoSA (Compressed Sensing-Based Adaptation), a new PEFT method extended from compressed sensing theory. Instead of constraining weight updates to a low-rank subspace, CoSA expresses them through fixed random projection matrices and a compact learnable core. We provide a formal theoretical analysis of CoSA as a synthesis process, proving that weight updates can be compactly encoded into a low-dimensional space and mapped back through random projections. Extensive experimental results show that CoSA provides a principled perspective for efficient and expressive multi-scale model adaptation. Specifically, we evaluate CoSA on 10 diverse tasks, including natural language understanding and generation, employing 5 models of different scales from RoBERTa, Llama, and Qwen families. Across these settings, CoSA consistently matches or outperforms state-of-the-art PEFT methods.
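
As a rough illustration of the structure the abstract describes, the following NumPy sketch wraps a frozen weight with fixed random projections and a trainable core. Several details are assumptions rather than the paper's specification: the Gaussian projections and their scaling, the zero initialization of the core (borrowed from common LoRA practice), and the names `CoSALinearSketch`, `lora_trainable_params`, and `cosa_trainable_params`, which are hypothetical.

```python
import numpy as np

class CoSALinearSketch:
    """Frozen weight W0 (m x n) plus an update L @ Y @ R (illustrative only)."""

    def __init__(self, W0, a, b, seed=0):
        rng = np.random.default_rng(seed)
        m, n = W0.shape
        self.W0 = W0                                        # frozen pretrained weight
        self.L = rng.standard_normal((m, a)) / np.sqrt(m)   # fixed; scaling is an assumption
        self.R = rng.standard_normal((b, n)) / np.sqrt(n)   # fixed; scaling is an assumption
        self.Y = np.zeros((a, b))                           # trainable core; zero init is an assumption

    def delta_w(self):
        # Dense update; only needed for inspection or merging into W0.
        return self.L @ self.Y @ self.R

    def forward(self, x):
        # x has shape (batch, n). The update is applied factor by factor,
        # so the dense (m x n) matrix is never materialized.
        return x @ self.W0.T + ((x @ self.R.T) @ self.Y.T) @ self.L.T

def lora_trainable_params(m, n, r):
    # LoRA trains B (m x r) and A (r x n).
    return r * (m + n)

def cosa_trainable_params(a, b):
    # In this sketch only the core Y (a x b) is trained.
    return a * b
```

For a 4096 x 4096 layer, `lora_trainable_params(4096, 4096, 16)` is 131072, while `cosa_trainable_params(256, 256)` is 65536 and the resulting update can have rank up to min(a, b) = 256. These sizes are illustrative and are not taken from the paper's experiments.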

Paper Structure

This paper contains 44 sections, 3 theorems, 31 equations, 4 figures, 8 tables, and 1 algorithm.

Key Result

Theorem 4.1

Let $\bm\Psi_1 \in \mathbb{R}^{a\times m}$ and $\bm\Psi_2 \in \mathbb{R}^{b\times n}$ be independent random matrices that satisfy the RIP for appropriate sparsity classes. Then their Kronecker product, $\bm\Psi = \bm\Psi_1^T \otimes \bm\Psi_2 \in \mathbb{R}^{mn\times ab}$, satisfies the RIP with high probability …
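
A Monte Carlo probe can give a feel for what an RIP statement about a Kronecker dictionary means in practice. The sketch below is not the paper's validation procedure and proves nothing: it only samples random s-sparse cores and measures how far the energy ratio ||L Y R||_F^2 / ||Y||_F^2 strays from 1. It uses the L, Y, R notation of the TL;DR rather than the theorem's symbols, and the Gaussian projections, their scaling, the sizes, and the function name `empirical_isometry_ratios` are all assumptions.

```python
import numpy as np

def empirical_isometry_ratios(m, n, a, b, s, trials=200, seed=0):
    """Sample s-sparse cores and return ||L Y R||_F^2 / ||Y||_F^2 for each."""
    rng = np.random.default_rng(seed)
    # Entry variances 1/m and 1/n make the expected ratio equal to 1
    # when averaged over the random draw of L and R.
    L = rng.standard_normal((m, a)) / np.sqrt(m)
    R = rng.standard_normal((b, n)) / np.sqrt(n)
    ratios = np.empty(trials)
    for t in range(trials):
        y = np.zeros(a * b)
        support = rng.choice(a * b, size=s, replace=False)  # random sparse support
        y[support] = rng.standard_normal(s)
        Y = y.reshape(a, b, order="F")                      # un-vectorize the core
        # ||(R^T kron L) y||^2 equals ||L Y R||_F^2, so the Kronecker
        # product never has to be formed explicitly.
        ratios[t] = np.sum((L @ Y @ R) ** 2) / np.sum(y ** 2)
    return ratios

ratios = empirical_isometry_ratios(m=64, n=64, a=16, b=16, s=10)
# Largest observed deviation from 1 over the sampled vectors. A true RIP
# constant is a supremum over all s-sparse vectors, so this is only a lower bound.
delta_hat = max(1.0 - ratios.min(), ratios.max() - 1.0)
```

Because only sampled vectors are tested, `delta_hat` can understate the true restricted isometry constant, and it tends to grow with `trials`.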

Figures (4)

  • Figure 1: Comparison of LoRA and CoSA. Fixed and trainable modules are denoted by blue and red, respectively. LoRA constrains updates to a low-rank subspace via matrices $A$ and $B$, while CoSA reinterprets updates as a compressed sensing process with fixed projections $L,R$ and a compact trainable core $Y$.
  • Figure 2: Performance across compression pairs $(a,b)$. Blue: $a>b$, red: $a<b$, green diagonal: $a=b$. $\blacktriangle$/$\blacktriangledown$ mark configurations that outperform/underperform their symmetric counterparts. Color intensity reflects score magnitude.
  • Figure 3: Comparison of parameter and memory efficiency. (a) Trainable parameter count. (b) Memory footprint (including optimizer states). (c) CoSA parameters relative to LoRA (1B: Llama-3.2-1B; 7B: Qwen2-7B; 8B: Llama-3.1-8B).
  • Figure 4: Empirical validation of RIP properties for CoSA compression across four configurations and three sparsity levels ($s = 5, 10, 20$).

Theorems & Definitions (3)

  • Theorem 4.1: RIP of Kronecker Product Dictionaries
  • Lemma 1.1: Sparse Vector Covering (Vershynin, 2018)
  • Theorem 1.2: Empirical RIP Convergence (Tucker, 1959)