SineLoRA$Δ$: Sine-Activated Delta Compression
Cameron Gordon, Yiping Ji, Hemanth Saratchandran, Paul Albert, Simon Lucey
TL;DR
The paper addresses delta compression for resource-constrained model adaptation, targeting the expressivity gap of quantized low-rank adapters. It introduces SineLoRA$Δ$, a rank-enhancing approach that applies a fixed-frequency sinusoidal activation after quantization. The method is supported by a theoretical bound showing that stable rank under quantization is governed by the unquantized adapter and can be increased with sine activations. The authors validate the method on large language models, vision-language tasks, and text-to-image generation, achieving memory reductions of up to 66% with similar performance. They also propose an evaluation protocol for compressing parameter-efficient fine-tuning (PEFT) adapters that uses Bjøntegaard Delta (BD) analysis to compare rate-distortion performance. The work shows practical gains for bandwidth- and memory-constrained deployments and points to quantization-aware training and inference optimizations as future directions.
Abstract
Resource-constrained weight deployment is a task of immense practical importance. Recently, there has been interest in the specific task of \textit{Delta Compression}, where parties each hold a common base model and communicate only compressed weight updates. However, popular parameter-efficient updates such as Low-Rank Adaptation (LoRA) face inherent representation limitations, which are especially pronounced when combined with aggressive quantization. To overcome this, we build on recent work that improves LoRA representation capacity by using fixed-frequency sinusoidal functions to increase stable rank without adding parameters. We extend this to the quantized setting and present the first theoretical analysis showing how stable rank evolves under quantization. From this, we introduce SineLoRA$Δ$, a principled and effective method for delta compression that improves the expressivity of quantized low-rank adapters by applying a sinusoidal activation. We validate SineLoRA$Δ$ across diverse domains, including language modeling, vision-language tasks, and text-to-image generation, achieving up to 66% memory reduction with similar performance. We additionally provide a novel application of the canonical Bjøntegaard Delta metric to consistently compare adapter compression changes across the rate-distortion curve.
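To make the central idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It compares the stable rank (squared Frobenius norm divided by squared spectral norm) of a plain quantized low-rank update with that of the same update passed through an elementwise sine. Several details are assumptions made only for illustration: the fixed-step uniform quantizer `quantize_fixed_step`, the matrix sizes, the frequency `omega = 40.0`, and the choice to apply the sine to the product of the quantized factors. Any output scaling is omitted because stable rank is unchanged by scalar multiplication.

```python
import numpy as np

def stable_rank(W):
    """Stable rank: ||W||_F^2 / ||W||_2^2, computed from singular values."""
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)

def quantize_fixed_step(W, bits=4, step=0.25):
    """Hypothetical uniform quantizer: round to a fixed grid and clip.

    This is an illustrative stand-in, not the quantizer used in the paper.
    """
    levels = 2 ** (bits - 1) - 1          # e.g. 7 for 4 bits
    return np.clip(np.round(W / step), -levels, levels) * step

rng = np.random.default_rng(0)
m, n, r = 256, 256, 4                     # assumed sizes; r is the adapter rank
U = rng.standard_normal((m, r))
V = rng.standard_normal((r, n))

# Quantize the two low-rank factors (the "delta" that would be transmitted).
Uq = quantize_fixed_step(U)
Vq = quantize_fixed_step(V)

omega = 40.0                              # assumed fixed frequency
delta_lora = Uq @ Vq                      # plain quantized low-rank update
delta_sine = np.sin(omega * (Uq @ Vq))    # sine applied elementwise

sr_lora = stable_rank(delta_lora)
sr_sine = stable_rank(delta_sine)
print(f"stable rank, quantized LoRA:      {sr_lora:.2f}")
print(f"stable rank, sine-activated LoRA: {sr_sine:.2f}")
```

The stable rank of the plain update can never exceed its rank `r`, whereas the elementwise sine is nonlinear and is not bound by that limit. Both variants store exactly the same quantized factors, so any gain in stable rank comes at no extra parameter cost.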
