CompAct: Compressed Activations for Memory-Efficient LLM Training
Yara Shamshoum, Nitzan Hodos, Yuval Sieradzki, Assaf Schuster
TL;DR
CompAct compresses the activations of linear layers with random projections and stores only the compressed form for the backward pass, achieving substantial peak memory reductions with minimal performance loss. By computing gradients in a reduced subspace and decompressing only for the parameter update, it targets the compute graph, the largest component of memory allocated during training. Reported reductions are about 25-30% during pretraining and up to 50% during fine-tuning, and the savings are expected to grow with model size. The method relies on Gaussian random projections, per-layer seeding (or shared seeds), and an update cadence that balances accuracy and speed, and it demonstrates strong memory-throughput tradeoffs on LLaMA pretraining and RoBERTa fine-tuning. The work also outlines practical extensions, including sparse projections and integration with activation checkpointing and other memory-saving strategies, offering a viable path to training larger models within fixed hardware budgets.
Abstract
We introduce CompAct, a technique that reduces peak memory utilization on GPU by 25-30% for pretraining and 50% for fine-tuning of LLMs. Peak device memory is a major limiting factor in training LLMs, with various recent works aiming to reduce model memory. However, most works don't target the largest component of allocated memory during training: the model's compute graph, which is stored for the backward pass. By storing low-rank, compressed activations to be used in the backward pass, we greatly reduce the required memory, unlike previous methods, which only reduce optimizer overheads or the number of trained parameters. Our compression uses random projection matrices, thus avoiding additional memory overheads. Comparisons with previous techniques for either pretraining or fine-tuning show that CompAct substantially improves existing compute-performance tradeoffs. We expect CompAct's savings to scale even higher for larger models.
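The compression idea summarized above can be illustrated with a minimal NumPy sketch for a single linear layer. It is not the authors' implementation: the function names (`make_projection`, `forward`, `backward_compressed`, `decompress`), the 1/sqrt(r) scaling of the Gaussian matrix, and the toy shapes are all assumptions made for illustration. The sketch shows how regenerating the projection from a seed avoids storing it, and how the weight gradient can be formed in the reduced subspace from the compressed activation alone.

```python
import numpy as np

def make_projection(seed, n, r):
    # Regenerate the Gaussian projection from its seed, so the matrix itself
    # never has to be kept in memory between forward and backward passes.
    # The 1/sqrt(r) scaling is an assumption of this sketch.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, r)) / np.sqrt(r)

def forward(x, W, seed, r):
    # x: (tokens, n), W: (n, m). Returns the layer output and the compressed
    # activation z of shape (tokens, r), which is saved instead of x.
    P = make_projection(seed, x.shape[1], r)
    return x @ W, x @ P

def backward_compressed(z, grad_y):
    # Weight gradient in the reduced subspace: (r, m) instead of (n, m).
    return z.T @ grad_y

def decompress(g_hat, seed, n):
    # Map the compressed gradient back to the full (n, m) weight shape
    # using the same seed, right before the parameter update.
    P = make_projection(seed, n, g_hat.shape[0])
    return P @ g_hat

# Toy sizes, chosen only for illustration.
tokens, n, m, r, seed = 64, 256, 128, 32, 0
data_rng = np.random.default_rng(123)
x = data_rng.standard_normal((tokens, n))
W = data_rng.standard_normal((n, m))
grad_y = data_rng.standard_normal((tokens, m))

y, z = forward(x, W, seed, r)
g_hat = backward_compressed(z, grad_y)
g_full = decompress(g_hat, seed, n)
```

Here `z` holds `tokens * r` values rather than the `tokens * n` values of `x`, which is where the activation memory saving would come from in this sketch. With the scaling assumed above, the expectation of `P @ P.T` is the identity, so `g_full` is an unbiased but noisy estimate of the exact gradient `x.T @ grad_y`.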
