Spectrum: Targeted Training on Signal to Noise Ratio

Eric Hartford; Lucas Atkins; Fernando Fernandes Neto; David Golchinfar

TL;DR

Spectrum introduces a principled, SNR-based approach to post-training large language models by leveraging Random Matrix Theory. It computes per-layer SNRs using SVD and the Marchenko-Pastur threshold to identify and train only the most informative layers while freezing others, achieving competitive or superior model quality with reduced VRAM usage, especially in distributed settings. Compared to full fine-tuning and QLoRA, Spectrum delivers notable memory and time savings, with Spectrum-50 and Spectrum-25 often matching or surpassing baselines in benchmark performance. The method offers practical impact for cost-effective LLM adaptation and scales to very large models, with public code and future work exploring adaptive scheduling and broader modality applications.

Abstract

Efficiently post-training large language models remains a challenging task due to the vast computational resources required. We present Spectrum, a method that accelerates LLM training by selectively targeting layer modules based on their signal-to-noise ratio (SNR) and freezing the remaining modules. Our approach, which uses an algorithm to compute module SNRs prior to training, has been shown to match the performance of full fine-tuning while reducing GPU memory usage. Experiments comparing Spectrum to existing methods such as QLoRA demonstrate its effectiveness in terms of model quality and VRAM efficiency in distributed environments.
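As a rough illustration of the idea of ranking modules by SNR before training, the following NumPy sketch computes an SNR for each weight matrix from its singular values and keeps the top fraction. It is a minimal sketch under stated assumptions, not the authors' implementation: the noise scale is estimated from the median absolute deviation of the matrix entries, the threshold is the standard Marchenko-Pastur bulk edge for the largest singular value of an i.i.d. noise matrix, sigma * (sqrt(m) + sqrt(n)), and the function names and module names are hypothetical.

```python
import numpy as np


def estimate_noise_std(weight):
    # Assumption: robust noise scale from the median absolute deviation (MAD)
    # of the entries; 1.4826 makes the MAD consistent with a Gaussian std.
    med = np.median(weight)
    return 1.4826 * np.median(np.abs(weight - med))


def mp_edge(sigma, shape):
    # Approximate largest singular value of an m x n matrix whose entries are
    # i.i.d. noise with standard deviation sigma (Marchenko-Pastur bulk edge).
    m, n = shape
    return sigma * (np.sqrt(m) + np.sqrt(n))


def module_snr(weight):
    # Singular values above the edge count as signal, the rest as noise.
    s = np.linalg.svd(weight, compute_uv=False)
    threshold = mp_edge(estimate_noise_std(weight), weight.shape)
    signal = s[s > threshold].sum()
    noise = s[s <= threshold].sum()
    return float("inf") if noise == 0 else float(signal / noise)


def select_top_modules(weights, fraction):
    # Rank modules by SNR (highest first) and keep the top `fraction`;
    # everything not returned would be frozen during training.
    snrs = {name: module_snr(w) for name, w in weights.items()}
    ranked = sorted(snrs, key=snrs.get, reverse=True)
    k = max(1, int(fraction * len(ranked)))
    return ranked[:k]


# Toy data: one pure-noise matrix and one with a strong rank-1 component.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((64, 128))
u = rng.standard_normal(64)
v = rng.standard_normal(128)
u /= np.linalg.norm(u)
v /= np.linalg.norm(v)
structured = rng.standard_normal((64, 128)) + 200.0 * np.outer(u, v)

# Hypothetical module names, for illustration only.
weights = {"mlp.noisy": noisy, "mlp.structured": structured}
trainable = select_top_modules(weights, fraction=0.5)
```

In this toy setup the matrix with the planted rank-1 component receives the higher SNR and is the one selected for training. With a real model, the same ranking would be applied to each module's weight matrix, with the fraction set to 0.5 or 0.25 for configurations like the Spectrum-50 and Spectrum-25 mentioned above.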

Paper Structure (21 sections, 7 equations, 5 figures, 3 tables)

Introduction
Related Work
Mathematical Foundation
Illustrating Overfitting's Impact on Singular Values
Random Matrix Theory (RMT) Perspective
Benefits of Focusing on Matrices with Larger Singular Values
Relating Eigenvalues and Singular Values
Marchenko-Pastur Distribution
Signal-to-Noise Ratio and Matrix Ranking
Measuring The Signal-to-Noise Ratio
Layer Selection
Evaluations
Setup
Benchmark Scores
Memory Usage & Training Time
...and 6 more sections
