LaX: Boosting Low-Rank Training of Foundation Models via Latent Crossing
Ruijie Zhang, Ziyue Liu, Zhengyang Wang, Zheng Zhang
TL;DR
LaX introduces Latent Crossing, a lightweight plug-in that enables information flow across low-rank subspaces to restore expressiveness without increasing rank. The method uses residual connections and configurable gates to adapt to different low-rank structures (e.g., SVD, CoLA, TT) and remains compatible with LoRA for efficient fine-tuning. Empirical results across ViT and LLaMA-like models show LaX closes much of the gap between low-rank and full-rank baselines, with gains in pretraining accuracy and perplexity and improvements on arithmetic and commonsense reasoning during fine-tuning. The work offers a practical, generalizable approach to training efficient foundation models, reducing compute while preserving or enhancing performance.
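As a rough illustration of what "information flow across low-rank subspaces" with residual connections and gates could look like, the dependency-free sketch below adds the rank-r latent of one factorized layer into the latent of the next before up-projection. This is an assumption-laden reading of the summary, not the authors' implementation: the class `LowRankLayer`, the helpers `matvec` and `rand_matrix`, the `gate` argument, and all dimensions are hypothetical, and the actual gate designs for SVD, CoLA, or TT structures are not specified here.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; the initialization scheme is an arbitrary choice."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LowRankLayer:
    """Hypothetical layer y = B (A x), a rank-r factorization of a d_out x d_in weight."""

    def __init__(self, d_in, d_out, rank, rng):
        self.A = rand_matrix(rank, d_in, rng)   # down-projection, r x d_in
        self.B = rand_matrix(d_out, rank, rng)  # up-projection, d_out x r

    def forward(self, x, prev_latent=None, gate=None):
        latent = matvec(self.A, x)  # r-dimensional latent of this layer
        if prev_latent is not None:
            # Assumed form of the crossing: a gated residual in latent space.
            # With gate=None the previous latent is added unchanged (identity gate).
            crossed = prev_latent if gate is None else gate(prev_latent)
            latent = [h + c for h, c in zip(latent, crossed)]
        return matvec(self.B, latent), latent


rng = random.Random(0)
layer1 = LowRankLayer(d_in=8, d_out=8, rank=2, rng=rng)
layer2 = LowRankLayer(d_in=8, d_out=8, rank=2, rng=rng)

x = [1.0] * 8
h1, z1 = layer1.forward(x)                            # plain low-rank layer
h2_plain, _ = layer2.forward(h1)                      # no crossing
h2_crossed, z2 = layer2.forward(h1, prev_latent=z1)   # latent of layer 1 crosses into layer 2
```

In this sketch the crossing adds no parameters when the gate is the identity and both layers share the same rank; a learned gate would add a small number of parameters that depends on its form.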
Abstract
Training foundation models such as ViTs and LLMs incurs enormous computing cost. Low-rank matrix or tensor factorization offers a parameter-efficient alternative, but often degrades performance because of the restricted parameter space. In this work, we introduce Latent Crossing (LaX), a simple yet effective plug-and-play module that enhances the capacity of low-rank models by enabling information flow across low-rank subspaces. We extensively validate the benefits of LaX on pre-training tasks with ViT-Base/Large and LLaMA-like models ranging from 60M to 1B parameters. LaX boosts low-rank model performance to match or exceed the full-rank baselines while using 2-3× fewer parameters. When combined with low-rank adapters (i.e., LoRA) for fine-tuning LLaMA-7B/13B, LaX consistently improves performance on arithmetic and commonsense reasoning tasks at negligible cost.
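To make the parameter savings concrete, the arithmetic below compares a dense d_out x d_in weight with its rank-r factorization, which stores r(d_in + d_out) values. The width 768 is the standard ViT-Base hidden size; the ranks 192 and 128 are hypothetical values chosen only because they yield 2× and 3× reductions for a square layer, and are not taken from the paper. Whole-model ratios would also depend on layers that are not factorized, such as embeddings.

```python
def full_rank_params(d_in, d_out):
    """Parameter count of a dense d_out x d_in weight matrix."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Parameter count of a rank-r factorization W ~ B A (B: d_out x r, A: r x d_in)."""
    return rank * (d_in + d_out)


d = 768  # ViT-Base hidden width
for rank in (192, 128):  # hypothetical ranks, for illustration only
    ratio = full_rank_params(d, d) / low_rank_params(d, d, rank)
    print(f"rank={rank}: {ratio:.1f}x fewer parameters")
# prints:
# rank=192: 2.0x fewer parameters
# rank=128: 3.0x fewer parameters
```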
