Beware of the Batch Size: Hyperparameter Bias in Evaluating LoRA
Sangyoon Lee, Jaeho Lee
TL;DR
This paper addresses the contradictory gains reported for LoRA variants by identifying batch size as a major confound in their evaluation. Using a unified experimental framework that varies batch size, learning rate, and protocol, the authors show that vanilla LoRA can match or beat PiSSA and MiLoRA once its batch size is tuned, which reconciles the earlier conflicting claims. They analyze how the optimal batch size depends on LoRA rank, dataset scale, and base-model capacity, and propose a low-cost proxy: run small-scale, low-rank configurations on the full dataset to find batch-size settings that transfer to the target setup. The work offers practical guidance for robust evaluation and efficient deployment of LoRA-based fine-tuning in resource-constrained environments. It also stresses that the optimal batch size is not universally small and that careful tuning is essential for credible comparisons.
Abstract
Low-rank adaptation (LoRA) is a standard approach for fine-tuning large language models, yet its many variants report conflicting empirical gains, often on the same benchmarks. We show that these contradictions arise from a single overlooked factor: the batch size. When properly tuned, vanilla LoRA often matches the performance of more complex variants. We further propose a proxy-based, cost-efficient strategy for batch size tuning, revealing the impact of rank, dataset size, and model capacity on the optimal batch size. Our findings elevate batch size from a minor implementation detail to a first-order design parameter, reconciling prior inconsistencies and enabling more reliable evaluations of LoRA variants.
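The evaluation concern raised above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the `Grid` type, and the idea of storing one validation score per (batch size, learning rate) pair are assumptions made for illustration only. The sketch contrasts two comparison protocols: pinning every method to one batch size, versus letting each method use its own best batch size.

```python
from typing import Dict, Tuple

# Hypothetical container: maps (batch_size, learning_rate) to a validation
# score where higher is better. The scores come from your own sweep.
Grid = Dict[Tuple[int, float], float]


def best_config(grid: Grid) -> Tuple[Tuple[int, float], float]:
    """Return the (batch_size, learning_rate) pair with the highest score."""
    config = max(grid, key=grid.get)
    return config, grid[config]


def compare_at_fixed_batch(results: Dict[str, Grid], batch_size: int) -> Dict[str, float]:
    """Best score per method when batch size is pinned and only the
    learning rate is tuned (the potentially biased protocol)."""
    out: Dict[str, float] = {}
    for method, grid in results.items():
        scores = [s for (bs, _lr), s in grid.items() if bs == batch_size]
        if scores:  # skip methods that were never run at this batch size
            out[method] = max(scores)
    return out


def compare_at_tuned_batch(results: Dict[str, Grid]) -> Dict[str, float]:
    """Best score per method when batch size and learning rate are tuned jointly."""
    return {method: best_config(grid)[1] for method, grid in results.items()}
```

If the ranking of methods differs between `compare_at_fixed_batch` and `compare_at_tuned_batch`, the fixed-batch comparison is likely confounded by batch size in the sense described in the abstract.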
