Zero-Order Optimization for LLM Fine-Tuning via Learnable Direction Sampling
Valery Parfenov, Grigoriy Evseev, Andrey Veprikov, Nikolay Bushkov, Stanislav Moiseev, Aleksandr Beznosikov
TL;DR
Fine-tuning large language models is hindered by the high memory cost of backpropagation. The paper introduces Learnable Direction Sampling Descent (LDSD), which treats the mean of the perturbation distribution as a learnable policy to align zero-order directional derivatives with the true gradient. Theoretical results show that gradient alignment increases over iterations and that convergence bounds can be made dimension-free, while the approach remains a plug-in for existing ZO optimizers. Empirical results on SST-2 with RoBERTa-Large and OPT-1.3B show consistent improvements over standard ZO baselines, supporting the practical viability of adaptive direction sampling for scalable zero-order fine-tuning.
Abstract
Fine-tuning large pretrained language models (LLMs) is a cornerstone of modern NLP, yet its growing memory demands (driven by backpropagation and large optimizer states) limit deployment in resource-constrained settings. Zero-order (ZO) methods bypass backpropagation by estimating directional derivatives from forward evaluations, offering substantial memory savings. However, classical ZO estimators suffer from high variance and an adverse dependence on the parameter dimensionality $d$, which has constrained their use to low-dimensional problems. In this work, we propose a policy-driven ZO framework that treats the sampling distribution over perturbation directions as a learnable policy and updates it to reduce the variance of directional estimates. We develop a practical algorithm implementing this idea and provide a theoretical analysis, showing that learned sampling distributions improve the quality of gradient information and relax the explicit dependence on $d$ in convergence bounds. Empirically, we validate the approach on challenging LLM fine-tuning benchmarks, demonstrating substantially improved performance compared to standard ZO baselines. Our results suggest that adaptive direction sampling is a promising route to make ZO fine-tuning viable at scale. The source code is available at https://github.com/brain-lab-research/zo_ldsd
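To make the dimension dependence mentioned above concrete, the following is a minimal sketch of a classical two-point ZO estimator with isotropic Gaussian directions. It is not the LDSD algorithm and is not taken from the linked repository: the function names, the quadratic toy loss, and the step size `eps` are all illustrative assumptions. The sketch measures how well a single-direction estimate aligns with the true gradient as $d$ grows.

```python
import math
import random


def zo_directional_estimate(loss, x, u, eps=1e-4):
    """Classical two-point ZO estimate along direction u (illustrative, not LDSD).

    Uses two forward evaluations and no backpropagation:
        g_hat = (loss(x + eps*u) - loss(x - eps*u)) / (2*eps) * u
    """
    x_plus = [xi + eps * ui for xi, ui in zip(x, u)]
    x_minus = [xi - eps * ui for xi, ui in zip(x, u)]
    slope = (loss(x_plus) - loss(x_minus)) / (2.0 * eps)
    return [slope * ui for ui in u]


def quadratic_loss(x):
    """Toy loss (an assumption for this sketch); its true gradient at x is x."""
    return 0.5 * sum(xi * xi for xi in x)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_abs_alignment(d, trials=200, seed=0):
    """Average |cos| between the ZO estimate and the true gradient in dimension d."""
    rng = random.Random(seed)
    x = [1.0] * d  # the true gradient of quadratic_loss at x equals x
    total = 0.0
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]  # zero-mean isotropic direction
        g_hat = zo_directional_estimate(quadratic_loss, x, u)
        total += abs(cosine(g_hat, x))
    return total / trials


if __name__ == "__main__":
    for d in (10, 100, 1000):
        print(d, round(mean_abs_alignment(d), 4))
```

With zero-mean isotropic directions, the average alignment shrinks as $d$ increases, which is the adverse dimension dependence the abstract refers to. The paper's proposal, as described above, is to learn the sampling distribution so that directions correlate better with the gradient; that learning rule is not reproduced in this sketch.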
