LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models
Hossein Abdi, Mingfei Sun, Andi Zhang, Samuel Kaski, Wei Pan
TL;DR
LoKO reframes online fine-tuning of large models as a state-estimation problem and combines Low-Rank Adaptation (LoRA) with a diagonal-covariance Kalman filter to make online optimization scalable. LoRA reduces the trainable parameters of each adapted $d \times q$ weight matrix to $r(d+q)$ for rank $r$, and keeping only the diagonal of the covariance makes the cost of each Kalman update linear in the number of trainable parameters. The observation noise covariance $R_k$ is estimated with an exponential moving average (EMA). Empirical results on computer vision and language benchmarks show that LoKO converges faster and attains higher online accuracy than commonly used optimizers applied to LoRA, and that it is robust to the choice of covariance initialization and noise-covariance estimate. This work demonstrates the feasibility of Kalman-filter-based optimization for online fine-tuning of transformer- and CNN-based large models, offering an alternative to gradient-based optimizers in streaming-data settings.
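To make the $r(d+q)$ count concrete, the following minimal sketch compares it with the $dq$ parameters of a full weight matrix. It assumes the standard LoRA convention of a $d \times q$ weight adapted by two factors of shapes $d \times r$ and $r \times q$; the function name and the sizes used are illustrative and are not taken from the paper.

```python
def lora_param_count(d: int, q: int, r: int) -> int:
    """Trainable parameters of a rank-r LoRA adapter on a d x q weight.

    Assumes the usual factorization into a (d x r) and an (r x q) matrix,
    which gives r * d + r * q = r * (d + q) trainable entries.
    """
    return r * (d + q)


# Illustrative sizes only (hypothetical, not from the paper).
d, q, r = 768, 768, 8
full = d * q                         # parameters in the dense weight
low_rank = lora_param_count(d, q, r) # parameters in the LoRA adapter
reduction = full // low_rank         # how many times fewer parameters
```

For these illustrative sizes the adapter has 12,288 trainable parameters against 589,824 in the dense matrix, which is why a per-parameter (diagonal) covariance stays cheap to store and update.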
Abstract
Training large models with millions or even billions of parameters from scratch incurs substantial computational costs. Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), address this challenge by adapting only a reduced number of parameters to specific tasks with gradient-based optimizers. In this paper, we cast PEFT as an optimal filtering (state estimation) problem and present the Low-Rank Kalman Optimizer (LoKO) to estimate the optimal trainable parameters in an online manner. We leverage the low-rank decomposition in LoRA to significantly reduce matrix sizes in the Kalman iterations, and we further adopt a diagonal approximation of the covariance matrix to decrease the computational complexity from quadratic to linear in the number of trainable parameters. Moreover, we find that the initialization of the covariance matrix within the Kalman algorithm and the accurate estimation of the observation noise covariance are key in this formulation, and we propose robust approaches that work well across a wide range of well-established computer vision and language models. Our results show that LoKO converges in fewer iterations and yields better-performing models than commonly used optimizers with LoRA on both image classification and language tasks. Our study opens up the possibility of leveraging the Kalman filter as an effective optimizer for the online fine-tuning of large models.
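As a rough illustration of why the diagonal approximation gives linear cost, the sketch below performs one Kalman update with the covariance stored as a vector of per-parameter variances. It is a simplified toy under stated assumptions, not the authors' implementation: the observation is a scalar, the model is linear ($y \approx h^\top \theta$, where in an extended Kalman filter $h$ would be the gradient of the model output with respect to the trainable parameters), and the EMA rule for the noise variance (a running average of the squared innovation with factor `beta`) is an assumed form. All names are hypothetical.

```python
def diag_kalman_step(theta, p, h, y, r_hat, beta=0.95):
    """One diagonal-covariance Kalman update for a scalar observation.

    theta : list of current parameter estimates
    p     : list of per-parameter variances (diagonal of the covariance)
    h     : list, observation row (Jacobian of the output w.r.t. theta)
    y     : observed scalar target
    r_hat : current estimate of the observation-noise variance (> 0)
    beta  : EMA factor for r_hat (assumed form, see lead-in)
    """
    # Innovation: difference between the observation and the prediction.
    y_pred = sum(hi * ti for hi, ti in zip(h, theta))
    innov = y - y_pred

    # EMA estimate of the observation-noise variance (an assumption here).
    r_hat = beta * r_hat + (1.0 - beta) * innov * innov

    # Innovation variance s = h P h^T + R, with P diagonal: O(n) work.
    s = sum(hi * hi * pi for hi, pi in zip(h, p)) + r_hat

    # Kalman gain K = P h^T / s, again elementwise because P is diagonal.
    gain = [pi * hi / s for hi, pi in zip(h, p)]

    # State update: move each parameter along its gain.
    theta = [ti + gi * innov for ti, gi in zip(theta, gain)]

    # Covariance update: keep only the diagonal of (I - K h) P.
    p = [pi - gi * hi * pi for pi, gi, hi in zip(p, gain, h)]
    return theta, p, r_hat


# Toy usage with hypothetical numbers.
theta0, p0 = [0.0, 0.0], [1.0, 1.0]
h0, y0 = [1.0, 2.0], 3.0
theta1, p1, r1 = diag_kalman_step(theta0, p0, h0, y0, r_hat=1.0)
```

Every quantity in the step is computed with a single pass over the parameters, so time and memory scale linearly with their number; a full covariance would instead require storing and updating an $n \times n$ matrix.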
