Random feature-based double Vovk-Azoury-Warmuth algorithm for online multi-kernel learning
Dmitry B. Rokhlin, Olga V. Gurtovaya
TL;DR
This work introduces VAW$^2$, a two-level online multi-kernel learning algorithm for least squares regression in RKHS that combines random Fourier feature approximations with the Vovk-Azoury-Warmuth framework. By constructing kernel-specific experts from random features and aggregating them at a meta-level (also via VAW or EWA), the method achieves sublinear regret with respect to the kernelized function class, specifically $O(T^{1/2}\ln T)$ when the number of random features grows as $m \sim \sqrt{T}$. The paper proves regret bounds for both a concatenated-feature approach and a scalable two-level scheme, and demonstrates superior empirical performance over baselines like Raker and OMKL-GF on multiple datasets. The approach balances computational efficiency with theoretical guarantees, making online MKL more practical for large-scale applications. Potential extensions include dynamic regret analysis and online kernel dictionary adaptation.
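A heuristic reading of why $m \sim \sqrt{T}$ is a natural choice may help here. The sketch below is not taken from the paper's proof. It assumes that the regret splits into an estimation term and an approximation term: the $O(m \ln T)$ estimation term is the standard VAW regret in dimension $m$, while the $O(T/m)$ approximation term and the constants $C_1$, $C_2$ are illustrative assumptions.

$$
\mathrm{Regret}_T \;\lesssim\; \underbrace{C_1\, m \ln T}_{\text{VAW over } m \text{ features}} \;+\; \underbrace{C_2\, \frac{T}{m}}_{\text{random feature approximation}},
\qquad
m \sim \sqrt{T} \;\Longrightarrow\; \mathrm{Regret}_T = O\!\left(T^{1/2} \ln T\right).
$$

Under these assumptions, a larger $m$ shrinks the approximation term but inflates the estimation term, and $m \sim \sqrt{T}$ balances the two up to the logarithmic factor.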
Abstract
We introduce a novel multi-kernel learning algorithm, VAW$^2$, for online least squares regression in reproducing kernel Hilbert spaces (RKHS). VAW$^2$ leverages random Fourier feature-based functional approximation and the Vovk-Azoury-Warmuth (VAW) method in a two-level procedure: VAW is used to construct expert strategies from random features generated for each kernel at the first level, and then again to combine their predictions at the second level. A theoretical analysis yields a regret bound of $O(T^{1/2}\ln T)$ in expectation with respect to artificial randomness, when the number of random features scales as $T^{1/2}$. Empirical results on benchmark datasets demonstrate that VAW$^2$ achieves superior performance compared to the existing online multi-kernel learning algorithms Raker and OMKL-GF, and to other theoretically grounded methods that use a convex combination of expert predictions at the second level.
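The two-level procedure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The Gaussian kernels, the bandwidth list `sigmas`, the regularization `lam`, the cosine feature map with a random phase, and all function and class names are assumptions made for illustration, and the paper's algorithm may differ in details such as the kernel dictionary, scaling, or clipping of predictions. The only part taken as standard is the VAW prediction rule $\hat y_t = x_t^\top \big(\lambda I + \sum_{s \le t} x_s x_s^\top\big)^{-1} \sum_{s < t} y_s x_s$.

```python
import numpy as np

class VAW:
    """Vovk-Azoury-Warmuth forecaster for online least squares (sketch)."""
    def __init__(self, dim, lam=1.0):
        self.A = lam * np.eye(dim)   # lam * I + sum of x x^T, including the current x
        self.b = np.zeros(dim)       # sum of y_s x_s over past rounds only

    def predict(self, x):
        # VAW adds the current input to the Gram matrix before predicting.
        # Call predict exactly once per round, before update.
        self.A += np.outer(x, x)
        return float(x @ np.linalg.solve(self.A, self.b))

    def update(self, x, y):
        self.b += y * x

def make_rff(dim, m, sigma, rng):
    """Random Fourier feature map for a Gaussian kernel (assumed kernel family)."""
    W = rng.normal(scale=1.0 / sigma, size=(m, dim))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return lambda x: np.sqrt(2.0 / m) * np.cos(W @ x + phase)

def run_two_level(X, y, sigmas, m, lam=1.0, seed=0):
    """Level 1: one VAW expert per kernel. Level 2: VAW over expert predictions."""
    rng = np.random.default_rng(seed)
    T, d = X.shape
    maps = [make_rff(d, m, s, rng) for s in sigmas]
    experts = [VAW(m, lam) for _ in sigmas]
    meta = VAW(len(sigmas), lam)
    preds = np.empty(T)
    for t in range(T):
        feats = [f(X[t]) for f in maps]
        # First level: each expert predicts from its own random features.
        e = np.array([ex.predict(z) for ex, z in zip(experts, feats)])
        # Second level: the expert predictions serve as the meta-learner's input.
        preds[t] = meta.predict(e)
        # The label is revealed only after the prediction is made.
        for ex, z in zip(experts, feats):
            ex.update(z, y[t])
        meta.update(e, y[t])
    return preds
```

To follow the scaling in the abstract, one could set `m = int(np.ceil(np.sqrt(T)))`. Each round then costs one linear solve of size `m` per kernel plus one of size `len(sigmas)` at the meta level; a rank-one inverse update would be a natural optimization, which the sketch omits for clarity.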
