Random feature-based double Vovk-Azoury-Warmuth algorithm for online multi-kernel learning

Dmitry B. Rokhlin, Olga V. Gurtovaya

TL;DR

This work introduces VAW$^2$, a two-level online multi-kernel learning algorithm for least squares regression in RKHS that combines random Fourier feature approximations with the Vovk-Azoury-Warmuth framework. By constructing kernel-specific experts from random features and aggregating them at a meta-level (also via VAW or EWA), the method achieves sublinear regret with respect to the kernelized function class, specifically $O(T^{1/2}\ln T)$ when the number of random features grows as $m \sim \sqrt{T}$. The paper proves regret bounds for both a concatenated-feature approach and a scalable two-level scheme, and reports better empirical performance than the baselines Raker and OMKL-GF on multiple datasets. The approach balances computational efficiency with theoretical guarantees, making online MKL more practical for large-scale applications. Potential extensions include dynamic regret analysis and online kernel dictionary adaptation.

Abstract

We introduce a novel multi-kernel learning algorithm, VAW$^2$, for online least squares regression in reproducing kernel Hilbert spaces (RKHS). VAW$^2$ leverages random Fourier feature-based functional approximation and the Vovk-Azoury-Warmuth (VAW) method in a two-level procedure: VAW is used to construct expert strategies from random features generated for each kernel at the first level, and then again to combine their predictions at the second level. A theoretical analysis yields a regret bound of $O(T^{1/2}\ln T)$ in expectation with respect to artificial randomness, when the number of random features scales as $T^{1/2}$. Empirical results on benchmark datasets demonstrate that VAW$^2$ achieves superior performance compared to the existing online multi-kernel learning algorithms Raker and OMKL-GF, and to other theoretically grounded methods involving convex combinations of expert predictions at the second level.
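
The two-level procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Gaussian kernels with hypothetical bandwidths `sigmas`, the standard cosine random Fourier feature map (which may differ from the feature construction used in the paper), and illustrative values for the regularization parameter `lam` and the number of features `m`. The names `VAW`, `make_rff`, `TwoLevelVAW` and `run_demo` are introduced here for illustration only. Each first-level expert runs the VAW forecaster on the random features of one kernel, and a second VAW forecaster takes the vector of expert predictions as its input.

```python
import numpy as np


class VAW:
    """Vovk-Azoury-Warmuth forecaster for online least squares."""

    def __init__(self, dim, lam=1.0):
        self.A = lam * np.eye(dim)  # lam*I + sum of x x^T (current x included)
        self.b = np.zeros(dim)      # sum of y_s x_s over past rounds

    def predict(self, x):
        # VAW adds the current input to A before predicting;
        # call exactly once per round, then call update().
        self.A += np.outer(x, x)
        return float(x @ np.linalg.solve(self.A, self.b))

    def update(self, x, y):
        self.b += y * x


def make_rff(d, m, sigma, rng):
    """Cosine random Fourier features for a Gaussian kernel (assumed form)."""
    W = rng.normal(scale=1.0 / sigma, size=(m, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return lambda x: np.sqrt(2.0 / m) * np.cos(W @ x + b)


class TwoLevelVAW:
    """One VAW expert per kernel, combined by a second-level VAW."""

    def __init__(self, d, sigmas, m, lam=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.maps = [make_rff(d, m, s, rng) for s in sigmas]
        self.experts = [VAW(m, lam) for _ in sigmas]
        self.meta = VAW(len(sigmas), lam)

    def predict(self, x):
        self.feats = [phi(x) for phi in self.maps]
        # Expert predictions become the input vector of the meta-level VAW.
        self.z = np.array(
            [e.predict(f) for e, f in zip(self.experts, self.feats)]
        )
        return self.meta.predict(self.z)

    def update(self, y):
        for e, f in zip(self.experts, self.feats):
            e.update(f, y)
        self.meta.update(self.z, y)


def run_demo(T=500, seed=1):
    """Run on a synthetic 1-D target; returns per-round squared losses."""
    rng = np.random.default_rng(seed)
    model = TwoLevelVAW(d=1, sigmas=[0.1, 1.0, 10.0], m=20)
    losses = []
    for _ in range(T):
        x = rng.uniform(-3.0, 3.0, size=1)
        y = np.sin(x[0])
        y_hat = model.predict(x)
        model.update(y)
        losses.append((y_hat - y) ** 2)
    return np.array(losses)
```

In this sketch each expert costs one $m \times m$ linear solve per round, and the meta-level costs one solve whose size equals the number of kernels, so the per-round cost grows with the number of kernels rather than with the square of the total number of features. The solves could be replaced by rank-one inverse updates; that optimization is omitted here for clarity.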

Paper Structure

This paper contains 5 sections, 5 theorems, 61 equations, 2 figures, 2 tables.

Key Result

Lemma 1

For any $f=\int\gamma(\theta)\phi(\cdot,\theta)\,d\theta\in\mathcal{H}$ put where $\theta_i\sim p$ are i.i.d. random variables. Then

Figures (2)

  • Figure 1: MSE performance of MKL algorithms.
  • Figure 2: Final weights of VAW$^2$, VAW-EWA and VAW-ML-Prod algorithms.

Theorems & Definitions (10)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • proof