Mirror Descent on Reproducing Kernel Banach Spaces

Akash Kumar, Mikhail Belkin, Parthe Pandit

TL;DR

This paper studies learning in Banach spaces endowed with a reproducing kernel (RKBS), with a focus on efficient optimization. It proposes a mirror descent algorithm (MDA) that takes gradient steps in the dual of the Banach space, using the reproducing kernel.

Abstract

Recent advances in machine learning have led to increased interest in reproducing kernel Banach spaces (RKBS) as a more general framework that extends beyond reproducing kernel Hilbert spaces (RKHS). These works have resulted in the formulation of representer theorems under several regularized learning schemes. However, little is known about an optimization method that encompasses these results in this setting. This paper addresses a learning problem on Banach spaces endowed with a reproducing kernel, focusing on efficient optimization within RKBS. To tackle this challenge, we propose an algorithm based on mirror descent (MDA). Our approach involves an iterative method that employs gradient steps in the dual space of the Banach space using the reproducing kernel. We analyze the convergence properties of our algorithm under various assumptions and establish two types of results: first, we identify conditions under which a linear convergence rate is achievable, akin to optimization in the Euclidean setting, and provide a proof of the linear rate; second, we demonstrate a standard convergence rate in a constrained setting. Moreover, to instantiate this algorithm in practice, we introduce a novel family of RKBSs with $p$-norm ($p \neq 2$), characterized by both an explicit dual map and a kernel.
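The dual-space gradient step described in the abstract can be illustrated in finite dimensions. The following is a minimal sketch, not the paper's RKBS algorithm: it assumes the classical mirror map $\psi(w) = \tfrac{1}{2}\|w\|_p^2$ on $\mathbb{R}^d$, whose gradient serves as an explicit dual map and whose inverse is the same map with the conjugate exponent $q = p/(p-1)$. The function names, the synthetic least-squares data, and the step-size choice are all assumptions made for illustration.

```python
import numpy as np

def dual_map(w, p):
    """Gradient of psi(w) = 0.5 * ||w||_p^2 (assumed mirror map), primal -> dual."""
    norm = np.linalg.norm(w, ord=p)
    if norm == 0.0:
        return np.zeros_like(w)
    return np.sign(w) * np.abs(w) ** (p - 1) * norm ** (2 - p)

def inverse_dual_map(theta, p):
    """Inverse of dual_map: the same formula with the conjugate exponent q."""
    q = p / (p - 1.0)
    return dual_map(theta, q)

def mirror_descent(grad, w0, p, step, n_iters):
    """Take the gradient step in dual coordinates, then map back to the primal."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iters):
        theta = dual_map(w, p) - step * grad(w)  # step in the dual space
        w = inverse_dual_map(theta, p)           # return to the primal space
    return w

# Toy least-squares problem on synthetic data (illustration only).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = A @ np.array([1.0, -2.0, 0.5])

def loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

def grad(w):
    return A.T @ (A @ w - b) / len(b)

p = 1.5
# Smoothness constant of the loss in the 2-norm; (p - 1) / L is a
# conservative step size for 1 < p <= 2 (an assumption of this sketch).
L = np.linalg.eigvalsh(A.T @ A / len(b)).max()
w_hat = mirror_descent(grad, np.zeros(3), p, step=(p - 1) / L, n_iters=200)
```

For $p = 2$ both maps reduce to the identity and the loop is plain gradient descent, which is one way to sanity-check the sketch.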

Paper Structure

This paper contains 28 sections, 17 theorems, 123 equations, 3 figures, 1 algorithm.

Key Result

Lemma 12

If a real-valued functional $\mathsf F :{\mathcal{D}} \subseteq {\mathcal{B}} \to \mathbb{R}$ is $\mu$-strongly convex and $\gamma$-smooth, then $\mu \le \gamma$.
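A short argument for this lemma can be sketched under the standard first-order forms of the two properties. The paper's Definitions 9 and 10 are not reproduced here, so the inequalities below are an assumption about how they are stated; the argument also assumes $\mathsf F$ is differentiable with derivative $\nabla \mathsf F$ and that ${\mathcal{D}}$ contains at least two distinct points. For all $f, g \in {\mathcal{D}}$, suppose

$$
\mathsf F(f) + \langle \nabla \mathsf F(f),\, g - f \rangle + \frac{\mu}{2}\,\|g - f\|_{\mathcal{B}}^2
\;\le\; \mathsf F(g) \;\le\;
\mathsf F(f) + \langle \nabla \mathsf F(f),\, g - f \rangle + \frac{\gamma}{2}\,\|g - f\|_{\mathcal{B}}^2 .
$$

Cancelling the common terms on the outer sides gives

$$
\frac{\mu}{2}\,\|g - f\|_{\mathcal{B}}^2 \;\le\; \frac{\gamma}{2}\,\|g - f\|_{\mathcal{B}}^2 ,
$$

and choosing any $g \neq f$ and dividing by $\tfrac{1}{2}\|g - f\|_{\mathcal{B}}^2 > 0$ yields $\mu \le \gamma$.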

Figures (3)

  • Figure 1: Functional MDA
  • Figure 2: Our MDA for RKBSs
  • Figure 5: Results from numerical experiments using the mirror descent algorithm (Algorithm 1) for varying $p$ values in a one-dimensional space. We apply squared error loss on 800 training points with 25 centers using the Locally Adaptive-Bandwidths (LAB) RBF kernel. The bivariate function $H$ is defined by $\exp \left( -\frac{\| \theta_i \odot (x - {\mathbf{c}}_i) \|_2^2}{2} \right)$, with $\theta_i$ optimized via gradient descent. (Leftmost Plot): Logarithm of training error versus iterations. (Middle Plot): Zoomed-in log error versus iterations up to 100,000 steps. (Rightmost Plot): Predictions of learned kernel classifiers compared with the direct matrix solution (gray curve) on the training set.
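
The bivariate function $H$ in the Figure 5 caption can be evaluated with a few lines of NumPy. This is a minimal sketch based only on the formula in the caption: the function name `lab_rbf_features`, the array layout, and the placeholder data are assumptions, and the gradient-descent optimization of $\theta_i$ mentioned in the caption is not shown.

```python
import numpy as np

def lab_rbf_features(X, centers, thetas):
    """Evaluate H(x, c_i) = exp(-||theta_i * (x - c_i)||_2^2 / 2) for all pairs.

    X:       (n, d) input points
    centers: (m, d) centers c_i
    thetas:  (m, d) per-center bandwidth vectors theta_i
    Returns an (n, m) matrix of kernel values.
    """
    diff = X[:, None, :] - centers[None, :, :]   # (n, m, d) pairwise differences
    scaled = thetas[None, :, :] * diff           # elementwise product theta_i * (x - c_i)
    return np.exp(-0.5 * np.sum(scaled ** 2, axis=-1))

# Placeholder data matching the sizes in the caption: 800 points, 25 centers, d = 1.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(800, 1))
centers = np.linspace(-1.0, 1.0, 25).reshape(-1, 1)
thetas = np.ones((25, 1))  # hypothetical initial bandwidths
H = lab_rbf_features(X, centers, thetas)
```

Each column of `H` corresponds to one center, so every center carries its own bandwidth vector.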

Theorems & Definitions (36)

  • Definition 1: Reflexive Banach spaces
  • Definition 2: Reproducing kernel Banach spaces (RKBS)
  • Definition 3: Reproducing kernel
  • Definition 4: Gâteaux differential
  • Definition 5: Fréchet derivative
  • Definition 6: Convexity
  • Definition 7: Uniform convexity
  • Definition 8: Subgradients
  • Definition 9: $\mu$-strongly convex
  • Definition 10: $\gamma$-smoothness
  • ...and 26 more