Algorithms for Learning Kernels Based on Centered Alignment

Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh

TL;DR

The paper's centered-alignment-based algorithms for learning kernels consistently outperform the uniform combination solution, which has proven difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels, in both classification and regression.

Abstract

This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression.

Paper Structure

This paper contains 23 sections, 18 theorems, 116 equations, 3 figures, 5 tables.

Key Result

Lemma 1

Let ${\mathbf 1} \in \mathbb{R}^{m \times 1}$ denote the vector with all entries equal to one, and ${\mathbf I}$ the identity matrix.

Figures (3)

  • Figure 1: (a) Representation of the distribution $D$. In this simple two-dimensional example, a fraction $\alpha$ of the points are at $(-1,0)$ and have the label $-1$. The remaining points are at $(1,0)$ and have the label $+1$. (b) Alignment values computed for two different definitions of alignment. The solid black line plots the alignment computed according to the definition cited as align-nips, $A = (\alpha^2 + (1 - \alpha)^2)^{1/2}$, while our definition of centered alignment results in the straight dotted blue line $\rho = 1$.
  • Figure 2: Detailed view of the splice and kinematics experiments presented in Table \ref{table:centering}. Both the centered (plots in blue) and non-centered alignment (plots in orange) are plotted as a function of the accuracy (for the regression problem in the kinematics task, "accuracy" is 1 - RMSE). It is apparent from these plots that the non-centered alignment can be misleading when evaluating the quality of a kernel.
  • Figure 3: A scatter plot comparison of the different kernel combination weight values obtained by optimally tuned one-stage and two-stage algorithms on the kinematics dataset.
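
The contrast described in the Figure 1 caption can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the standard form of centering, $K_c = (I - \mathbf{1}\mathbf{1}^\top/m)\,K\,(I - \mathbf{1}\mathbf{1}^\top/m)$, and takes alignment to be the normalized Frobenius inner product of two kernel matrices. The base kernel $K(x, x') = x \cdot x' + 1$ and the target kernel $K_Y(x, x') = y y'$ are assumptions chosen because they reproduce the values quoted in the caption; the page itself does not state which kernels were used. The function names `center`, `alignment`, and `figure1_example` are hypothetical.

```python
import numpy as np


def center(K):
    """Center a kernel matrix: K_c = H K H with H = I - 11^T / m."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    return H @ K @ H


def alignment(K1, K2, centered=True):
    """Normalized Frobenius inner product of two kernel matrices.

    With centered=True both matrices are centered first (centered
    alignment); with centered=False the raw matrices are compared.
    """
    if centered:
        K1, K2 = center(K1), center(K2)
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))


def figure1_example(alpha, m=200):
    """Build the two-point sample from the Figure 1 caption.

    A fraction alpha of the m points sits at (-1, 0) with label -1,
    the rest at (1, 0) with label +1. Requires 0 < alpha < 1 so that
    both classes are present.
    """
    n_neg = int(round(alpha * m))
    y = np.concatenate([-np.ones(n_neg), np.ones(m - n_neg)])
    X = np.stack([y, np.zeros(m)], axis=1)  # x-coordinate equals the label
    K = X @ X.T + 1.0      # ASSUMED base kernel: linear kernel plus a constant
    K_Y = np.outer(y, y)   # ASSUMED target kernel built from the labels
    return alignment(K, K_Y, centered=False), alignment(K, K_Y, centered=True)


if __name__ == "__main__":
    for alpha in (0.1, 0.3, 0.5):
        a, rho = figure1_example(alpha)
        closed_form = (alpha**2 + (1 - alpha) ** 2) ** 0.5
        print(f"alpha={alpha}: uncentered={a:.4f} "
              f"(closed form {closed_form:.4f}), centered={rho:.4f}")
```

Under these assumptions the uncentered value varies with the class balance $\alpha$ even though the kernel separates the two classes perfectly, while the centered value is unaffected, because centering removes the constant offset in $K$.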

Theorems & Definitions (21)

  • Lemma 1
  • Definition 2: Kernel function alignment
  • Lemma 3
  • Definition 4: Kernel matrix alignment
  • Definition 5: Unnormalized alignment
  • Lemma 6
  • Proposition 7
  • Proposition 8
  • Proposition 9
  • Proposition 10
  • ...and 11 more