Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations

Dimitris Bertsimas; Caio de Prospero Iglesias; Nicholas A. G. Johnson

TL;DR

This work addresses the sparse multiple kernel learning problem by enforcing an explicit cardinality constraint on the kernel weights and adding an $\ell_2$ penalty for robustness. It introduces an alternating best response algorithm that alternates between solving the SVM dual (the $\alpha$ subproblem) and a sparse kernel-weight update (the $\beta$ subproblem). It also provides an exact mixed-integer semidefinite reformulation, together with a hierarchy of SDP and sparse SDP relaxations that certify near-optimality. Empirical results on ten UCI binary classification tasks show that the random-initialization variant of the algorithm achieves higher out-of-sample accuracy than open-source MKL baselines while selecting only a small number of kernels, and that warm-starting from the SDP relaxations further reduces optimality gaps where feasible. The methodology scales to larger kernel pools and offers practical certificates of optimality, making sparse, interpretable kernel learning more robust and efficient for real-world applications.

Abstract

We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing $\ell_1$-regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and adding an $\ell_2$ penalty for robustness. We solve the resulting non-convex minimax problem via an alternating best response algorithm with two subproblems: the $\alpha$ subproblem is a standard kernel SVM dual solved via LIBSVM, while the $\beta$ subproblem admits an efficient solution via the Greedy Selector and Simplex Projector algorithm. We reformulate SMKL as a mixed-integer semidefinite optimization problem and derive a hierarchy of semidefinite convex relaxations, which can be used both to certify near-optimality of the solutions returned by our best response algorithm and to warm-start it. On ten UCI benchmarks, our method with random initialization outperforms state-of-the-art MKL approaches in out-of-sample prediction accuracy by 3.34 percentage points on average (relative to the best-performing benchmark) while selecting a small number of candidate kernels in comparable runtime. With warm starting, our method outperforms the best-performing benchmark's out-of-sample prediction accuracy by 4.05 percentage points on average. Our convex relaxations certify that, in several cases, the solution returned by our best response algorithm is globally optimal.
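The Greedy Selector and Simplex Projector step named above can be illustrated with a short sketch. The exact objective of the kernel-weight subproblem is not given in this excerpt, so the sketch assumes the update reduces to a Euclidean projection of a score vector `w` onto the set of nonnegative weights that sum to one and have at most `k` nonzeros. The function names `project_simplex` and `gssp` and the example values are hypothetical.

```python
def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        # The condition holds on a prefix of the sorted entries;
        # the last index where it holds gives the shift tau.
        if u_j - t > 0:
            tau = t
    return [max(x - tau, 0.0) for x in v]


def gssp(w, k):
    """Greedy selector + simplex projector (assumed form of the weight update).

    Keeps the k largest entries of w, projects them onto the simplex,
    and sets every other entry to zero.
    """
    # Greedy selection: indices of the k largest scores.
    support = sorted(range(len(w)), key=lambda i: w[i], reverse=True)[:k]
    projected = project_simplex([w[i] for i in support])
    beta = [0.0] * len(w)
    for i, value in zip(support, projected):
        beta[i] = value
    return beta


# Hypothetical scores for four candidate kernels, keeping at most two.
beta = gssp([0.5, 0.1, 0.9, 0.3], k=2)
print(beta)
```

In this example the two highest-scoring kernels keep nonzero weight and the weights sum to one. A full best response loop would alternate this update with a kernel SVM dual solve on the combined kernel, which is omitted here.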
