Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning

Boumediene Hamzi, Marcus Hutter, Houman Owhadi

TL;DR

The paper investigates kernel learning from an Algorithmic Information Theory (AIT) perspective, bridging AIT with kernel methods via the Minimum Description Length (MDL) principle. It shows that Kernel Flows (KFs), especially Sparse Kernel Flows (SKF), perform kernel learning by compressing data, with the KF relative error acting as a log-likelihood ratio and enabling a probabilistic/GP interpretation. The authors argue that MDL-based regularization provides a more solid theoretical foundation than cross-validation for kernel learning and propose a pathway to reformulate ML algorithms within an AIT framework. They also suggest extending this methodology to a wider range of algorithms by leveraging data-compression concepts like MDL and covering numbers to guide model selection and data usage.

Abstract

Machine Learning (ML) and Algorithmic Information Theory (AIT) look at complexity from different points of view. We explore the interface between AIT and Kernel Methods (which are prevalent in ML) by adopting an AIT perspective on the problem of learning kernels from data, in kernel ridge regression, through the method of Sparse Kernel Flows. In particular, by looking at the differences and commonalities between Minimum Description Length (MDL) and Regularization in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is the natural approach to adopt to learn kernels from data. This approach aligns naturally with the MDL principle, offering a more robust theoretical basis than the existing reliance on cross-validation. The study reveals that deriving Sparse Kernel Flows does not require a statistical approach; instead, one can work directly with code-lengths and complexities, concepts central to AIT. This approach thereby opens the door to reformulating algorithms in machine learning using tools from AIT, with the aim of providing them with a more solid theoretical foundation.
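
To make the "KF relative error" mentioned in the TL;DR more concrete, here is a minimal sketch of how that quantity is commonly written in the Kernel Flows literature: $\rho = 1 - \frac{y_c^\top K_c^{-1} y_c}{y_b^\top K_b^{-1} y_b}$, where $b$ denotes a batch of points and $c$ a random subset of it. This page does not define $\rho$, so the formula, the Gaussian kernel, the `nugget` regularizer, and the names `rbf_gram` and `kf_rho` are all assumptions made for illustration. The sparsity penalty that distinguishes Sparse Kernel Flows is not shown.

```python
import numpy as np

def rbf_gram(X, lengthscale):
    """Gram matrix of a Gaussian kernel (an assumed kernel family)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * lengthscale**2))

def kf_rho(X, y, idx_sub, lengthscale, nugget=1e-6):
    """Assumed form of the Kernel Flows relative error:
    rho = 1 - (y_c^T K_c^{-1} y_c) / (y_b^T K_b^{-1} y_b)."""
    def sq_norm(Xs, ys):
        # Squared RKHS norm of the (nugget-regularized) interpolant of ys.
        K = rbf_gram(Xs, lengthscale) + nugget * np.eye(len(ys))
        return float(ys @ np.linalg.solve(K, ys))
    return 1.0 - sq_norm(X[idx_sub], y[idx_sub]) / sq_norm(X, y)

# Toy data: 20 points in 1D, half of them kept as the subset.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 1))
y = np.sin(3.0 * X[:, 0])
idx_sub = rng.choice(20, size=10, replace=False)

rho = kf_rho(X, y, idx_sub, lengthscale=0.5)
print(f"rho = {rho:.4f}")
```

Under these assumptions, $\rho$ lies in $[0, 1]$, and a small value means that half of the data already reproduces most of the interpolant built from the whole batch, which is the sense in which a good kernel "compresses" the data.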

Paper Structure

This paper contains 14 sections, 3 theorems, and 28 equations.

Key Result

Proposition A.1

If $K$ is a reproducing kernel of a Hilbert space $\mathcal{H}$, then:

  i. $K(x,y)$ is unique.
  ii. $\forall x,y \in \mathcal{X}$, $K(x,y)=K(y,x)$ (symmetry).
  iii. $\sum_{i,j=1}^q \beta_i\beta_j K(x_i,x_j) \ge 0$ for $\beta_i \in \mathbb{R}$, $x_i \in \mathcal{X}$ and $q \in \mathbb{N}_+$.
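
Properties ii and iii of Proposition A.1 can be checked numerically on a finite sample. The sketch below assumes a Gaussian kernel as the example reproducing kernel. That choice, the helper names `gaussian_kernel` and `gram_matrix`, and the random test points are illustrative and do not come from the paper.

```python
import numpy as np

def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel, used here as an example reproducing kernel."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * lengthscale**2)))

def gram_matrix(points, kernel):
    """Matrix G with entries G[i, j] = K(x_i, x_j)."""
    q = len(points)
    G = np.empty((q, q))
    for i in range(q):
        for j in range(q):
            G[i, j] = kernel(points[i], points[j])
    return G

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))   # q = 6 sample points in R^2
G = gram_matrix(points, gaussian_kernel)

# Property ii: K(x, y) = K(y, x), so the Gram matrix is symmetric.
is_symmetric = np.allclose(G, G.T)

# Property iii: sum_{i,j} beta_i beta_j K(x_i, x_j) >= 0 for any real beta,
# which is equivalent to all eigenvalues of G being nonnegative.
min_eig = np.linalg.eigvalsh(G).min()
beta = rng.normal(size=6)
quad_form = float(beta @ G @ beta)

print(is_symmetric, min_eig >= -1e-10, quad_form >= -1e-10)
```

The small tolerance allows for floating-point round-off. A passing check on one sample is only a sanity test and does not prove the property for every choice of points.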

Theorems & Definitions (4)

  • Definition A.1
  • Proposition A.1
  • Theorem A.1
  • Theorem A.2