Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines

Edward Milsom, Ben Anson, Laurence Aitchison

TL;DR

This work addresses the gap between kernel methods and neural networks on complex vision tasks by introducing two enhancements to convolutional deep kernel machines: stochastic kernel regularisation (SKR), and lower-precision arithmetic made numerically stable by Taylor approximations of the KL terms. SKR injects stochasticity into the inducing Gram matrices via Wishart sampling, while the Taylor-approximated KL terms stabilise training under TF32, enabling substantial speedups. The approach achieves 94.52% test accuracy on CIFAR-10, closely matching Adam-trained CNNs and surpassing prior deep kernel machines, demonstrating that representation learning can substantially boost kernel methods. The results highlight the practical viability of deep kernel representations and motivate further theoretical work to narrow the remaining gap with state-of-the-art neural networks.
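The TL;DR mentions Taylor approximations of the KL terms. As an illustration only, the sketch below assumes each KL term has the Gaussian form KL(N(0, G) || N(0, K)) = ½[tr(K⁻¹G) − P − log det(K⁻¹G)] for a P×P learned Gram matrix G and kernel matrix K. Writing the sum over the eigenvalues λ of K⁻¹G and expanding λ − 1 − log λ ≈ ½(λ − 1)² around λ = 1 gives KL ≈ ¼ tr[(K⁻¹G − I)²], which needs no log-determinant. Note that the exact form obtains a small number by subtracting quantities of order one, which is fragile at low precision. The function names are hypothetical, and the paper's actual approximation may differ.

```python
import numpy as np

def kl_exact(G, K):
    """KL(N(0, G) || N(0, K)) = 0.5 * [tr(K^-1 G) - P - log det(K^-1 G)]."""
    P = G.shape[0]
    M = np.linalg.solve(K, G)          # K^-1 G without forming K^-1 explicitly
    _, logdet = np.linalg.slogdet(M)   # det(K^-1 G) > 0 when G and K are SPD
    return 0.5 * (np.trace(M) - P - logdet)

def kl_taylor(G, K):
    """Hypothetical second-order Taylor approximation around G = K.

    Uses lambda - 1 - log(lambda) ~= 0.5 * (lambda - 1)^2 for each eigenvalue
    lambda of K^-1 G, giving 0.25 * tr[(K^-1 G - I)^2]; no log-determinant needed.
    """
    P = G.shape[0]
    A = np.linalg.solve(K, G) - np.eye(P)
    return 0.25 * np.trace(A @ A)

# Usage: when G is close to K, the two values agree closely.
K = np.array([[2.0, 0.3], [0.3, 1.0]])
G = K + 0.01 * np.array([[1.0, 0.2], [0.2, 0.5]])
print(kl_exact(G, K), kl_taylor(G, K))
```

The approximation is only accurate near G = K. Away from that regime it behaves as a quadratic penalty rather than the exact divergence.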

Abstract

Recent work developed convolutional deep kernel machines, achieving 92.7% test accuracy on CIFAR-10 using a ResNet-inspired architecture, which is SOTA for kernel methods. However, this still lags behind neural networks, which easily achieve over 94% test accuracy with similar architectures. In this work we introduce several modifications to improve the convolutional deep kernel machine's generalisation, including stochastic kernel regularisation, which adds noise to the learned Gram matrices during training. The resulting model achieves 94.5% test accuracy on CIFAR-10. This finding has important theoretical and practical implications, as it demonstrates that the ability to perform well on complex tasks like image classification is not unique to neural networks. Instead, other approaches including deep kernel methods can achieve excellent performance on such tasks, as long as they have the capacity to learn representations from data.
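The abstract describes stochastic kernel regularisation as adding noise to the learned Gram matrices during training, and the TL;DR says this noise comes from Wishart sampling. Below is a minimal sketch that assumes the noisy matrix is drawn so that its mean equals the learned Gram matrix G. It draws `dof` Gaussian vectors with covariance G and averages their outer products, which yields a Wishart(G/dof, dof) sample. Larger `dof` means less noise. The function name and the `dof` parameter are hypothetical, and how `dof` maps onto the γ in Figure 1 is an assumption. At evaluation time the learned G is returned unchanged.

```python
import numpy as np

def skr_gram(G, dof, rng, training=True):
    """Hypothetical SKR step: replace G by a Wishart sample with mean G while training."""
    if not training:
        return G                          # deterministic Gram matrix at test time
    P = G.shape[0]
    L = np.linalg.cholesky(G)             # G = L L^T (G must be positive definite)
    Z = rng.standard_normal((P, dof))     # i.i.d. N(0, 1) entries
    F = L @ Z                             # each column of F ~ N(0, G)
    return F @ F.T / dof                  # Wishart(G / dof, dof); rank-deficient if dof < P

# Usage: the average of many noisy samples recovers G.
rng = np.random.default_rng(0)
G = np.array([[2.0, 0.5], [0.5, 1.0]])
samples = np.stack([skr_gram(G, dof=10, rng=rng) for _ in range(20000)])
print(samples.mean(axis=0))               # close to G
```

Because the sample is unbiased for G, the noise perturbs the representation without systematically shifting it, much as dropout perturbs activations in a neural network.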

Paper Structure (18 sections, 28 equations, 2 figures, 3 tables, 1 algorithm)

Figures (2)

  • Figure 1: Effects of different regularisation methods on Gram matrix condition number in the toy binary classification problem, trained for 2000 epochs. The left plot shows the condition numbers when different amounts of stochastic kernel regularisation ($\gamma$) are applied. The middle and right plots show the condition numbers when the coefficient $\nu$ of the KL regularisation terms is varied, with and without a Taylor approximation, respectively.
  • Figure 2: Effects of stochastic kernel regularisation strength on Gram matrix condition number in the toy binary classification problem, trained for 10000 epochs. See the section on numerical stability investigations.
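Both figures track the condition number of the Gram matrices. For a symmetric positive-definite matrix, this is the ratio of its largest to smallest eigenvalue. The helper below computes it; its name is hypothetical. As a rough rule of thumb, TF32 keeps only 10 explicit mantissa bits (relative precision around 1e-3), so solves or Cholesky factorisations of Gram matrices with condition numbers approaching 1e3 or beyond can lose much of their accuracy in that format. This is one reason well-conditioned Gram matrices matter under lower precision.

```python
import numpy as np

def condition_number(G):
    """2-norm condition number of a symmetric positive-definite Gram matrix."""
    eigvals = np.linalg.eigvalsh(G)   # real eigenvalues in ascending order
    return eigvals[-1] / eigvals[0]

# Two nearly identical feature vectors give an ill-conditioned Gram matrix.
X = np.array([[1.0, 0.0], [1.0, 1e-3]])
G = X @ X.T
print(condition_number(G))            # very large, on the order of 1e6
print(condition_number(np.eye(2)))    # 1.0 for a perfectly conditioned matrix
```

For symmetric positive-definite input this agrees with `np.linalg.cond`, which computes the same ratio from singular values.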