Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines
Edward Milsom, Ben Anson, Laurence Aitchison
TL;DR
This work addresses the gap between kernel methods and neural networks on complex vision tasks by introducing two enhancements to convolutional deep kernel machines: stochastic kernel regularisation (SKR) and lower-precision arithmetic made numerically stable by Taylor approximations of the KL terms. SKR injects stochasticity into the inducing Gram matrices via Wishart sampling, while the Taylor-based KL terms keep training stable under TF32, whose reduced precision yields substantial speedups. The approach achieves 94.52% test accuracy on CIFAR-10, closely matching Adam-trained CNNs and surpassing prior deep kernel machines, which demonstrates that representation learning can substantially boost kernel methods. The results highlight the practical viability of deep kernel representations and motivate further theoretical work to narrow the remaining gap with state-of-the-art neural networks.
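As a rough illustration of what "injecting stochasticity via Wishart sampling" could look like, the NumPy sketch below replaces a Gram matrix `G` with a random draw whose expectation is `G`. It is not the authors' implementation: the function name `sample_wishart_gram`, the degrees-of-freedom parameter `dof`, and the choice of scale matrix `G / dof` are assumptions made for this example, and `G` is assumed to be symmetric positive definite.

```python
import numpy as np

def sample_wishart_gram(G, dof, rng):
    """Draw W ~ Wishart(G / dof, dof), a random Gram matrix with mean G.

    Hypothetical helper for illustration; assumes G is symmetric
    positive definite and dof >= G.shape[0].
    """
    P = G.shape[0]
    # Cholesky factor of the scale matrix G / dof.
    L = np.linalg.cholesky(G / dof)
    # Each of the dof columns of X is an independent N(0, G / dof) draw.
    X = L @ rng.standard_normal((P, dof))
    # The sum of dof outer products is Wishart(G / dof, dof), so E[W] = G.
    return X @ X.T

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
G = A @ A.T + 4.0 * np.eye(4)  # toy positive-definite Gram matrix
G_noisy = sample_wishart_gram(G, dof=64, rng=rng)
```

In this sketch, a larger `dof` concentrates the samples more tightly around `G`, so it plays the role of a noise-strength knob; at test time one would presumably use `G` itself rather than a sample.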
Abstract
Recent work developed convolutional deep kernel machines, achieving 92.7% test accuracy on CIFAR-10 using a ResNet-inspired architecture, which is state of the art for kernel methods. However, this still lags behind neural networks, which easily achieve over 94% test accuracy with similar architectures. In this work, we introduce several modifications to improve the convolutional deep kernel machine's generalisation, including stochastic kernel regularisation, which adds noise to the learned Gram matrices during training. The resulting model achieves 94.5% test accuracy on CIFAR-10. This finding has important theoretical and practical implications, as it demonstrates that the ability to perform well on complex tasks like image classification is not unique to neural networks. Instead, other approaches, including deep kernel methods, can achieve excellent performance on such tasks, as long as they have the capacity to learn representations from data.
