
Neural Kernels Without Tangents

Vaishaal Shankar, Alex Fang, Wenshuo Guo, Sara Fridovich-Keil, Ludwig Schmidt, Jonathan Ragan-Kelley, Benjamin Recht

TL;DR

This work builds exact, depth-consistent compositional kernels from bags of features and connects them to neural network architectures, particularly neural tangent kernels (NTKs), to assess how expressive such kernels are in practice. By composing three primitive operations (concatenation, downsampling, and embedding), the authors derive kernels that can be computed directly from pixel data, which makes deep kernel evaluations with convolution-like structure possible. Empirical results show that these compositional kernels can outperform NTKs on CIFAR-10 and in small-data regimes, while simple Myrtle-style neural networks still achieve higher accuracy when trained with data augmentation. Preprocessing (ZCA whitening) also plays a crucial role for the kernels. Overall, the paper shows that well-structured kernel constructions can closely track neural-network performance in several settings, suggesting a viable path toward practical, domain-specific kernel design grounded in neural-network-inspired hierarchies.
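To make the three primitive operations concrete, here is a minimal sketch under simplifying assumptions that are not from the paper: inputs are 1-D "bags" of feature vectors (positions on a line rather than a 2-D pixel grid), concatenation uses a 3-wide neighbourhood with zero padding, downsampling averages non-overlapping windows of 2, and the embedding is the degree-1 arc-cosine (ReLU) kernel. All function names (`base_kernel`, `concat`, `downsample`, `relu_embed`, `compositional_kernel`) are hypothetical and only illustrate the idea. They are not the authors' implementation.

```python
import math

# Hypothetical 1-D sketch (not the paper's code): an input is a bag of feature
# vectors at positions 0..n-1, and a kernel between inputs x and y is stored
# as a matrix K[i][j] = k(x_i, y_j).

def base_kernel(xs, ys):
    """Pixel-level kernel: inner products between all pairs of positions."""
    return [[sum(a * b for a, b in zip(u, v)) for v in ys] for u in xs]

def concat(K, width=3):
    """Concatenation (direct sum) of each position with its neighbours, like a
    conv patch: kernels of concatenated features add. Positions outside the
    input act as zero padding and contribute nothing."""
    n, m, half = len(K), len(K[0]), width // 2
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for o in range(-half, half + 1):
                if 0 <= i + o < n and 0 <= j + o < m:
                    out[i][j] += K[i + o][j + o]
    return out

def downsample(K, window=2):
    """Average pooling over non-overlapping windows: averaging features
    averages the kernel over every pair of positions in the two windows."""
    n, m = len(K) // window, len(K[0]) // window
    return [[sum(K[i * window + a][j * window + b]
                 for a in range(window) for b in range(window)) / window ** 2
             for j in range(m)] for i in range(n)]

def relu_embed(Kxy, Kxx, Kyy):
    """Embedding through the degree-1 arc-cosine (ReLU) kernel, computed from
    feature norms (diagonals of Kxx, Kyy) and the cross terms Kxy."""
    out = []
    for i, row in enumerate(Kxy):
        nx = math.sqrt(Kxx[i][i])
        new_row = []
        for j, kij in enumerate(row):
            ny = math.sqrt(Kyy[j][j])
            if nx == 0.0 or ny == 0.0:
                new_row.append(0.0)
                continue
            c = max(-1.0, min(1.0, kij / (nx * ny)))  # cosine of the angle
            t = math.acos(c)
            new_row.append(nx * ny / math.pi * (math.sin(t) + (math.pi - t) * c))
        out.append(new_row)
    return out

def compositional_kernel(xs, ys, depth=2):
    """Stack (concat -> ReLU embedding -> optional 2x pooling) layers, then
    average-pool globally to get a single kernel value k(x, y)."""
    Kxx, Kxy, Kyy = base_kernel(xs, xs), base_kernel(xs, ys), base_kernel(ys, ys)
    for _ in range(depth):
        Kxx, Kxy, Kyy = concat(Kxx), concat(Kxy), concat(Kyy)
        # Right-hand side is evaluated with the pre-embedding matrices.
        Kxx, Kxy, Kyy = (relu_embed(Kxx, Kxx, Kxx),
                         relu_embed(Kxy, Kxx, Kyy),
                         relu_embed(Kyy, Kyy, Kyy))
        if len(Kxx) % 2 == 0 and len(Kyy) % 2 == 0:
            Kxx, Kxy, Kyy = downsample(Kxx), downsample(Kxy), downsample(Kyy)
    return sum(map(sum, Kxy)) / (len(Kxy) * len(Kxy[0]))

if __name__ == "__main__":
    x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
    y = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [1.0, -1.0]]
    print(compositional_kernel(x, y), compositional_kernel(x, x))
```

Each layer carries the three matrices K(x,x), K(x,y) and K(y,y) together because the embedding step needs the feature norms of both inputs, not just their cross terms.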

Abstract

We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well-established feature space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating "compositional" kernels from bags of features. We show that these operations correspond to many of the building blocks of neural tangent kernels (NTKs). Experimentally, we show that there is a correlation in test error between neural network architectures and their associated kernels. We construct a simple neural network architecture, built from only 3x3 convolutions, 2x2 average pooling, and ReLU and optimized with SGD and MSE loss, that achieves 96% accuracy on CIFAR-10, and whose corresponding compositional kernel achieves 90% accuracy. We also use our constructions to investigate the relative performance of neural networks, NTKs, and compositional kernels in the small-dataset regime. In particular, we find that compositional kernels outperform NTKs, and neural networks outperform both kernel methods.
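Each feature-space tool named in the abstract has a closed-form counterpart in kernel space, which is why the resulting kernels can be evaluated without ever materializing features. The identities below are standard. The tensor-square form of moment lifting is only one illustrative instance, assumed here for concreteness, and the paper's exact lifting may differ.

```latex
% Direct sum (concatenation): kernels add
\langle \phi(x) \oplus \psi(x),\ \phi(y) \oplus \psi(y) \rangle
  = k_\phi(x, y) + k_\psi(x, y)

% Averaging a bag \{x_1,\dots,x_n\} against a bag \{y_1,\dots,y_m\}: kernels average
\Big\langle \tfrac{1}{n}\sum_{i=1}^{n} \phi(x_i),\ \tfrac{1}{m}\sum_{j=1}^{m} \phi(y_j) \Big\rangle
  = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} k(x_i, y_j)

% Moment lifting (illustrative second moment via a tensor product): the kernel is squared
\langle \phi(x) \otimes \phi(x),\ \phi(y) \otimes \phi(y) \rangle = k(x, y)^2
```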

Paper Structure

This paper contains 41 sections, 31 equations, 5 figures, 8 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Comparison of the ReLU (arccosine) and Gaussian kernels ($\gamma = 1$), as a function of the angle $\vartheta$ between two examples.
  • Figure 2: A 5-layer network from the "Myrtle" family (Myrtle5).
  • Figure 3: Accuracy results on random subsets of CIFAR-10, with standard deviations over 20 trials. The 14-layer CNTK results are from Arora et al. (2020).
  • Figure 4: Performance profiles for NTK and tuned Gaussian kernel on 90 UCI datasets.
  • Figure 5: (a) 7-layer and (b) 10-layer variants of the Myrtle architecture.
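The comparison in Figure 1 can be approximated with a few lines of code. This sketch assumes unit-norm inputs, the degree-1 arc-cosine kernel normalized so that k(0) = 1, and the Gaussian kernel exp(-γ‖x − y‖²), which for unit vectors equals exp(-2γ(1 − cos ϑ)). These normalizations are assumptions and may not match the exact curves plotted in the paper. The function names are hypothetical.

```python
import math

def arccos_relu(theta):
    """Degree-1 arc-cosine (ReLU) kernel between unit vectors at angle theta,
    normalized (assumption) so that arccos_relu(0) == 1."""
    return (math.sin(theta) + (math.pi - theta) * math.cos(theta)) / math.pi

def gaussian(theta, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2) for unit vectors, using
    ||x - y||^2 = 2 - 2 cos(theta)."""
    return math.exp(-gamma * (2.0 - 2.0 * math.cos(theta)))

if __name__ == "__main__":
    # Tabulate both kernels on a grid of angles from 0 to pi.
    for k in range(7):
        theta = k * math.pi / 6
        print(f"{theta:5.3f}  relu={arccos_relu(theta):.4f}  gauss={gaussian(theta):.4f}")
```

Both kernels decrease monotonically in the angle. Under these assumptions the ReLU kernel falls to 0 at ϑ = π, while the Gaussian with γ = 1 bottoms out at exp(-4).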