An Application of the Holonomic Gradient Method to the Neural Tangent Kernel

Akihiro Sakoda, Nobuki Takayama

TL;DR

Methods to numerically evaluate dual activations of holonomic activator distributions for neural tangent kernels are given, based on computer algebra algorithms for rings of differential operators.

Abstract

A holonomic system of linear partial differential equations is, roughly speaking, a system whose solution space is finite dimensional. A distribution that is a solution of a holonomic system is called a holonomic distribution. We give methods to numerically evaluate dual activations of holonomic activator distributions for neural tangent kernels. These methods are based on computer algebra algorithms for rings of differential operators.
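To make the quantity being evaluated concrete, the sketch below computes a dual activation for the ReLU case in two ways. It assumes the common definition $\hat\sigma(\rho) = E[\sigma(u)\sigma(v)]$, where $(u,v)$ is a standard bivariate Gaussian with correlation $\rho$; the paper's own normalization may differ by a constant factor. The numerical route here is plain Simpson quadrature, not the holonomic gradient method, and the function names (`relu_dual_closed_form`, `relu_dual_quadrature`) are hypothetical names chosen for illustration.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def relu_dual_closed_form(rho):
    """Known closed form of E[relu(u) relu(v)] for correlation rho (assumed normalization)."""
    return (math.sqrt(1.0 - rho * rho) + (math.pi - math.acos(rho)) * rho) / (2.0 * math.pi)


def relu_dual_quadrature(rho, n=2000, upper=10.0):
    """Evaluate the same expectation numerically.

    Write v = rho*u + s*y with s = sqrt(1 - rho^2) and y an independent
    standard normal. The inner expectation over y is
    E[max(0, a + s*y)] = a*Phi(a/s) + s*phi(a/s) with a = rho*u,
    and the outer integral over u >= 0 is done by Simpson's rule (n must be even).
    """
    s = math.sqrt(max(0.0, 1.0 - rho * rho))

    def inner(u):
        a = rho * u
        if s == 0.0:
            return max(0.0, a)
        return a * Phi(a / s) + s * phi(a / s)

    h = upper / n
    total = 0.0
    for i in range(n + 1):
        u = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * u * inner(u) * phi(u)
    return total * h / 3.0


if __name__ == "__main__":
    for rho in (-0.5, 0.0, 0.5, 0.9):
        print(rho, relu_dual_closed_form(rho), relu_dual_quadrature(rho))
```

ReLU is a convenient check because a closed form exists. For activators without one, the expectation has to be evaluated numerically, which is the setting the abstract addresses.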

Paper Structure

This paper contains 30 sections, 13 theorems, 65 equations, 5 figures, 3 algorithms.

Key Result

Theorem 1

[arora-2019] Fix $\epsilon > 0$ and $\delta \in (0,1)$. Suppose $\sigma(z) = \max(0,z)$ and $\min_{h\in[L]} d_{h} \geq \Omega(\frac{L^6}{\epsilon^4} \log(L/\delta))$. Then for any inputs $x,x^\prime \in \mathbb{R}^{d_0}$ such that $\|x\| \leq 1$, $\|x^\prime\| \leq 1$, with probability at least $1-\delta$

Figures (5)

  • Figure 1: Inference by ReLU
  • Figure 2: Inference by GeLU
  • Figure 3: Inference by ReSin
  • Figure 4: Integration paths of an ODE solver.
  • Figure 5: Timing when the parameter "step" increases.

Theorems & Definitions (15)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Proposition 1
  • Theorem 6
  • Theorem 7
  • Example 1
  • Theorem 8
  • ...and 5 more