Why High-rank Neural Networks Generalize?: An Algebraic Framework with RKHSs

Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Masahiro Ikeda

TL;DR

This work derives a new Rademacher complexity bound for deep neural networks using Koopman operators, group representations, and reproducing kernel Hilbert spaces (RKHSs). The bound applies to a wider range of realistic models than existing bounds of this kind.

Abstract

We derive a new Rademacher complexity bound for deep neural networks using Koopman operators, group representations, and reproducing kernel Hilbert spaces (RKHSs). The proposed bound explains why models with high-rank weight matrices generalize well. Although existing bounds attempt to describe this phenomenon, they apply only to limited types of models. We introduce an algebraic representation of neural networks and a kernel function that together construct an RKHS, and use it to derive a bound for a wider range of realistic models. This work paves the way for the Koopman-based theory of Rademacher complexity bounds to hold in more practical situations.

Paper Structure

This paper contains 29 sections, 15 theorems, 38 equations, 4 figures, 2 tables.

Key Result

Lemma 2

Assume $\sigma:\tilde{\mathcal{X}}\to\mathcal{X}$ is bijective, $\sigma^{-1}$ is differentiable, and the Jacobian of $\sigma^{-1}$ is bounded in $\mathcal{X}$. Then, we have $\Vert K_{\sigma}\Vert\le \sup_{x\in {\mathcal{X}}} \vert J\sigma^{-1}(x)\vert ^{1/2}$, where $J\sigma^{-1}$ is the Jacobian o
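A bound of the form in Lemma 2 can be motivated by a change of variables. The sketch below is not the paper's proof. It rests on assumptions that the excerpt does not state: that $K_{\sigma}h = h\circ\sigma$ maps $L^2(\mathcal{X})$ to $L^2(\tilde{\mathcal{X}})$ with respect to the Lebesgue measure, and that $\vert J\sigma^{-1}(x)\vert$ denotes the absolute value of the Jacobian determinant. Substituting $y=\sigma(x)$ gives, for any $h\in L^2(\mathcal{X})$,

$$
\Vert K_{\sigma}h\Vert^2
= \int_{\tilde{\mathcal{X}}} \vert h(\sigma(x))\vert^2\,\mathrm{d}x
= \int_{\mathcal{X}} \vert h(y)\vert^2\,\vert J\sigma^{-1}(y)\vert\,\mathrm{d}y
\le \sup_{y\in\mathcal{X}} \vert J\sigma^{-1}(y)\vert\;\Vert h\Vert^2 .
$$

Taking square roots and the supremum over $\Vert h\Vert\le 1$ yields $\Vert K_{\sigma}\Vert\le \sup_{x\in\mathcal{X}}\vert J\sigma^{-1}(x)\vert^{1/2}$ under these assumptions. As a one-dimensional check, $\sigma(x)=2x$ from $[0,1]$ to $[0,2]$ has $\vert J\sigma^{-1}\vert = 1/2$, and the inequality above becomes an equality, so $\Vert K_{\sigma}\Vert = 2^{-1/2}$.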

Figures (4)

  • Figure 1: Summary of the framework of the existing and proposed Koopman-based bounds
  • Figure 2: Construction of $\mathcal{X}_l$ and $\tilde{X}_l$
  • Figure 3: Construction of $\mathcal{X}_l$, $\mathcal{Y}_l$, $\mathcal{Z}_l$, and $\tilde{X}_l$
  • Figure 4: (a) Scatter plot of the generalization error versus our bound (for 3 independent runs). The color is set to get dark as the epoch proceeds. (b) Test accuracy with the regularization based on our bound and that based on the existing bound (deep neural net with dense layers). (c) Test accuracy with and without the regularization based on our bound (LeNet).

Theorems & Definitions (27)

  • Definition 1: Koopman operator and weighted Koopman operator
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Definition 5: Positive definite kernel
  • Lemma 6: Schur's lemma
  • Lemma 7: von Neumann double commutant theorem
  • Example 1: Scaled neural network with invertible weights
  • Example 2: Deep model with new structures
  • Example 3
  • ...and 17 more