Designing a Linearized Potential Function in Neural Network Optimization Using Csiszár Type of Tsallis Entropy

Keito Akiyama

TL;DR

This paper establishes a framework that uses a linearized potential function based on the Csiszár-type Tsallis entropy, one of the generalized entropies, and shows that this framework enables us to derive an exponential convergence result.

Abstract

In recent years, learning in neural networks has come to be viewed as optimization in the space of probability measures. A regularizing term based on Shannon entropy plays an important role in obtaining exponential convergence to the optimizer. Although the choice of entropy function strongly affects convergence results, there are almost no results on generalizing it, owing to two technical difficulties: the lack of a sufficient condition for a generalized logarithmic Sobolev inequality, and the dependence of the potential function on the distribution in the gradient flow equation. In this paper, we establish a framework that uses a linearized potential function based on the Csiszár-type Tsallis entropy, one of the generalized entropies. We also show that this framework enables us to derive an exponential convergence result.
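As a rough illustration of how a Tsallis-type entropy generalizes the Shannon case, the sketch below compares the two on a small discrete example. It assumes the common Csiszár ($\varphi$-divergence) form with $\varphi(s) = (s^q - s)/(q - 1)$; the paper's exact definition and normalization may differ, and the function names and the toy distributions `p` and `g` are hypothetical.

```python
import math


def shannon_relative_entropy(p, g):
    """Relative entropy sum_i p_i * log(p_i / g_i) for discrete distributions."""
    return sum(pi * math.log(pi / gi) for pi, gi in zip(p, g) if pi > 0)


def tsallis_relative_entropy(p, g, q):
    """Csiszar-type form sum_i g_i * phi(p_i / g_i), phi(s) = (s**q - s) / (q - 1).

    Assumed convention (not taken from the paper); requires q > 0, q != 1, g_i > 0.
    """
    total = sum(gi * ((pi / gi) ** q - pi / gi) for pi, gi in zip(p, g))
    return total / (q - 1)


# Toy distributions chosen only for illustration.
p = [0.5, 0.3, 0.2]
g = [0.25, 0.25, 0.5]

kl = shannon_relative_entropy(p, g)
for q in (2.0, 1.5, 1.1, 1.001):
    # As q approaches 1, the Tsallis-type value approaches the Shannon one.
    print(f"q={q}: tsallis={tsallis_relative_entropy(p, g, q):.6f}, shannon={kl:.6f}")
```

Under this convention, $q = 2$ gives the chi-square divergence and $q \to 1$ recovers the Shannon relative entropy, which is the sense in which the Shannon-regularized setting is the special case being generalized.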

Paper Structure

This paper contains 20 sections, 11 theorems, 63 equations, 1 table.

Key Result

Proposition 2.3

If $w_0 \in H^1(\gamma) \cap L^{\infty}(\gamma)$ satisfies $w_0^- \equiv 0$ ($\gamma$-a.e.) and $\|w_0\|_{L^1(\gamma)} = 1$, then there exists a unique $C^1([0, \infty); H^1(\gamma))$-solution of eq:KFP$w$. Moreover, the following properties are satisfied: for all $t \in [0, \infty)$,

Theorems & Definitions (14)

  • Definition 1.1: A Neural Network
  • Definition 1.2: Target functional
  • Definition 2.2: $C^1([0, \infty); H^1(\gamma))$-solution of \ref{eq:KFP}
  • Proposition 2.3
  • Theorem 2.4: Main Theorem
  • Lemma 3.1: Boundedness of the generalization error term
  • Lemma 3.2: Finiteness of $\gamma$
  • Lemma 3.3: Decomposition property of convex function
  • Lemma 3.4: Dunford–Pettis Theorem [B2011]
  • Lemma 3.5: Duality formula for $\mathcal{E}_{\varphi, \lambda, \tau}$ [AGS2008]
  • ...and 4 more