Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks

Yahong Yang; Qipin Chen; Wenrui Hao

TL;DR

An in-depth analysis of the Homotopy Relaxation Training Algorithm (HRTA) within the neural tangent kernel (NTK) framework reveals significantly improved convergence rates, and the method shows potential for other activation functions and for deep neural networks.

Abstract

In this paper, we present a novel training approach, the Homotopy Relaxation Training Algorithm (HRTA), which aims to accelerate training compared with traditional methods. The algorithm incorporates two key mechanisms: the first builds a homotopy activation function that seamlessly connects the linear activation function with the ReLU activation function; the second relaxes the homotopy parameter to refine the training process. We analyze this method in depth within the framework of the neural tangent kernel (NTK) and show significantly improved convergence rates. Our experimental results, especially for networks with larger widths, validate the theoretical conclusions. The proposed HRTA also shows potential for other activation functions and for deep neural networks.
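The two mechanisms can be illustrated with a minimal sketch. It assumes the homotopy activation is the convex combination $\sigma_s(z) = (1-s)\,z + s\,\mathrm{ReLU}(z)$, so that $s=0$ gives the linear activation and $s=1$ gives ReLU, and it mimics relaxation by running gradient descent with one value of $s$ for a first block of iterations and another value for a second block (in the spirit of the $t_1$ and $t_2$ iterations in the outline). The names `homotopy_act`, `homotopy_act_grad`, and `train_hrta`, the schedule values, the width, the learning rate, and the $1/\sqrt{m}$ output scaling are all hypothetical choices, not the paper's Algorithm 1 or its settings.

```python
import numpy as np


def homotopy_act(z, s):
    """Assumed homotopy activation (1 - s) * z + s * ReLU(z).

    s = 0 recovers the linear activation and s = 1 recovers ReLU.
    """
    return (1.0 - s) * z + s * np.maximum(z, 0.0)


def homotopy_act_grad(z, s):
    """Derivative of the assumed activation in z (ReLU'(0) taken as 0)."""
    return (1.0 - s) + s * (z > 0.0)


def train_hrta(x, y, schedule=((0.5, 500), (1.0, 500)), width=1000, lr=0.2, seed=0):
    """Full-batch gradient descent on a two-layer network with a piecewise-constant s.

    schedule: sequence of (s, num_iterations) pairs, e.g. s_1 for t_1 steps and
    then s_2 for t_2 steps. All defaults are hypothetical placeholders.
    Returns the loss recorded before every update, plus the final loss.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([x, np.ones_like(x)])  # append a constant 1 as a bias input
    n = X.shape[0]
    W = rng.standard_normal((width, X.shape[1]))  # first-layer weights, shape (m, d)
    a = rng.standard_normal(width)                 # output weights, shape (m,)
    scale = 1.0 / np.sqrt(width)                   # NTK-style output scaling
    losses = []
    for s, steps in schedule:
        for _ in range(steps):
            Z = X @ W.T                            # pre-activations, shape (n, m)
            H = homotopy_act(Z, s)
            r = scale * (H @ a) - y                # residuals, shape (n,)
            losses.append(0.5 * np.mean(r ** 2))
            grad_a = scale * (H.T @ r) / n
            G = r[:, None] * homotopy_act_grad(Z, s) * a[None, :]  # shape (n, m)
            grad_W = scale * (G.T @ X) / n         # shape (m, d)
            a -= lr * grad_a
            W -= lr * grad_W
    final_s = schedule[-1][0]
    r = scale * (homotopy_act(X @ W.T, final_s) @ a) - y
    losses.append(0.5 * np.mean(r ** 2))
    return losses


if __name__ == "__main__":
    # Target taken from Figure 2, sin(2*pi*x); sampling on [0, 1] is an assumption.
    x = np.linspace(0.0, 1.0, 64)
    y = np.sin(2.0 * np.pi * x)
    losses = train_hrta(x, y)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Switching $s$ changes the network function even at fixed parameters, so the loss can jump at the phase boundary; within each phase it should decrease for a sufficiently small learning rate.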


Paper Structure

This paper contains 15 sections, 12 theorems, 77 equations, 5 figures, 1 table, and 1 algorithm.

Introduction
Homotopy Relaxation Training Algorithm
Convergence Analysis
Preliminaries
Neural networks
Gradient descent kernel
Convergence of $t_1$ iteration
Convergence of $t_2$ iteration
Convergence of HRTA
Experimental Results for the Homotopy Relaxation Training Algorithm
Function approximation by HRTA
Solving partial differential equation by HRTA
Conclusion
Sub-exponential Bernstein's Inequality
Function approximation using supervised learning

Key Result

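Theorem 1

Suppose Assumption \ref{positive} holds, and let $\lambda_{\boldsymbol{\omega},p}$ and $\lambda_{a,p}$ denote the quantities associated with the homotopy parameter $s_p$. Then $\lambda_{\boldsymbol{\omega},p+1}\ge\lambda_{\boldsymbol{\omega},p}>0$ and $\lambda_{a,p+1}\ge\lambda_{a,p}>0$ for all $0\le s_p\le s_{p+1}$.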

Figures (5)

Figure 1: Structure of the proof of Theorem \ref{convergence}
Figure 2: Approximation of $\sin(2\pi x)$
Figure 3: Approximation of $\sin(2\pi(x_1+x_2+x_3))$
Figure 4: Loss function in the Deep Ritz method
Figure 5: Solving Eq. (\ref{possion}), measured in the $L^2$ norm

Theorems & Definitions (23)

Remark
Remark
Theorem 1
Lemma (Weyl's Inequalities)
Proof of Theorem \ref{large}
Lemma (Bounds of initial parameters) [luo2021phase]
Proposition [luo2021phase]
Proposition [luo2021phase]
Proposition [luo2021phase]
Lemma
...and 13 more
