Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks

Yahong Yang, Qipin Chen, Wenrui Hao

TL;DR

This paper introduces the Homotopy Relaxation Training Algorithm (HRTA) and analyzes it within the neural tangent kernel (NTK) framework, showing significantly improved convergence rates and potential for extension to other activation functions and deep neural networks.

Abstract

In this paper, we present a novel training approach called the Homotopy Relaxation Training Algorithm (HRTA), aimed at accelerating the training process in contrast to traditional methods. Our algorithm incorporates two key mechanisms: one involves building a homotopy activation function that seamlessly connects the linear activation function with the ReLU activation function; the other technique entails relaxing the homotopy parameter to enhance the training refinement process. We have conducted an in-depth analysis of this novel method within the context of the neural tangent kernel (NTK), revealing significantly improved convergence rates. Our experimental results, especially when considering networks with larger widths, validate the theoretical conclusions. This proposed HRTA exhibits the potential for other activation functions and deep neural networks.
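To make the first mechanism concrete, the sketch below shows one natural way to connect the identity (linear) activation with ReLU: a linear interpolation controlled by a homotopy parameter `s`. The formula `(1 - s) * x + s * relu(x)` and the names `relu` and `homotopy_activation` are assumptions made for illustration; the abstract only states that the two activations are connected, so the paper's exact construction may differ.

```python
def relu(x: float) -> float:
    """Standard ReLU activation."""
    return max(0.0, x)


def homotopy_activation(x: float, s: float) -> float:
    """Interpolate between the identity (s = 0) and ReLU (s = 1).

    Assumed form for illustration only: (1 - s) * x + s * relu(x).
    """
    return (1.0 - s) * x + s * relu(x)


if __name__ == "__main__":
    # Sweep the homotopy parameter from the linear end to the ReLU end.
    for s in (0.0, 0.5, 1.0):
        values = [homotopy_activation(x, s) for x in (-2.0, 0.0, 3.0)]
        print(f"s = {s}: {values}")
```

Under this assumed form, positive inputs pass through unchanged for every `s`, while negative inputs are scaled by `1 - s`, so the activation moves continuously from linear to ReLU as `s` goes from 0 to 1.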

Paper Structure

This paper contains 15 sections, 12 theorems, 77 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Suppose Assumption \ref{positive} holds, and denote [...]. Then we have $\lambda_{\boldsymbol{\omega},p+1}\ge\lambda_{\boldsymbol{\omega},p}>0,~\lambda_{a,p+1}\ge\lambda_{a,p}>0$ for all $0\le s_p\le s_{p+1}$.

Figures (5)

  • Figure 1: Structure of the proof of Theorem \ref{convergence}
  • Figure 2: Approximation for $\sin(2\pi x)$
  • Figure 3: Approximation for $\sin(2\pi(x_1+x_2+x_3))$
  • Figure 4: Loss function in Deep Ritz method
  • Figure 5: Solving Eq. (\ref{possion}), measured in the $L^2$ norm

Theorems & Definitions (23)

  • Remark
  • Remark
  • Theorem 1
  • Lemma: Weyl's Inequalities
  • Proof of Theorem \ref{large}
  • Lemma: Bounds of initial parameters [luo2021phase]
  • Proposition [luo2021phase]
  • Proposition [luo2021phase]
  • Proposition [luo2021phase]
  • Lemma
  • ...and 13 more