Convection-Diffusion Equation: A Theoretically Certified Framework for Neural Networks

Tangjun Wang, Chenglong Bao, Zuoqiang Shi

TL;DR

This paper studies partial differential equation (PDE) models of neural networks and proves that the map from a simple base model to a neural network can be formulated as a convection-diffusion equation, under interpretable and intuitive assumptions from both the neural network and the PDE perspectives.

Abstract

In this paper, we study partial differential equation models of neural networks. A neural network can be viewed as a map from a simple base model to a complicated function. Based on rigorous analysis, we show that this map can be formulated by a convection-diffusion equation. This theoretically certified framework provides a mathematical foundation for, and a better understanding of, neural networks. Moreover, based on the convection-diffusion equation model, we design a novel network structure that incorporates a diffusion mechanism into the network architecture. Extensive experiments on both benchmark datasets and real-world applications validate the performance of the proposed model.
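To make the idea of "evolving a base model under a convection-diffusion equation" concrete, the following is a minimal numerical sketch, not the paper's method. It assumes a one-dimensional periodic domain and the constant-coefficient equation $\partial_t u = v\,\partial_x u + \sigma^2\,\partial_{xx} u$ with $u(x,0)=f(x)$. The function name `evolve`, the coefficient values, the sign convention on the convection term, and the step-function "base classifier" are all assumptions chosen for illustration.

```python
import numpy as np

def evolve(f, v=1.0, sigma2=0.05, T=0.5, length=2 * np.pi):
    """Explicit finite-difference solve of u_t = v*u_x + sigma2*u_xx
    on a periodic grid, starting from u(x, 0) = f(x).

    Hypothetical illustration: constant v and sigma2, 1-D domain.
    """
    u = np.asarray(f, dtype=float).copy()
    dx = length / u.size
    # Time step chosen below the explicit stability limit dx^2 / (2*sigma2).
    dt = 0.4 * dx**2 / sigma2
    steps = int(np.ceil(T / dt))
    dt = T / steps  # shrink slightly so that steps * dt == T
    for _ in range(steps):
        right = np.roll(u, -1)  # u at x + dx (periodic wrap-around)
        left = np.roll(u, 1)    # u at x - dx
        convection = v * (right - left) / (2 * dx)
        diffusion = sigma2 * (right - 2 * u + left) / dx**2
        u = u + dt * (convection + diffusion)
    return u

# A discontinuous +/-1 "base classifier" on [0, 2*pi), purely illustrative.
x = np.linspace(0, 2 * np.pi, 200, endpoint=False)
f = np.where(np.sin(x) > 0, 1.0, -1.0)
u = evolve(f)
```

In this sketch the convection term transports the decision boundary while the diffusion term smooths it, so `u` stays within the range of `f` but has no sharp jumps. With these particular values the update is a convex combination of neighbouring grid values, which is why the range is preserved; other choices of `v`, `sigma2`, or grid size may need a smaller time step.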

Paper Structure (42 sections, 1 theorem, 66 equations, 7 figures, 8 tables)

This paper contains 42 sections, 1 theorem, 66 equations, 7 figures, 8 tables.

Key Result

Theorem 1

Under the above assumptions, there exist a Lipschitz continuous function $v:\mathbb{R}^d\times [0,T]\to \mathbb{R}^d$ and a Lipschitz continuous positive function $\sigma:\mathbb{R}^d\times [0,T]\to \mathbb R^{d\times d}$ such that for any bounded and uniformly continuous base classifier $f({\bm{x}})$, where ${\bm{x}}\in \mathbb R^{d}, t\in [0,T]$. Here $\sigma_{i,j}$ is the $i,j$-th element of matri

Figures (7)

  • Figure 1: $\mathcal{T}_t$ represents the mapping from $u({\mspace{2mu}\cdot\mspace{2mu}},0)$ to $u({\mspace{2mu}\cdot\mspace{2mu}},t)$. Top block describes the evolution from a coarse image to a fine image in scale-space theory. Bottom block describes the evolution from a base classifier to a neural network.
  • Figure 2: Accuracy boxplots of COIN with different diffusion strengths. The x-axis represents $\sigma^2$ and the y-axis represents accuracy (%). The orange solid line represents the median. The green dashed line represents the mean. The lower and upper hinges correspond to the 1st and 3rd quartiles, and each whisker extends to the minimum or maximum value no further than 1.5 $\times$ the inter-quartile range from the hinge. Data beyond the ends of the whiskers are outlying points that are plotted individually.
  • Figure 3: Time complexity on Citeseer dataset. The x-axis represents average time (seconds) per training epoch, and the y-axis represents average convergence time (seconds) per task, both in log-scale. The color bar measures the average accuracy of each method.
  • Figure 4: Space complexity shown in bar plot. The y-axis represents the allocated GPU memory (MB) in log-scale.
  • Figure 5: Time complexity on Cora dataset. The x-axis represents average time (seconds) per training epoch, and the y-axis represents average convergence time (seconds) per task, both in log-scale. The color bar measures the average accuracy of each method.
  • ...and 2 more figures
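
The boxplot convention described in the caption of Figure 2 (hinges at the quartiles, whiskers limited to 1.5 times the inter-quartile range, remaining points drawn individually) can be sketched as follows. The helper name `boxplot_stats` and the sample accuracies are hypothetical and are not taken from the paper's results; quartiles here use NumPy's default linear interpolation, which may differ slightly from the plotting tool the authors used.

```python
import numpy as np

def boxplot_stats(values, whisker=1.5):
    """Compute hinge, whisker, and outlier values for one boxplot.

    Hypothetical helper illustrating the 1.5 * IQR whisker rule.
    """
    data = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = data[(data >= low_fence) & (data <= high_fence)]
    outliers = data[(data < low_fence) | (data > high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": data.mean(),
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": outliers,
    }

# Made-up accuracy values (%), for illustration only.
sample = [80.1, 81.0, 81.4, 81.9, 82.3, 82.6, 83.0, 90.5]
stats = boxplot_stats(sample)
```

For this made-up sample, the single value far above the rest falls outside the upper fence, so it is reported as an outlier and the upper whisker stops at the next-largest value.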

Theorems & Definitions (2)

  • Theorem 1
  • proof