A Novel Convolutional Neural Network Architecture with a Continuous Symmetry

Yao Liu; Hang Shao; Bing Bai

TL;DR

This work introduces a Convolutional Neural Network architecture inspired by quasi-linear hyperbolic PDEs, enabling a continuous symmetry in the weight space via transformations from a Lie group such as $GL(n,\mathbb{R})$. By replacing standard per-activation nonlinearities with a nonlinear coupling across branches and employing variable-coefficient convolutions, the model can mix channels and, in some configurations, remove most activations without sacrificing performance. Experimental results on a 100-class ImageNet subset show competitive accuracy (up to $84.96\%$ top-1) with modest parameter counts, using a ResNet50 backbone and activation-placing strategies that mitigate training instabilities. The paper argues that incorporating PDE perspectives can yield novel architectural designs and deeper interpretations of ConvNets, with potential extensions to other architectures like Transformers.

Abstract

This paper introduces a new Convolutional Neural Network (ConvNet) architecture inspired by a class of partial differential equations (PDEs) called quasi-linear hyperbolic systems. While achieving comparable performance on the image classification task, it allows the weights to be modified via a continuous symmetry group. This is a significant shift from traditional models, where the architecture and weights are essentially fixed. We wish to promote (internal) symmetry as a new desirable property for a neural network, and to draw attention in the broader Deep Learning community to the PDE perspective in analyzing and interpreting ConvNets.
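To make the notion of a continuous symmetry in weight space concrete, here is a minimal toy sketch. It is not the block proposed in the paper; it only assumes a two-layer map on 2-vectors with hypothetical weights `W1`, `W2` and a hypothetical invertible matrix `G` from $GL(2,\mathbb{R})$. Without an elementwise activation between the layers, replacing $(W_1, W_2)$ by $(G W_1, W_2 G^{-1})$ leaves the output unchanged; with a per-activation ReLU in between, the same transformation generally changes the output, which is one way to see why per-activation nonlinearities obstruct such a symmetry.

```python
# Toy illustration only: y = W2 * act(W1 * x) on 2-vectors.
# This is NOT the paper's block; all names and values are hypothetical.

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(A, x):
    """Product of a 2x2 matrix with a 2-vector."""
    return [sum(A[i][k] * x[k] for k in range(2)) for i in range(2)]

def inverse_2x2(G):
    """Inverse of a 2x2 matrix; fails if G is not in GL(2, R)."""
    (a, b), (c, d) = G
    det = a * d - b * c
    if det == 0:
        raise ValueError("G must be invertible, i.e. an element of GL(2, R)")
    return [[d / det, -b / det], [-c / det, a / det]]

def relu(v):
    """Elementwise (per-activation) ReLU."""
    return [max(0.0, t) for t in v]

def identity(v):
    """No activation between the two layers."""
    return list(v)

def two_layer(W1, W2, x, act):
    """Apply W1, then the activation, then W2."""
    return matvec(W2, act(matvec(W1, x)))

def transform(W1, W2, G):
    """Act on the weights by G: (W1, W2) -> (G W1, W2 G^{-1})."""
    return matmul(G, W1), matmul(W2, inverse_2x2(G))

# Hypothetical weights, input, and group element (a shear with det 1).
W1 = [[1, -2], [3, 1]]
W2 = [[2, 1], [-1, 1]]
G = [[1, 1], [0, 1]]
x = [1, 1]

W1_t, W2_t = transform(W1, W2, G)

linear_before = two_layer(W1, W2, x, identity)
linear_after = two_layer(W1_t, W2_t, x, identity)
relu_before = two_layer(W1, W2, x, relu)
relu_after = two_layer(W1_t, W2_t, x, relu)

print("linear:", linear_before, linear_after)
print("relu:  ", relu_before, relu_after)
```

For these particular values the two linear outputs agree (both equal $(2, 5)$ up to floating point), while the ReLU outputs differ ($(4, 4)$ before the transformation and $(2, 5)$ after it). The paper's model keeps a nonlinearity, in the form of a coupling across branches, so this sketch should be read only as motivation for why the standard per-activation placement is the obstacle.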

Paper Structure (14 sections, 33 equations, 2 figures, 3 tables)

Introduction
Related Work
Designing ConvNets from the PDE perspective
Symmetry of the model
Experimental Results
Details of the Architecture
Experiments
Conclusion
Appendix: A Crash Course on PDE
First example: the heat equation
Optimal control and the calculus of variations
The wave equation and hyperbolic systems
General theory of linear PDEs
Nonlinear PDEs

Figures (2)

Figure 1: Schematic of a single block of our ConvNet architecture based on Eq. (3), to replace the bottleneck block of ResNet50. The trapezoidal shapes represent the increase/decrease in the number of channels. The corresponding components of the equation are color-coded.
Figure 2: The time-lapse of a solution to the heat equation, over a square domain. The temperature is represented in the vertical direction and by the coloring. Credits: Oleg Alexandrov via Wikimedia Commons.
