Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation

Tianyi Chen, Zhi-Qin John Xu

TL;DR

The paper addresses the challenge of deploying neural networks in scientific applications, where network size must stay moderate and inference must be fast. It introduces condensation reduction, a structured method that merges similarly oriented neurons, identified by cosine similarity, into smaller subnetworks that preserve the original network's functionality. The method applies to both fully connected and convolutional architectures, supports manual and automatic reduction strategies, and is justified by the embedding principle. It reduces an FC network for combustion simulation from 6.80M to 2.85M parameters (≈41.9% of the original) and a MobileNetV2-based CNN for CIFAR-10 from 2.24M to 0.26M parameters (≈11.5%), while maintaining comparable accuracy after short retraining. The resulting compact subnetworks enable faster inference in resource-constrained environments and in scientific computations.

Abstract

Neural networks have been extensively applied to a variety of tasks, achieving astounding results. Applying neural networks in the scientific field is an important research direction that is gaining increasing attention. In scientific applications, the scale of neural networks is generally moderate-size, mainly to ensure the speed of inference during application. Additionally, comparing neural networks to traditional algorithms in scientific applications is inevitable. These applications often require rapid computations, making the reduction of neural network sizes increasingly important. Existing work has found that the powerful capabilities of neural networks are primarily due to their non-linearity. Theoretical work has discovered that under strong non-linearity, neurons in the same layer tend to behave similarly, a phenomenon known as condensation. Condensation offers an opportunity to reduce the scale of neural networks to a smaller subnetwork with similar performance. In this article, we propose a condensation reduction algorithm to verify the feasibility of this idea in practical problems. Our reduction method can currently be applied to both fully connected networks and convolutional networks, achieving positive results. In complex combustion acceleration tasks, we reduced the size of the neural network to 41.7% of its original scale while maintaining prediction accuracy. In the CIFAR10 image classification task, we reduced the network size to 11.5% of the original scale, still maintaining a satisfactory validation accuracy. Our method can be applied to most trained neural networks, reducing computational pressure and improving inference speed.
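
The merging idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single hidden layer with ReLU activation, represents each neuron by its input-weight vector (a bias could be appended as an extra column), groups neurons greedily using a hypothetical cosine threshold of 0.95, and relies on the positive homogeneity of ReLU, $a\,\mathrm{ReLU}(c\,u\cdot x) = (ac)\,\mathrm{ReLU}(u\cdot x)$ for $c>0$, to fold each merged neuron's norm into the output weights. All function names are illustrative.

```python
import numpy as np


def cosine_similarity_matrix(W):
    """Pairwise cosine similarity between the rows of W (one row per neuron)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    U = W / np.clip(norms, 1e-12, None)  # guard against zero-norm rows
    return U @ U.T


def group_neurons(W, threshold=0.95):
    """Greedy grouping (an assumption of this sketch): the first ungrouped
    neuron seeds a group and absorbs every remaining neuron whose cosine
    similarity to the seed is at least `threshold`."""
    S = cosine_similarity_matrix(W)
    remaining = list(range(W.shape[0]))
    groups = []
    while remaining:
        seed, rest = remaining[0], remaining[1:]
        members = [seed] + [j for j in rest if S[seed, j] >= threshold]
        groups.append(members)
        remaining = [j for j in rest if j not in members]
    return groups


def merge_hidden_layer(W_in, W_out, threshold=0.95):
    """Merge similarly oriented hidden neurons.

    W_in:  (m, d) input weights of the hidden layer.
    W_out: (k, m) weights of the following layer.
    Returns reduced weights of shapes (g, d) and (k, g), with g <= m.
    """
    groups = group_neurons(W_in, threshold)
    norms = np.linalg.norm(W_in, axis=1)
    W_in_r = np.zeros((len(groups), W_in.shape[1]))
    W_out_r = np.zeros((W_out.shape[0], len(groups)))
    for g, members in enumerate(groups):
        # The merged neuron takes the unit direction of the summed weights.
        direction = W_in[members].sum(axis=0)
        W_in_r[g] = direction / max(np.linalg.norm(direction), 1e-12)
        # Each member's norm is folded into its output weight (ReLU homogeneity).
        W_out_r[:, g] = (W_out[:, members] * norms[members]).sum(axis=1)
    return W_in_r, W_out_r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(3, 4))                     # three distinct directions
    W_in = np.repeat(base, 4, axis=0) * rng.uniform(0.5, 2.0, size=(12, 1))
    W_out = rng.normal(size=(2, 12))
    W_in_r, W_out_r = merge_hidden_layer(W_in, W_out)
    print(W_in.shape, "->", W_in_r.shape)
```

When the grouped neurons are exactly parallel, the reduced layer computes the same function as the original. When they are only approximately parallel, the merge is approximate, which is consistent with the short retraining after reduction mentioned in the TL;DR.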

Paper Structure (28 sections, 64 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Flow chart of automatic condensation reduction. The left side shows the main reduction process of automatic condensation reduction, while the right side details the layer-by-layer reduction process carried out in each main reduction.
  • Figure 2: Training loss and validation loss of the FNN models. Subfigures (a) to (d) show the training and validation losses from the original fully connected neural network model through to its third reduction.
  • Figure 3: Cosine similarity matrices of the 2nd layer at different stages. In the matrix heatmap, the element in the $i$-th row and $j$-th column represents the cosine similarity between the $i$-th neuron and the $j$-th neuron. The more distinct the blocks in the cosine matrix, the stronger the condensation of the neural network. From left to right, the images correspond to the original fully connected neural network model through to the model after the third reduction. "Epoch 0" indicates that the model has just been reduced and has not yet been trained. "Epoch 5000" represents the fully connected neural network model upon completion of training.
  • Figure 4: Turbulent ignition. The image displays the temperature distributions obtained in combination with EBI for drm19, showing results from left to right for CVODE, the original fully connected neural network, and the neural network after the third reduction.
  • Figure 5: Convolutional block of MobileNetV2. In the figure, 'Conv $1\times1$' refers to layers where the convolutional kernels are $1\times1$, with the number of channels and the number of kernels varying in each layer. 'Conv $3\times3$' indicates layers where the convolutional kernels are $3\times3$, with one channel, and the number of kernels may vary per layer. 'BN' stands for batch normalization, while 'ReLU6' and 'Linear' refer to activation functions.
  • ...and 2 more figures