Exploring explicit coarse-grained structure in artificial neural networks

Xi-Ci Yang, Z. Y. Xie, Xiao-Tao Yang

TL;DR

This work tackles the interpretability of deep neural networks by introducing an explicit, renormalization-group–inspired hierarchical coarse-grained structure into both network design and data processing. It presents TaylorNet, a Taylor-series–based network that uses locally coarse-grained nonlinear operations to approximate mappings without nonlinear activations, and a multilevel data-distillation pipeline that yields progressively abstracted reference images. Empirical results show TaylorNet achieving near state-of-the-art performance on MNIST with substantially fewer parameters and competitive results on CIFAR-10, while the distillation framework enables classification by embedding similarity using distilled references. The work provides a principled link between RG methods and deep learning, offering interpretable, parameter-efficient architectures and a scalable data-distillation paradigm with potential extensions to multiple networks and scale-invariance analysis.

Abstract

We propose to explicitly employ a hierarchical coarse-grained structure in artificial neural networks in order to improve interpretability without degrading performance. The idea is applied in two settings. The first is a neural network called TaylorNet, which aims to approximate the general mapping from input data to output result directly in terms of a Taylor series, without resorting to any magic nonlinear activations. The second is a new setup for data distillation, which performs multi-level abstraction of the input dataset and generates new data that possess the relevant features of the original dataset and can be used as references for classification. In both cases, the coarse-grained structure plays an important role in simplifying the network and improving both interpretability and efficiency. The approach is validated on the MNIST and CIFAR-10 datasets. Further improvements and some related open questions are also discussed.
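To make the "Taylor series directly" idea concrete, here is a minimal sketch of a second-order truncated expansion over pixel values, i.e. a linear readout on the monomials 1, $x_i$ and $x_i x_j$. It is an illustration under stated assumptions, not the authors' implementation: the function names (`quadratic_features`, `taylor2_score`, `pixel_distance`), the choice to include each pair once with $i \le j$, and the Euclidean definition of the pixel distance $d_{ij}$ are all hypothetical choices made here.

```python
from itertools import combinations_with_replacement
from math import hypot


def quadratic_features(x):
    """Return [1, x_i ..., x_i * x_j for i <= j] for a flat list of pixel values."""
    feats = [1.0]                      # zeroth-order (constant) term
    feats.extend(x)                    # first-order terms x_i
    feats.extend(                      # second-order terms x_i * x_j, each pair once
        x[i] * x[j]
        for i, j in combinations_with_replacement(range(len(x)), 2)
    )
    return feats


def taylor2_score(weights, x):
    """Second-order truncated expansion: a weighted sum of the monomial features."""
    return sum(w * f for w, f in zip(weights, quadratic_features(x)))


def pixel_distance(i, j, width):
    """Euclidean distance between flat pixel indices i and j (assumed metric)."""
    row_i, col_i = divmod(i, width)
    row_j, col_j = divmod(j, width)
    return hypot(row_i - row_j, col_i - col_j)


# Number of terms for a 7x7 image: 1 constant + 49 linear + 49*50/2 quadratic.
n_features = len(quadratic_features([0.0] * (7 * 7)))
print(n_features)
```

Under these assumptions the number of quadratic terms grows as roughly half the square of the pixel count, which is why a direct expansion is only practical on small inputs and why a local, hierarchical grouping of the products is attractive.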

Paper Structure (9 sections, 7 equations, 13 figures)

This paper contains 9 sections, 7 equations, 13 figures.

Figures (13)

  • Figure 1: Hierarchical coarse-grained structure generated in the RG process for a matrix product operator with open boundary conditions. The yellow hollow circles connected by a dashed line denote the lattice sites where the operator is defined. As described in the main text, the local RG transformations are represented by the rank-3 tensors $U$, drawn as solid dots, and the degrees of freedom (DOFs) reside on the links between the dots and are denoted $\{\sigma\}$. For clarity, the various scales are distinguished by different colors.
  • Figure 2: Test accuracy of the experiment on direct Taylor expansion, as expressed in Eq. (\ref{eq:NaTay}). The data are obtained on the MNIST dataset with images resized to $7\times7$.
  • Figure 3: Distribution of the obtained weights corresponding to the quadratic terms $x_ix_j$, as a function of the distance $d_{ij}$ between the two pixels in images of the $7\times7$ MNIST dataset. Weights are sorted in ascending order and divided equally into five groups, referred to as level-1 (L1) to level-5 (L5), respectively. (a) Weight distribution at each distance $d_{ij}$. (b) Weight ratio distribution for each level as a function of $d_{ij}$. The details of the distribution can be found in Fig. \ref{fig:DistribDetail} in App. \ref{app:Weight}.
  • Figure 4: Illustration of the CG operation in TaylorNet. (a) Convolution operation. (b) CG operation. The product of $x_1$ and $x_{16}$ will emerge after the next CG operation. (c) Two successive CG operations on four variables divided into two local clusters. Very complicated terms in $x$ emerge in the expression of $z$, as shown in Fig. \ref{fig:W} in App. \ref{app:Yexp}.
  • Figure 5: Sketch of the TaylorNet used in the classification task on the MNIST dataset. Hereafter, the numbers in a box denote the representation form of the data, e.g., $28\times 28\times 64$ denotes 64 feature maps of size $28\times 28$, and the operations on the arrows correspond to different neural network layers, e.g., Conv($l_1$,$l_2$,$c$,$s_1$,$s_2$,$p$) means a convolutional layer with kernel size $l_1\times l_2$, $c$ channels, stride $s_1\times s_2$, and padding $p$ (default 0); the notation is similar for the CG operation and the dilated CG operation, as discussed in the main text and Fig. \ref{fig:DiCG}. Here, all four convolutional layers have structure Conv(3,3,64,1,1,1), both CG layers have structure CG(2,2,64,2,2), and diCG1 and diCG2 have structures CG(2,2,64,1,1,7) and CG(2,2,64,1,1,3), respectively. The action of a multi-channel convolution operation is illustrated in Fig. \ref{fig:ConvOP} in App. \ref{app:Conv}, and more details can be found in Ref. GF-Book2016.
  • ...and 8 more figures
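
The layer notation in the Figure 5 caption can be checked with a little shape arithmetic. The sketch below uses the standard output-size formula for a sliding window and assumes, as the caption suggests but does not spell out, that a CG window moves over the feature map the same way a convolution kernel does. The helper name `out_size` is hypothetical, and the dilated CG layers are left out because the meaning of their sixth argument is not stated in the caption.

```python
def out_size(n, kernel, stride, padding=0):
    """Spatial output size of a sliding window: floor((n + 2p - l) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# Conv(3,3,64,1,1,1): 3x3 kernel, stride 1, padding 1 keeps the size of the map.
after_conv = out_size(28, kernel=3, stride=1, padding=1)

# CG(2,2,64,2,2): 2x2 window, stride 2, no padding (assumed) halves each side.
after_cg1 = out_size(after_conv, kernel=2, stride=2)
after_cg2 = out_size(after_cg1, kernel=2, stride=2)

print(after_conv, after_cg1, after_cg2)
```

Under these assumptions the convolutions preserve the $28\times 28$ map quoted in the caption, and each CG layer halves the side length, so the two CG layers together reduce it to $7\times 7$ regardless of where the size-preserving convolutions sit between them.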