SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning for Compression

Ho Fung Tsoi; Vladimir Loncar; Sridhara Dasu; Philip Harris

TL;DR

SymbolNet addresses the scalability gap in symbolic regression (SR) with a neural network approach built around adaptive dynamic pruning. It jointly optimizes regression accuracy and explicit sparsity across weights, inputs, and operators, converging to user-defined sparsity targets via self-adaptive regularization. The approach yields compact, interpretable symbolic expressions on high-dimensional data, demonstrated on LHC jet tagging ($n=16$ input features), MNIST ($n=784$), and SVHN ($n=3072$), and delivers favorable FPGA resource usage and latency compared to heavily compressed neural baselines. These results suggest a practical path to deploying SR on resource-constrained hardware, with potential extensions to non-differentiable operators and quantization-aware training for tighter hardware integration.

Abstract

Compact symbolic expressions have been shown to be more efficient than neural network models in terms of resource consumption and inference speed when implemented on custom hardware such as FPGAs, while maintaining comparable accuracy \cite{tsoi2023symbolic}. These capabilities are highly valuable in environments with stringent computational resource constraints, such as high-energy physics experiments at the CERN Large Hadron Collider. However, finding compact expressions for high-dimensional datasets remains challenging due to the inherent limitations of genetic programming, the search algorithm behind most symbolic regression methods. In contrast to genetic programming, the neural network approach to symbolic regression scales to high-dimensional inputs and leverages gradient methods for faster equation search. Common ways of constraining expression complexity often involve multistage pruning with fine-tuning, which can result in significant performance loss. In this work, we propose $\tt{SymbolNet}$, a neural network approach to symbolic regression specifically designed as a model compression technique, aimed at enabling low-latency inference for high-dimensional inputs on custom hardware such as FPGAs. This framework allows dynamic pruning of model weights, input features, and mathematical operators in a single training process, where both training loss and expression complexity are optimized simultaneously. We introduce a sparsity regularization term for each pruning type, which adaptively adjusts its own strength so that training converges to a target sparsity ratio. Unlike most existing symbolic regression methods, which struggle with datasets containing more than $\mathcal{O}(10)$ inputs, we demonstrate the effectiveness of our model on the LHC jet tagging task (16 inputs), MNIST (784 inputs), and SVHN (3072 inputs).

Paper Structure (22 sections, 4 equations, 9 figures, 4 tables)

Introduction
Related work
SymbolNet architecture
Neural symbolic regression
Dynamic pruning per network component type
Pruning of model weights
Pruning of input features
Pruning of mathematical operators
Self-adaptive regularization for sparsity
Training framework
Experimental setup
Expression complexity
Baseline for comparison
Datasets and experiments
LHC jet tagging
...and 7 more sections

Figures (9)

Figure 1: A symbolic layer composed of three linear transformation nodes $z$, activated by a unary operator $f$ and a binary operator $g$.
Figure 2: An example NN with four input features ($x$), two symbolic layers (large rectangles), and one output node ($y$). Each symbolic layer contains five linear transformations (empty circles), followed by three unary operations (small rectangles) and one binary operation (ovals). The solid lines represent nonzero model weights ($w$) for the linear transformation, while the dashed lines indicate activation by mathematical operations. The intermediate expression outputs are shown as green text. The final expression from this example model, after simplifying the constants from $w$ to $c$, is shown in blue text: $\bm{y=c_1\tanh(c_2x_2^2)+c_3x_2x_4\sin(c_4x_3)}$. This illustrates the basic architecture of $\tt{SymbolNet}$ before incorporating additional components for adaptive dynamic pruning.
Figure 3: Schematic sketch of the SR-dedicated dynamic pruning mechanism within the $\tt{SymbolNet}$ architecture: (a) model weights, (b) input features, (c) unary operators, and (d) binary operators. Solid arrows represent the forward pass, while dotted arrows represent the backward pass, linking the trainable parameters. The $\tt{SymbolNet}$ architecture is constructed by integrating these elements with the basic network architecture illustrated, for example, in Fig. 2.
Figure 4: The decay factor, $D(s;\alpha,d)$, is employed to reduce the rate of increase in high-threshold values as the sparsity ratio ($s$) approaches its target value ($\alpha$). High-threshold driving is paused when $s\geq\alpha$. The profiles of $D(0\leq s\leq 1)$ for a target sparsity ratio of $\alpha=0.8$ at three different decay rates ($d$) are shown.
Figure 5: Counting the complexity of an expression in its tree representation involves tree traversal. Using the example expression $\bm{y=c_1\tanh(c_2x_2^2)+c_3x_2x_4\sin(c_4x_3)}$ from Fig. 2, the sub-expressions at each step of the traversal ($k$) are listed. The expression complexity of this example is 17, assuming all mathematical operators, input features, and constants are equally weighted. Note that the number of possible traversal steps corresponds to the number of nodes in the tree.
...and 4 more figures
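
The node-counting idea in the Figure 5 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the nested-tuple tree, the names `EXPR`, `preorder`, and `complexity`, and the choice to treat `Add` and `Mul` as n-ary nodes (so that $c_3x_2x_4\sin(c_4x_3)$ is one product with four children) are all assumptions made for illustration. Under that assumed convention, with every operator, input feature, and constant weighted equally, the example expression has 17 nodes, matching the count stated in the caption.

```python
# Hypothetical tree encoding: (operator, child, child, ...); leaves are names or numbers.
# Assumption: Add and Mul are n-ary, and x2^2 is stored as Pow(x2, 2).
EXPR = (
    "Add",
    ("Mul", "c1", ("tanh", ("Mul", "c2", ("Pow", "x2", 2)))),
    ("Mul", "c3", "x2", "x4", ("sin", ("Mul", "c4", "x3"))),
)


def preorder(node):
    """Yield every sub-expression of the tree, parent before children."""
    yield node
    if isinstance(node, tuple):
        for child in node[1:]:  # node[0] is the operator label, not a child
            yield from preorder(child)


def complexity(node):
    """Count tree nodes, weighting operators, features, and constants equally."""
    return sum(1 for _ in preorder(node))


steps = list(preorder(EXPR))  # one sub-expression per traversal step k
print(complexity(EXPR))  # → 17
```

Each element of `steps` corresponds to one traversal step, so the number of steps equals the number of nodes. A binary-only encoding of the same expression would give a larger count, which is why the n-ary convention is flagged as an assumption here.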
