Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning

Chenglu Sun; Shuo Shen; Wenzhi Tao; Deyi Xue; Zixia Zhou

TL;DR

Symbolic regression often fails to recover accurate expressions when data are contaminated with noise. The paper proposes Noise-Resilient Symbolic Regression (NRSR), which combines a Noise-Resilient Gating Module (NGM) for input filtering with a reinforcement-learning-based expression generator and a Mixed Path Entropy (MPE) bonus to promote diverse exploration. The approach uses a PPO-based policy with a reward $R(\tau)=1/(1+NRMSE)$ and a joint entropy objective $L(\theta)=L_p(\theta)+\alpha H_\tau(\pi_\theta)+\beta H(\pi_\theta)$ to drive robust symbol selection. Empirical results on Nguyen benchmarks show that NRSR achieves state-of-the-art performance on high-noise data and strong results on clean data, outperforming multiple baselines in recovery rate $RR$, explored-expression number $EEN$, and NMSE. The work demonstrates the effectiveness and modularity of NGM and MPE for robust SR and suggests future directions like distributed RL to further improve exploration in large search spaces.
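The reward $R(\tau)=1/(1+NRMSE)$ can be illustrated with a short sketch. The summary does not state how the NRMSE is normalized, so the code below assumes the RMSE is divided by the standard deviation of the targets; the function names `nrmse` and `reward` are illustrative and are not taken from the paper.

```python
import numpy as np


def nrmse(y_true, y_pred):
    """RMSE divided by the standard deviation of the targets.

    Assumption: the normalization constant is std(y_true); the paper
    may use a different one.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.std(y_true)


def reward(y_true, y_pred):
    """Fitness reward R = 1 / (1 + NRMSE), which lies in (0, 1]."""
    return 1.0 / (1.0 + nrmse(y_true, y_pred))


# A candidate expression that fits the data exactly receives reward 1;
# a constant prediction equal to the target mean receives reward 0.5
# under the normalization assumed above.
x = np.linspace(-1.0, 1.0, 20)
y = x**3 + x**2 + x
r_exact = reward(y, x**3 + x**2 + x)
r_mean = reward(y, np.full_like(y, y.mean()))
```

Because the reward is bounded and decreases monotonically as the error grows, poorly fitting expressions contribute small but finite rewards to the policy update.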

Abstract

Symbolic regression (SR) has emerged as a pivotal technique for uncovering the intrinsic information within data and enhancing the interpretability of AI models. However, current state-of-the-art (SOTA) SR methods struggle to correctly recover symbolic expressions from high-noise data. To address this issue, we introduce a novel noise-resilient SR (NRSR) method capable of recovering expressions from high-noise data. Our method combines a novel reinforcement learning (RL) approach with a purpose-built noise-resilient gating module (NGM) to learn symbolic selection policies. The gating module dynamically filters out meaningless information from high-noise data, giving the SR process strong noise resilience. We also design a mixed path entropy (MPE) bonus term in the RL process to increase the exploration capability of the policy. Experimental results demonstrate that our method significantly outperforms several popular baselines on benchmarks with high-noise data. Furthermore, our method also achieves SOTA performance on benchmarks with clean data, showcasing its robustness and efficacy in SR tasks.

Paper Structure (35 sections, 22 equations, 2 figures, 10 tables, 1 algorithm)

Introduction
Related Works
Reinforcement learning for symbolic regression
L0 Regularization
Entropy regularization in reinforcement learning
Method
Noise-Resilient Gating Module
Integration of Gating Layer with Action Mask
Generating Expressions as Training Samples
Reinforcement Learning with Mixed Path Entropy (MPE) Regularization
Experiments
Experimental Configurations
Benchmark
Baselines
Training Process
...and 20 more sections

Figures (2)

Figure 1: Overview of the NRSR training process. (a) Prior to the SR and RL training process, the NGM is trained with a sample network structure. (b) The obtained L0 gates are then combined with the original action mask to select the input variables. (c) During the SR process, the RNN model, serving as the policy, generates output logits which are processed with the new action mask. These processed logits are subsequently converted into action probabilities, which are used to sample symbolic tokens. The policy generates actions in a step-by-step manner (from $t_0$ to $t_n$) following a time sequence structure, thereby forming a trajectory. (d) Each trajectory, consisting of sequential tokens, represents a traversal that can form an expression using the expression tree approach. The resulting expressions are used to calculate the fitness reward. The training process concludes once the optimal expression is found. (e) The samples, comprising states, actions, and rewards, are used to train a new RL policy. After each training iteration, the updated model is used to generate new expressions. The procedures in (c), (d), and (e) constitute an iterative process in NRSR training. This iteration continues until the optimal expression is found or the limit of consumed expressions is reached, thereby ensuring a comprehensive exploration of the solution space.
Figure 2: (a) Variation of RR and EEN metrics performance with respect to $\beta$ in MPE. (b) Dynamics of total entropy during the training phase.
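
Steps (b) and (c) of Figure 1 can be sketched as a masked softmax over the policy logits. This is a minimal illustration, not the authors' implementation: it assumes the L0 gates have already been thresholded to a boolean vector over the token library (always true for operator tokens) and that they are combined with the original action mask by an elementwise AND. The token library, logits, and the helper name `masked_action_probs` are hypothetical.

```python
import numpy as np


def masked_action_probs(logits, action_mask, gate_mask):
    """Softmax over logits, restricted to tokens allowed by both masks."""
    logits = np.asarray(logits, dtype=float)
    # Assumption: the new action mask is the elementwise AND of the
    # original action mask and the gate-derived mask.
    mask = np.asarray(action_mask, dtype=bool) & np.asarray(gate_mask, dtype=bool)
    if not mask.any():
        raise ValueError("no token is allowed by the combined mask")
    masked = np.where(mask, logits, -np.inf)
    shifted = masked - masked[mask].max()  # subtract the max for stability
    weights = np.exp(shifted)              # exp(-inf) evaluates to 0
    return weights / weights.sum()


# Hypothetical token library: three operators and three input variables.
tokens = ["add", "mul", "sin", "x1", "x2", "x3"]
logits = np.array([0.2, 0.1, -0.3, 0.5, 0.4, 0.0])
action_mask = np.ones(len(tokens), dtype=bool)       # all tokens syntactically valid
gate_mask = np.array([1, 1, 1, 1, 0, 1], dtype=bool)  # gate closed for x2

probs = masked_action_probs(logits, action_mask, gate_mask)
rng = np.random.default_rng(0)
token = rng.choice(tokens, p=probs)  # x2 can never be sampled
```

Setting masked logits to negative infinity before the softmax gives a gated variable exactly zero probability, so the policy's probability mass is renormalized over the remaining tokens at every step of the trajectory.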
