ViSymRe: Vision Multimodal Symbolic Regression
Da Li, Junping Yin, Jin Xu, Xinxin Li, Juan Zhang
TL;DR
ViSymRe is a Transformer-based Vision Symbolic Regression framework that uses Multi-View Random Slicing to visualize high-dimensional equations in 2-D and fuses these images with dataset information through a dual-vision pipeline. A Visual Decoder enables dataset-only inference by predicting discrete visual features via a codebook, while a Biased Cross-Attention module suppresses noise from the virtual (dataset-reconstructed) vision during fusion. The model is trained with a multi-objective loss and generates equations with a syntax-constrained autoregressive decoder to improve interpretability and robustness, achieving strong performance on low-dimensional and complex SR benchmarks with efficient inference. AMS and robustness ablations suggest good generalization across data scales and noise levels, pointing to ViSymRe's potential for rapid scientific discovery. Overall, ViSymRe delivers competitive SR accuracy and considerably lower complexity, especially in practical low-complexity scenarios, while providing a scalable path toward dataset-only deployment in multimodal SR settings.
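The summary does not say how the codebook prediction or the attention bias is implemented. The NumPy sketch below shows one plausible reading of the dataset-only path, under stated assumptions. It assumes the Visual Decoder emits logits over a codebook of K entries for each visual token, with the argmax selecting the discrete feature. It also assumes the "bias" is a single non-positive scalar added to the attention logits of the reconstructed visual tokens. The names `codebook_lookup`, `biased_cross_attention`, and `visual_bias` are hypothetical, as is the shared key/value matrix. None of this is the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def codebook_lookup(code_logits, codebook):
    """Map per-token logits (T, K) over K codebook entries to features (T, D).

    Assumption: the Visual Decoder predicts one discrete code per visual token;
    the argmax index selects a row of the (K, D) codebook.
    """
    idx = code_logits.argmax(axis=-1)
    return codebook[idx], idx


def biased_cross_attention(queries, dataset_tokens, visual_tokens, visual_bias):
    """Single-head cross-attention over [dataset tokens; visual tokens].

    Assumption: a scalar visual_bias (<= 0) is added to the attention logits
    of the reconstructed visual tokens only. Keys and values share the same
    matrix here for brevity.
    """
    kv = np.concatenate([dataset_tokens, visual_tokens], axis=0)   # (Nd+Nv, D)
    scores = queries @ kv.T / np.sqrt(queries.shape[-1])           # (Q, Nd+Nv)
    bias = np.concatenate([np.zeros(len(dataset_tokens)),
                           np.full(len(visual_tokens), visual_bias)])
    weights = softmax(scores + bias, axis=-1)                       # rows sum to 1
    return weights @ kv, weights


rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 8))                  # K=16 codes, D=8
visual, idx = codebook_lookup(rng.normal(size=(4, 16)), codebook)
queries = rng.normal(size=(2, 8))
dataset = rng.normal(size=(5, 8))
_, w_plain = biased_cross_attention(queries, dataset, visual, 0.0)
_, w_biased = biased_cross_attention(queries, dataset, visual, -5.0)
# Total attention mass on the 4 visual tokens, per query: smaller when biased.
print(w_plain[:, 5:].sum(axis=1), w_biased[:, 5:].sum(axis=1))
```

Adding a negative constant to only the visual subset of logits strictly lowers the softmax mass on that subset, which is the suppression effect described. In a trained model such a bias would presumably be learned (or input-dependent) rather than fixed, so treat the constant here as illustrative.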
Abstract
Extracting interpretable equations from observational datasets to describe complex natural phenomena is one of the core goals of artificial intelligence; this field is known as symbolic regression (SR). In recent years, Transformer-based paradigms have become a new trend in SR, addressing its well-known problem of inefficient search. However, the modal heterogeneity between datasets and equations often hinders the convergence and generalization of these models. In this paper, we propose ViSymRe, a Vision Symbolic Regression framework, to explore how the visual modality can enhance the performance of Transformer-based SR paradigms. To make the visual SR model trainable in high-dimensional scenarios, we present Multi-View Random Slicing (MVRS). By projecting multivariate equations into 2-D space using random affine transformations, MVRS avoids common defects of high-dimensional visualization, such as variable degradation, missing non-linear interactions, and sampling complexity that grows exponentially with the number of variables, enabling ViSymRe to be trained at low computational cost. To support dataset-only deployment of ViSymRe, we design a dual-vision pipeline architecture based on generative techniques: an auxiliary Visual Decoder reconstructs visual features directly from the datasets, and a proposed Biased Cross-Attention feature fusion module automatically suppresses the attention weights of the reconstruction noise, ensuring that subsequent processes are not affected by the noisy modality. Ablation studies demonstrate that the visual modality improves model convergence and various SR metrics. Furthermore, evaluation results on mainstream benchmarks indicate that ViSymRe achieves competitive performance compared to baselines, particularly in low-complexity and rapid-inference scenarios.
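The abstract describes MVRS only as projecting multivariate equations into 2-D space with random affine transformations. The NumPy sketch below illustrates one plausible reading. Each view samples a random 2-D affine plane x = c + A[u, v]^T through the n-dimensional input space, then evaluates the equation on a grid over that plane. Under this reading, every variable varies within each image (none is frozen), and interactions between variables are preserved along the plane. Each view also costs grid^2 evaluations instead of grid^n. The offset range, the unit-norm columns of A, the function names, and the example equation are all assumptions. Converting the values into pixel images is omitted.

```python
import numpy as np


def random_affine_slice(f, n_vars, grid=32, extent=1.0, rng=None):
    """Evaluate f: R^n -> R on one random 2-D affine plane (one 'view').

    Assumption: the plane is x = c + A @ [u, v], with a random offset c and
    a random n x 2 matrix A, so every input variable varies across the view.
    """
    rng = np.random.default_rng() if rng is None else rng
    c = rng.uniform(-extent, extent, size=n_vars)      # random offset
    A = rng.normal(size=(n_vars, 2))                    # random linear part
    A /= np.linalg.norm(A, axis=0, keepdims=True)       # unit-norm directions
    u = np.linspace(-extent, extent, grid)
    U, V = np.meshgrid(u, u, indexing="ij")             # (grid, grid) plane coords
    uv = np.stack([U.ravel(), V.ravel()])               # (2, grid*grid)
    X = c[:, None] + A @ uv                             # (n_vars, grid*grid) inputs
    Z = f(*X)                                           # one input array per variable
    return Z.reshape(grid, grid)


def multi_view_random_slicing(f, n_vars, n_views=4, grid=32, seed=0):
    """Stack independent random views into an array (n_views, grid, grid)."""
    rng = np.random.default_rng(seed)
    return np.stack([random_affine_slice(f, n_vars, grid, rng=rng)
                     for _ in range(n_views)])


# Hypothetical five-variable target equation.
def target(x1, x2, x3, x4, x5):
    return np.sin(x1 * x2) + x3 ** 2 - x4 * x5


views = multi_view_random_slicing(target, n_vars=5, n_views=4, grid=32)
print(views.shape)  # (4, 32, 32)
```

For this five-variable example, four 32 × 32 views require 4,096 equation evaluations. A full 32-point grid over all five variables would require 32^5 = 33,554,432, which is the exponential growth that slicing sidesteps.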
