Symbolic identification of tensor equations in multidimensional physical fields
Tianyi Chen, Hao Yang, Wenjun Ma, Jun Zhang
TL;DR
The paper addresses the discovery of tensor equations from data by introducing SITE, a tensor symbolic regression framework whose host–plasmid encoding is inspired by M-GEP. SITE enforces dimensional homogeneity and uses tensor linear regression to fit numerical coefficients, so that both the structure and the coefficients of a tensor relation are identified efficiently. Validation on Maxwell and Reynolds-stress benchmarks recovers the target equations from synthetic data under noise and limited samples, and application to DSMC-derived molecular data identifies constitutive relations across different flow regimes. The results point to SITE as an interpretable route to data-driven constitutive modeling in fluids, electromagnetism, and other high-dimensional physical systems.
Abstract
Recently, data-driven methods have shown great promise for discovering governing equations from simulation or experimental data. However, most existing approaches are limited to scalar equations, and few can identify tensor relationships. In this work, we propose a general data-driven framework for identifying tensor equations, referred to as Symbolic Identification of Tensor Equations (SITE). The core idea of SITE, representing tensor equations using a host-plasmid structure, is inspired by the multidimensional gene expression programming (M-GEP) approach. To improve the robustness of the evolutionary process, SITE adopts a genetic information retention strategy. Moreover, SITE introduces two key innovations beyond conventional evolutionary algorithms. First, it incorporates a dimensional homogeneity check to restrict the search space and eliminate physically invalid expressions. Second, it replaces traditional linear scaling with a tensor linear regression technique, greatly enhancing the efficiency of numerical coefficient optimization. We validate SITE on two benchmark scenarios, where it accurately recovers the target equations from synthetic data and remains robust to noise and small sample sizes. Furthermore, SITE is applied to identify constitutive relations directly from molecular simulation data, which are generated without reliance on macroscopic constitutive models. It adapts to both compressible and incompressible flow conditions and successfully identifies the corresponding macroscopic forms, highlighting its potential for data-driven discovery of tensor equations.
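The two innovations named in the abstract can be illustrated with a minimal sketch. This is not the SITE implementation: the dimension encoding (exponents of mass, length, time), the function names `dims_multiply`, `is_homogeneous`, and `tensor_linear_regression`, and the synthetic target relation are all assumptions made for illustration, and ordinary least squares over flattened tensor components stands in for the paper's tensor linear regression.

```python
import numpy as np

# Hypothetical encoding: a physical dimension is a tuple of exponents
# (mass, length, time). Multiplying quantities adds the exponents.
STRESS = (1, -1, -2)       # M L^-1 T^-2
VISCOSITY = (1, -1, -1)    # M L^-1 T^-1
STRAIN_RATE = (0, 0, -1)   # T^-1

def dims_multiply(a, b):
    """Dimension of a product of two quantities."""
    return tuple(x + y for x, y in zip(a, b))

def is_homogeneous(term_dims, target_dim):
    """True if every candidate term has the dimension of the target."""
    return all(d == target_dim for d in term_dims)

def tensor_linear_regression(candidates, target):
    """Fit target ~ sum_k c_k * candidates[k] by least squares.

    Each candidate and the target have shape (n_samples, 3, 3). One scalar
    coefficient per candidate is shared by all components and samples.
    """
    A = np.stack([c.reshape(-1) for c in candidates], axis=1)
    b = target.reshape(-1)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

# Dimensional check: viscosity * strain rate is a valid stress term,
# while stress * strain rate is not.
mu_S_dim = dims_multiply(VISCOSITY, STRAIN_RATE)
p_S_dim = dims_multiply(STRESS, STRAIN_RATE)
print(is_homogeneous([mu_S_dim], STRESS))
print(is_homogeneous([p_S_dim], STRESS))

# Synthetic data (assumed for illustration): random velocity gradients G,
# symmetric part S, and the isotropic tensor tr(S) I.
rng = np.random.default_rng(0)
G = rng.normal(size=(200, 3, 3))
S = 0.5 * (G + G.transpose(0, 2, 1))
trS_I = np.trace(S, axis1=1, axis2=2)[:, None, None] * np.eye(3)

# Target built from known coefficients, then recovered by regression.
target = 2.0 * S - 0.5 * trS_I
coeffs = tensor_linear_regression([S, trS_I], target)
print(coeffs)
```

In this sketch the homogeneity check discards a candidate before any fitting takes place, and the regression solves for all coefficients of the surviving candidate terms in a single linear solve rather than tuning them through evolution.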
