Symbolic identification of tensor equations in multidimensional physical fields

Tianyi Chen, Hao Yang, Wenjun Ma, Jun Zhang

TL;DR

The paper addresses tensor equation discovery from data by introducing SITE, a tensor-symbolic regression framework with a host–plasmid encoding inspired by M-GEP. It enforces dimensional homogeneity and employs tensor linear regression to efficiently identify both structure and coefficients, enabling robust recovery of tensor relations from synthetic and molecular data. Validation on Maxwell and Reynolds-stress benchmarks plus DSMC-derived constitutive data demonstrates accurate equation discovery under noise, limited data, and different flow regimes, with clear potential for data-driven constitutive modeling in high-dimensional physical systems. SITE thus offers a scalable, interpretable pathway for tensor equation discovery across fluids, electromagnetism, and beyond.
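The dimensional homogeneity check can be illustrated with a small sketch. Each terminal gets a dimension vector of exponents over (M, L, T), and the checker walks a candidate expression tree. Multiplication and division add and subtract exponents. Addition and subtraction are only allowed between operands of identical dimension. Several details are hypothetical choices for this example, not SITE's actual encoding: the terminal names, the (M, L, T) basis, and the tuple-based tree format. Numerical coefficients are assumed dimensionless.

```python
import numpy as np

# Hypothetical terminal library: dimension vectors are exponents of (M, L, T).
DIMS = {
    "rho": np.array([1, -3, 0]),   # density, kg m^-3
    "p": np.array([1, -1, -2]),    # pressure, kg m^-1 s^-2
    "mu": np.array([1, -1, -1]),   # dynamic viscosity, kg m^-1 s^-1
    "S": np.array([0, 0, -1]),     # strain-rate tensor, s^-1
}


def dim_of(expr):
    """Dimension vector of an expression tree, or None if it is inhomogeneous.

    A tree is either a terminal name or a tuple (op, left, right).
    """
    if isinstance(expr, str):
        return DIMS[expr]
    op, left, right = expr
    a, b = dim_of(left), dim_of(right)
    if a is None or b is None:
        return None  # an invalid subtree invalidates the whole expression
    if op == "mul":
        return a + b  # exponents add under multiplication
    if op == "div":
        return a - b  # and subtract under division
    if op in ("add", "sub"):
        # only quantities with identical dimensions may be summed
        return a if np.array_equal(a, b) else None
    raise ValueError(f"unknown operator: {op!r}")


def is_homogeneous(expr, target_dim):
    """True if expr is dimensionally consistent and matches the target dimension."""
    d = dim_of(expr)
    return d is not None and np.array_equal(d, target_dim)


print(is_homogeneous(("mul", "mu", "S"), DIMS["p"]))   # True: mu*S has units of stress
print(is_homogeneous(("add", "rho", "S"), DIMS["p"]))  # False: rho + S is rejected
```

In this sketch a stress term such as 2μS passes, because μ·S has the dimension of pressure. An expression like ρ + S is discarded before any coefficient fitting is attempted.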

Abstract

Recently, data-driven methods have shown great promise for discovering governing equations from simulation or experimental data. However, most existing approaches are limited to scalar equations, and few can identify tensor relationships. In this work, we propose a general data-driven framework for identifying tensor equations, referred to as Symbolic Identification of Tensor Equations (SITE). The core idea of SITE, representing tensor equations with a host-plasmid structure, is inspired by the multidimensional gene expression programming (M-GEP) approach. To improve the robustness of the evolutionary process, SITE adopts a genetic information retention strategy. Moreover, SITE introduces two key innovations beyond conventional evolutionary algorithms. First, it incorporates a dimensional homogeneity check that restricts the search space and eliminates physically invalid expressions. Second, it replaces traditional linear scaling with a tensor linear regression technique, greatly improving the efficiency of numerical coefficient optimization. We validate SITE on two benchmark scenarios, in which it accurately recovers the target equations from synthetic data and proves robust to noise and small sample sizes. Furthermore, SITE is applied to identify constitutive relations directly from molecular simulation data, which are generated without reliance on macroscopic constitutive models. It adapts to both compressible and incompressible flow conditions and successfully identifies the corresponding macroscopic forms, highlighting its potential for data-driven discovery of tensor equations.
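Tensor linear regression can be pictured as an ordinary least-squares problem. Every component of every sampled tensor contributes one equation. The sketch below assumes the evolved expression has already been split into K candidate tensor terms whose coefficients enter linearly. The helper name and this decomposition are assumptions for illustration, not the authors' implementation. The toy check uses a Newtonian stress, tau = -pI + 2μS.

```python
import numpy as np


def tensor_linear_regression(target, terms):
    """Least-squares fit of target ~ sum_k c[k] * terms[k] (hypothetical helper).

    target: array of shape (N, d, d), one tensor per sampled point.
    terms:  list of K arrays of the same shape (candidate tensor terms).
    Returns the coefficients c (length K) and the relative residual.
    """
    # Each column holds one candidate term, flattened over samples and components.
    A = np.stack([np.asarray(t).reshape(-1) for t in terms], axis=1)  # (N*d*d, K)
    y = np.asarray(target).reshape(-1)                                # (N*d*d,)
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    rel_res = np.linalg.norm(A @ c - y) / np.linalg.norm(y)
    return c, rel_res


# Synthetic check on tau = -p I + 2 mu S (toy data, not from the paper).
rng = np.random.default_rng(0)
N, mu = 200, 0.7
G = rng.normal(size=(N, 3, 3))          # random velocity gradients
S = 0.5 * (G + G.transpose(0, 2, 1))    # strain-rate tensor (symmetric part)
p = rng.uniform(1.0, 2.0, size=N)       # pressure samples
pI = p[:, None, None] * np.eye(3)       # isotropic pressure term, shape (N, 3, 3)
tau = -pI + 2.0 * mu * S

c, r = tensor_linear_regression(tau, [pI, S])
print(c, r)  # coefficients close to [-1.0, 1.4], residual near zero
```

Here all K coefficients come from a single linear solve. Classical linear scaling, by contrast, fits only one slope and one intercept for a whole expression.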

Paper Structure

This paper contains 18 sections, 22 equations, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: Overview of the SITE framework. (a) Data preprocessing, including the calculation of the gradients of macroscopic quantities and construction of the terminal library. (b) Schematic diagram of the tensor (host individual) and scalar (plasmid individual) representations. (c) Flowchart of the evolutionary workflow.
  • Figure 2: Schematic diagram of generating plasmid populations based on host population.
  • Figure 3: Schematic diagram of dimension vectors and dimensional homogeneity check.
  • Figure 4: (a) Schematic diagram of the electric field, the magnetic field and the sampled points. For clarity, only the vectors along the guiding dashed line are shown. (b) Magnitude of the electromagnetic field at points along the guiding line in (a), before and after adding $5\%$ Gaussian noise. The solid line shows the noise-free (ideal) data, and the dashed line shows the noisy data.
  • Figure 5: Distribution of velocity magnitude in the compressible cavity flow case and the spatial locations of sampled data points used for SITE.
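Figure 4 refers to adding 5% Gaussian noise to the electromagnetic field data. This summary does not spell out the noise model. The sketch below assumes multiplicative, pointwise noise, x(1 + 0.05ε) with ε ~ N(0, 1). Scaling the noise by the field's standard deviation is an equally common alternative. The helper name and the toy field are hypothetical.

```python
import numpy as np


def add_gaussian_noise(field, level=0.05, seed=None):
    """Return field * (1 + level * eps) with eps ~ N(0, 1) drawn per entry.

    Assumption: relative, pointwise noise. A common alternative is
    field + level * field.std() * eps; the paper's exact choice is not given here.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(np.shape(field))
    return np.asarray(field) * (1.0 + level * eps)


# Toy electric-field samples, shape (N, 3); values are illustrative only.
E = np.random.default_rng(0).uniform(1.0, 2.0, size=(1000, 3))
E_noisy = add_gaussian_noise(E, level=0.05, seed=1)
```

Fixing the seed keeps the noisy dataset reproducible across runs. This matters when comparing recovered equations at different noise levels.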