Extrapolation and learning equations

Georg Martius, Christoph H. Lampert

TL;DR

The paper tackles extrapolation in regression for physical systems by introducing the Equation Learner (EQL), a differentiable network that learns analytic expressions using a structured mix of unary and pairwise multiplication units. EQL is trained with a staged sparsity strategy and a model-selection criterion designed to favor simple, extrapolable formulas, enabling interpretable equation discovery. Through experiments on pendulum dynamics, double pendulum kinematics, planar robotic arms, synthetic formula learning, and X-ray transition energies, EQL demonstrates strong extrapolation capability and the ability to recover or approximate underlying physical laws, while highlighting limitations when the true relation lies outside the hypothesized function set. The work provides a pathway toward physics-informed, interpretable models with generalization beyond training domains, and points to future extensions of the base function set to broaden applicability.

Abstract

In classical machine learning, regression is treated as a black-box process of identifying a suitable function from a hypothesis set without attempting to gain insight into the mechanism connecting inputs and outputs. In the natural sciences, however, finding an interpretable function for a phenomenon is the prime goal, as it allows one to understand and generalize results. This paper proposes a novel type of function learning network, called equation learner (EQL), that can learn analytical expressions and is able to extrapolate to unseen domains. It is implemented as an end-to-end differentiable feed-forward network and allows for efficient gradient-based training. Due to sparsity regularization, concise interpretable expressions can be obtained. Often the true underlying source expression is identified.
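
To make the "structured mix of unary and pairwise multiplication units" concrete, the following is a minimal sketch of the forward pass of a single hidden layer, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `eql_layer`, the list-of-lists weight layout, and the choice of identity, sine, cosine, and sigmoid as the four unary types (matching the $u=4$, $v=1$ configuration of Figure 1) are assumptions made here for the example.

```python
import math

# Assumed unary base functions, one unit per type (u = 4).
UNARY = [
    lambda z: z,                            # identity
    math.sin,                               # sine
    math.cos,                               # cosine
    lambda z: 1.0 / (1.0 + math.exp(-z)),   # sigmoid
]


def eql_layer(x, W, b, unary=UNARY, v=1):
    """Hypothetical EQL-style layer: affine map, then unary and product units.

    x: list of d inputs
    W: (u + 2v) rows of d weights each
    b: (u + 2v) biases
    Returns a list of u + v outputs.
    """
    u = len(unary)
    assert len(W) == len(b) == u + 2 * v
    # Affine pre-activations z = W x + b.
    z = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    # The first u pre-activations each pass through one unary function.
    out = [f(zi) for f, zi in zip(unary, z[:u])]
    # Each multiplication unit consumes two consecutive pre-activations.
    out += [z[u + 2 * j] * z[u + 2 * j + 1] for j in range(v)]
    return out


# Usage: two inputs, weights chosen by hand so each unit is easy to read.
W = [[1, 0], [1, 0], [0, 1], [0, 0], [1, 0], [0, 1]]
b = [0.0] * 6
y = eql_layer([0.5, 2.0], W, b)
# y holds [x1, sin(x1), cos(x2), sigmoid(0), x1 * x2]
```

Because every unit is differentiable, a stack of such layers can be trained with ordinary gradient descent, and driving most entries of `W` to zero leaves a short expression that can be read off the remaining connections.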

Paper Structure

This paper contains 20 sections, 13 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Network architecture of the proposed Equation Learner (EQL) for 3 layers ($L=3$) and one neuron per type ($u=4,v=1$).
  • Figure 2: Learning pendulum dynamics. (a) Slices of outputs $y_1$ (left) and $y_2$ (right) for inputs $x_1=x_2=x$ for the true system equation (Eq. \ref{eqn:pend}) and one instance each of EQL, MLP, and SVR. The shaded area marks the training region and the vertical bars show the size of the near and far extrapolation domains. (b) One of the learned networks. Numbers on the edges correspond to the entries of $W$ and numbers inside the nodes show the bias values $b$. All weights with $|w| < 0.01$ and orphan nodes are omitted. Learned formulas: $y_1=0.103 x_2$, $y_2=\sin(-x_1)$, which are correct up to symmetry ($1/g=1.01$).
  • Figure 3: Double pendulum kinematics. (a) Training trajectory (in y-space). (b) Extrapolation test trajectory (in y-space) with output of a learned EQL instance. (c) Slices of output $y_4$ for inputs $x_1=x_2=x$ for the true system and one instance each of EQL, MLP, and SVR. (d) Numeric results, see Tab. \ref{tab:pend:results} for details. Note that predicting 0 would yield a mean error of $0.84$.
  • Figure 4: Formula learning analysis. (a) for F-1, (b) for F-2, and (c) for F-3. (left) $y$ for a single cut through the input space for the true system equation (\ref{eqn:syn1}, \ref{eqn:syn2}), and for an instance of EQL and of MLP. (right) The corresponding learned networks, see Fig. \ref{fig:pend} for details. The formula representations were extracted from the networks. For F-3 the algorithm fails with the overcomplete base and typically (9/10 times) ends up in a local minimum. With fewer base functions (no cosine) the right formula is found. Both results are presented. See text for a discussion.
  • Figure 5: X-ray transition energies. (a) Measured data and values predicted by EQL. (b) Visualized prediction error for all methods for one train/validation split. (c) EQL solutions during model selection, plotted in the space of validation error versus sparsity, see Appendix A1 for details. (d) Numeric results. Reported are RMS errors with standard deviation for 10 independent train/validation splits. In real units the error is in 100 keV and is well below the difference between neighboring high-$Z$ elements. (e) Learned formulas for different sparsities $s$ (lowest dot for each $s$ in (c)).
  • ...and 3 more figures