On sparse regression, Lp-regularization, and automated model discovery

Jeremy A. McCulloch; Skyler R. St. Pierre; Kevin Linka; Ellen Kuhl

TL;DR

This work tackles the automatic discovery of interpretable, data-driven constitutive relations for nonlinear material behavior using a hybrid approach that combines $L_p$ regularization with physics-informed constitutive neural networks. It introduces two architectures, an invariant-based network and a principal-stretch-based network, that enforce thermodynamic consistency and objectivity. It systematically examines how the $L_p$ regularization settings (the power $p$ and the weighting factor $\alpha$) influence sparsity, bias, and robustness on both synthetic and brain-tissue data, with normalization proving crucial for stable discovery. The key findings are that $L_2$ regularization is inadequate for discovery, $L_1$ regularization induces sparsity but can bias results, and $L_0$ regularization provides transparent control over the trade-off between interpretability and predictive accuracy. Nonlinear cases can nonetheless exhibit multiple local minima, especially in the invariant-based network. The results suggest broad applicability to other discovery methods (e.g., sparse or symbolic regression) and to other domains, with potential implications for generative material design and the automated discovery of materials with user-defined properties.

Abstract

Sparse regression and feature extraction are the cornerstones of knowledge discovery from massive data. Their goal is to discover interpretable and predictive models that provide simple relationships among scientific variables. While the statistical tools for model discovery are well established in the context of linear regression, their generalization to nonlinear regression in material modeling is highly problem-specific and insufficiently understood. Here we explore the potential of neural networks for automatic model discovery and induce sparsity by a hybrid approach that combines two strategies: regularization and physical constraints. We integrate the concept of Lp regularization for subset selection with constitutive neural networks that leverage our domain knowledge in kinematics and thermodynamics. We train our networks with both synthetic and real data, and perform several thousand discovery runs to infer common guidelines and trends: L2 regularization or ridge regression is unsuitable for model discovery; L1 regularization or lasso promotes sparsity, but induces strong bias; only L0 regularization allows us to transparently fine-tune the trade-off between interpretability and predictability, simplicity and accuracy, and bias and variance. With these insights, we demonstrate that Lp regularized constitutive neural networks can simultaneously discover both interpretable models and physically meaningful parameters. We anticipate that our findings will generalize to alternative discovery techniques such as sparse and symbolic regression, and to other domains such as biology, chemistry, or medicine. Our ability to automatically discover material models from data could have tremendous applications in generative material design and open new opportunities to manipulate matter, alter properties of existing materials, and discover new materials with user-defined properties.
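
To make the three regimes named in the abstract concrete, the following minimal Python sketch evaluates the penalty term for a small weight vector. It assumes the common form $\alpha \sum_i |w_i|^p$, with the $p = 0$ case counting the nonzero weights; the function name `lp_penalty` and the sample weights are illustrative choices and are not taken from the paper.

```python
import numpy as np


def lp_penalty(weights, alpha, p):
    """Return alpha * sum_i |w_i|^p; for p = 0, count the nonzero weights."""
    w = np.abs(np.asarray(weights, dtype=float))
    if p == 0:
        # L0 "norm": number of active (nonzero) terms, independent of magnitude
        return alpha * float(np.count_nonzero(w))
    return alpha * float(np.sum(w ** p))


# Hypothetical weight vector with one inactive term
weights = [0.0, 0.5, 2.0]
penalties = {p: lp_penalty(weights, alpha=1.0, p=p) for p in (0, 0.5, 1, 2)}
```

For these sample weights, the L0 penalty only counts the two active terms, whereas the L1 and L2 penalties grow with the magnitude of the weights, which is why they shrink the remaining parameters in addition to selecting them.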

Paper Structure (9 sections, 47 equations, 12 figures, 2 tables)

This paper contains 9 sections, 47 equations, 12 figures, 2 tables.

Motivation
Lp regularization
Neural networks
Invariant based neural network
Principal stretch based neural network
Lp regularized neural networks
Lp regularized invariant based neural network
Lp regularized principal stretch based neural network
Conclusion and recommendations

Figures (12)

Figure 1: Lp regularization. Contours of the regularization term, $L_p = \alpha \, \| \boldsymbol{\theta} \|_p^p$ with $\| \boldsymbol{\theta} \|_p^p = \sum_{i=1}^{n_{\rm para}} |w_i|^p$, for varying powers, $p = [0.25, 0.5, 0.75, 1, 1.5, 2, 4, 8]$, evaluated for two parameters, $w_1$ and $w_2$. For $p \le 1$, top row, with the special case of $L_1$ regularization or lasso represented through the pyramid, $L_p$ regularization promotes sparsity by setting some weights exactly to zero, but is no longer strictly convex and can have multiple local minima. For $p > 1$, bottom row, with the special case of $L_2$ regularization or ridge regression represented through the ellipsoid, $L_p$ regularization promotes stability by reducing outliers, while the regularization term remains convex.
Figure 2: Invariant based neural network for automated model discovery. The network takes the deformation gradient $\boldsymbol{F}$ as input and outputs the free energy function $\psi$, from which we calculate the stress $\boldsymbol{P} = \partial \psi / \partial \boldsymbol{F}$. The network is invariant based: it first calculates the invariants $I_1$ and $I_2$ and feeds them into its two hidden layers. The first layer generates the first and second powers, $(\circ)$ and $(\circ)^2$, of the invariants, and the second layer applies the identity and exponential functions, $(\circ)$ and $\exp(\circ)$, to these powers. The free energy function $\psi$ is a function of the eight color-coded terms. During training, the network discovers the best model, out of $2^8=256$ possible combinations of terms, to explain the experimental data $\hat{\boldsymbol{P}}$.
Figure 3: Loss functions of invariant based neural network. Contours of the loss function $L(\boldsymbol{\theta};\lambda,\gamma)$ for all 28 possible two-term models of the invariant based constitutive neural network in Figure 2. The loss function is evaluated across tensile stretches $\lambda = [1.0,...,2.0]$, compressive stretches $\lambda = [1.0,...,0.5]$, and shear strains $\gamma = [0.0,...,0.5]$, with network weights in the ranges $w_i = [0,...,2]$ and $w_j = [0,...,2]$. The minimum of the loss function indicates the exact solution $w_i = 1$ and $w_j = 1$, represented through the white circle. The lower triangle illustrates the non-normalized loss function; the upper triangle illustrates the normalized loss function. All loss functions are convex, with contours varying from ellipsoids to valleys with long ridges, highlighting the collinearity of some $w_i$ and $w_j$ pairs.
Figure 4: Principal stretch based neural network for automated model discovery. The network takes the deformation gradient $\boldsymbol{F}$ as input and outputs the free energy function $\psi$, from which we calculate the stress $\boldsymbol{P} = \partial \psi / \partial \boldsymbol{F}$. The network is principal stretch based: it first calculates the principal stretches $\lambda_1$, $\lambda_2$, and $\lambda_3$ and feeds them into its hidden layer. The hidden layer applies eight different powers, $(\lambda_1^n+\lambda_2^n+\lambda_3^n-3)$, to these principal stretches. The free energy function $\psi$ is a function of the eight color-coded terms. During training, the network discovers the best model, out of $2^8=256$ possible combinations of terms, to explain the experimental data $\hat{\boldsymbol{P}}$.
Figure 5: Loss functions of principal stretch based neural network. Contours of the loss function $L(\boldsymbol{\theta};\lambda,\gamma)$ for all 28 possible two-term models of the principal stretch based constitutive neural network in Figure 4. The loss function is evaluated across tensile stretches $\lambda = [1.0,...,2.0]$, compressive stretches $\lambda = [1.0,...,0.5]$, and shear strains $\gamma = [0.0,...,0.5]$, with network weights in the ranges $w_i = [0,...,2]$ and $w_j = [0,...,2]$. The minimum of the loss function indicates the exact solution $w_i = 1$ and $w_j = 1$, represented through the white circle. The lower triangle illustrates the non-normalized loss function; the upper triangle illustrates the normalized loss function. All loss functions are convex, with contours varying from a few ellipsoids to many valleys with long ridges, highlighting the collinearity of many $w_i$ and $w_j$ pairs.
...and 7 more figures
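
The candidate terms described in the captions of Figures 2 and 4 can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: the shift of the invariants by 3 and of the exponential by 1 (so that every term vanishes in the undeformed state), the unit inner weights, the exponent list `EXPONENTS`, the uniaxial test deformation, and all function names are choices made here for the example.

```python
from itertools import combinations

import numpy as np


def invariants(F):
    """First and second invariants of the right Cauchy-Green tensor C = F^T F."""
    C = F.T @ F
    I1 = np.trace(C)
    I2 = 0.5 * (I1 ** 2 - np.trace(C @ C))
    return I1, I2


def invariant_terms(F):
    """Eight terms: powers 1 and 2 of (I - 3), each through identity and exp(.) - 1."""
    terms = []
    for I in invariants(F):
        for power in (1, 2):
            x = (I - 3.0) ** power
            terms.append(x)                 # identity activation
            terms.append(np.exp(x) - 1.0)   # exponential activation (shift is an assumption)
    return np.array(terms)


def stretch_terms(F, exponents):
    """Terms (lam1^n + lam2^n + lam3^n - 3), one per exponent n."""
    stretches = np.linalg.svd(F, compute_uv=False)  # principal stretches of F
    return np.array([np.sum(stretches ** n) - 3.0 for n in exponents])


def free_energy(terms, weights):
    """Free energy as a weighted sum of the candidate terms."""
    return float(np.dot(weights, terms))


# Hypothetical exponents; the caption only states that there are eight powers.
EXPONENTS = (-4, -3, -2, -1, 1, 2, 3, 4)

# Assumed test deformation: incompressible uniaxial tension with stretch lam.
lam = 1.5
F = np.diag([lam, lam ** -0.5, lam ** -0.5])

psi_invariant = free_energy(invariant_terms(F), np.ones(8))
psi_stretch = free_energy(stretch_terms(F, EXPONENTS), np.ones(8))

# Model counts quoted in the captions: all subsets and all two-term pairs of eight terms.
n_models = 2 ** 8
two_term_models = list(combinations(range(8), 2))
```

Setting a weight to zero removes the corresponding term from the free energy, so each sparsity pattern of the eight weights corresponds to one of the 256 candidate models, and each pair of active weights to one of the 28 two-term models shown in Figures 3 and 5.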
