Interpretability of linear regression models of glassy dynamics

Anand Sharma, Chen Liu, Misaki Ozawa, Daniele Coslovich

Abstract

Data-driven models can accurately describe and predict the dynamical properties of glass-forming liquids from structural data. Accurate predictions, however, do not guarantee an understanding of the underlying physical phenomena and the key factors that control them. In this paper, we illustrate the merits and limitations of linear regression models of glassy dynamics built on high-dimensional structural descriptors. By analyzing data for a two-dimensional glass model, we show that several descriptors commonly used in glass-transition studies display multicollinearity, which hinders the interpretability of linear models. Ridge regression suppresses some of the shortcomings of multicollinearity, but its solutions are not concise enough to be physically interpretable. Only by using dimensionality reduction techniques do we eventually obtain linear models that strike a balance between prediction accuracy and interpretability. Our analysis points to a key role of local packing and composition fluctuations in the glass model under study.

Paper Structure

This paper contains 45 sections, 75 equations, 21 figures, 3 tables.

Figures (21)

  • Figure 1: Pearson coefficient, $R[{X^{(f)}}, {Y}]$, between the dynamic propensity $\textbf{Y}$ and each structural feature $\textbf{X}^{(f)}$ of the BP descriptor for $f=1, \dots, M$.
  • Figure 2: Correlation matrix $\mathrm{C}$ for the BP descriptor. The matrix elements are given by the Pearson correlation coefficient $R[{X^{(f)}}, {X^{(f')}}]$.
  • Figure 3: Weights obtained from OLS and Ridge regression of the dynamic propensity using the BP descriptor: (a) $\hat{\bf w}_\textrm{OLS}$ and (b) $\hat{\bf w}_\textrm{ridge}$ for $\alpha=10^{-1}$. The error bar corresponds to the standard deviation estimated over independent random training sets.
  • Figure 4: Normalized weights as a function of the Ridge regularization parameter $\alpha$ corresponding to (a) the radial features $G^{S}(k)$ for $0\le k \le 47$ and (b) the angular features $\Psi^{SS}(k)$ for $0\le k \le 21$. The width of the shaded areas corresponds to the standard deviation estimated over independent random training sets. The color code indicates the feature index $k$.
  • Figure 5: Prediction performance metrics for Ridge regression of the dynamic propensity using the BP descriptor: (a) Pearson coefficient $R[Y, \hat{Y}]$ as a function of $\alpha$ and (b) coefficient of determination $R_2[Y, \hat{Y}]$ as a function of $\alpha$. Full and dashed lines correspond to results obtained using the test set and the training set, respectively.
  • ...and 16 more figures
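
The captions of Figures 2 to 4 refer to correlated features and to the spread of OLS and Ridge weights over independent random training sets. The following is a minimal sketch of that effect on synthetic data, not on the paper's BP descriptor: two nearly collinear features are generated, their Pearson coefficient is computed, and the weights are fitted repeatedly with and without a Ridge penalty. The function names, the noise levels, the sample size, and the value `alpha=1.0` are all assumptions chosen for illustration.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def standardize(v):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    m, s = statistics.fmean(v), statistics.pstdev(v)
    return [(x - m) / s for x in v]


def fit_two_features(x1, x2, y, alpha=0.0):
    """Solve (X^T X + alpha I) w = X^T y for two features.

    alpha = 0 gives ordinary least squares; alpha > 0 gives Ridge.
    The 2x2 system is solved in closed form (Cramer's rule).
    """
    a11 = sum(u * u for u in x1) + alpha
    a22 = sum(v * v for v in x2) + alpha
    a12 = sum(u * v for u, v in zip(x1, x2))
    b1 = sum(u * t for u, t in zip(x1, y))
    b2 = sum(v * t for v, t in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def make_training_set(seed, n=200):
    """Synthetic data (assumption): x2 is x1 plus small noise, y depends on x1."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [u + 0.01 * rng.gauss(0.0, 1.0) for u in x1]  # nearly collinear with x1
    y = [u + 0.5 * rng.gauss(0.0, 1.0) for u in x1]
    y_mean = statistics.fmean(y)
    return standardize(x1), standardize(x2), [t - y_mean for t in y]


# Correlation between the two features in one training set.
x1, x2, y = make_training_set(seed=0)
r12 = pearson(x1, x2)

# Fit on independent random training sets and collect the first weight.
ols_w1, ridge_w1 = [], []
for seed in range(20):
    x1, x2, y = make_training_set(seed)
    ols_w1.append(fit_two_features(x1, x2, y, alpha=0.0)[0])
    ridge_w1.append(fit_two_features(x1, x2, y, alpha=1.0)[0])

# Standard deviation over training sets, analogous to an error bar on a weight.
ols_spread = statistics.stdev(ols_w1)
ridge_spread = statistics.stdev(ridge_w1)

print(f"Pearson R[x1, x2] = {r12:.5f}")
print(f"std of w1 over training sets: OLS = {ols_spread:.3f}, Ridge = {ridge_spread:.3f}")
```

In this toy setting the OLS weight fluctuates strongly from one training set to the next because the two features carry almost the same information, while the Ridge penalty stabilizes the weights by sharing the signal between them. With only two features this says nothing about how concise or interpretable a Ridge solution is for a high-dimensional descriptor.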