
Higher-Rank Irreducible Cartesian Tensors for Equivariant Message Passing

Viktor Zaverkin, Francesco Alesiani, Takashi Maruyama, Federico Errica, Henrik Christiansen, Makoto Takamoto, Nicolas Weber, Mathias Niepert

TL;DR

Higher-rank irreducible Cartesian tensor products are integrated into message-passing neural networks; the resulting models are shown empirically to perform on par with or better than state-of-the-art spherical and Cartesian models.

Abstract

The ability to perform fast and accurate atomistic simulations is crucial for advancing the chemical sciences. By learning from high-quality data, machine-learned interatomic potentials achieve accuracy on par with ab initio and first-principles methods at a fraction of their computational cost. The success of machine-learned interatomic potentials arises from integrating inductive biases such as equivariance to group actions on an atomic system, e.g., equivariance to rotations and reflections. In particular, the field has notably advanced with the emergence of equivariant message passing. Most of these models represent an atomic system using spherical tensors, tensor products of which require complicated numerical coefficients and can be computationally demanding. Cartesian tensors offer a promising alternative, though state-of-the-art methods lack flexibility in message-passing mechanisms, restricting their architectures and expressive power. This work explores higher-rank irreducible Cartesian tensors to address these limitations. We integrate irreducible Cartesian tensor products into message-passing neural networks and prove the equivariance and traceless property of the resulting layers. Through empirical evaluations on various benchmark data sets, we consistently observe on-par or better performance than that of state-of-the-art spherical and Cartesian models.

Paper Structure (20 sections, 9 theorems, 60 equations, 6 figures, 9 tables)


Key Result

Proposition 4.1

The message-passing layers based on irreducible Cartesian tensors and their irreducible tensor products are equivariant to actions of the orthogonal group.
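
As a small numerical illustration of what equivariance to the orthogonal group means, the sketch below checks it for a single rank-2 irreducible (symmetric, traceless) Cartesian tensor built from a unit vector. It is not the paper's message-passing layer: the helper names `irreducible_rank2` and `random_orthogonal` are hypothetical, and the form $\hat{\mathbf{r}} \otimes \hat{\mathbf{r}} - \mathbf{I}/3$ is the standard rank-2 case, assumed here without the paper's normalization conventions.

```python
import numpy as np


def irreducible_rank2(r):
    """Symmetric, traceless rank-2 tensor built from the direction of r."""
    r_hat = r / np.linalg.norm(r)
    return np.outer(r_hat, r_hat) - np.eye(3) / 3.0


def random_orthogonal(rng):
    """Random 3x3 orthogonal matrix from a QR decomposition.

    The result may be a rotation or a rotation combined with a reflection.
    """
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q


rng = np.random.default_rng(0)
r = rng.normal(size=3)
Q = random_orthogonal(rng)

T = irreducible_rank2(r)
T_from_transformed_input = irreducible_rank2(Q @ r)

# Equivariance: transforming the input vector gives the same result as
# transforming the output tensor, T(Q r) = Q T(r) Q^T.
assert np.allclose(T_from_transformed_input, Q @ T @ Q.T)

# Irreducibility conditions at rank 2: symmetric and traceless.
assert np.allclose(T, T.T)
assert abs(np.trace(T)) < 1e-12
```

The same kind of check, applied to the inputs and outputs of a full layer, is a common way to test an implementation against a statement like Proposition 4.1.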

Figures (6)

  • Figure 1: Schematic illustration of (a) the construction of an irreducible Cartesian tensor for a local atomic environment and (b) the tensor product of two irreducible Cartesian tensors of rank $l_1$ and $l_2$. The construction of an irreducible Cartesian tensor from a unit vector $\hat{\mathbf{r}}$ is defined in Eq. (\ref{eq:cartesian_irreps}). In this work, we use tensors with the same rank $n$ and weight $l$, i.e., $n=l$, avoiding the need for embedding tensors with $l < n$ in a higher-dimensional tensor space. Therefore, we use $l$ to identify the rank and the weight of an irreducible Cartesian tensor. The tensor product is defined in Eqs. (\ref{eq:product_even}) and (\ref{eq:product_odd}), resulting in a new tensor $\mathbf{T}_{l_3} = (\mathbf{T}_{l_1} \otimes_{\mathrm{Cart}} \mathbf{T}_{l_2})_{l_3}$ of rank $l_3 \in \{\lvert l_1 - l_2\rvert, \cdots, l_1 + l_2\}$. Transparent boxes denote the linearly dependent elements of symmetric and traceless tensors. The tensor product is even or odd according to the parity of $l_1+l_2-l_3$.
  • Figure 2: Inference times and memory consumption as a function of the tensor rank $L$ (a)--(b) and the correlation order $\nu$ (c)--(d). All results are obtained for the 3BPA data set and $l_\mathrm{max} = L$. We used eight feature channels to allow experiments with larger $\nu$ values. MACE models use intermediate tensors with $l > l_\mathrm{max}$ for their product basis, which we fixed to $l = l_\mathrm{max}$. Otherwise, pre-computing generalized Clebsch–Gordan coefficients for $\nu > 4$ would require more than 2 TB of RAM. For ICTP, we used the full product basis to compute the same number of $\nu$-fold tensor products as in MACE.
  • Figure 3: Potential energy profiles for three cuts through the 3BPA molecule's potential energy surface. All models are trained using 50 configurations, and an additional 50 are used for early stopping. The 3BPA molecule, including the three dihedral angles ($\alpha$, $\beta$, and $\gamma$), provided in degrees $^\circ$, is shown as an inset. The color code of the inset molecule is C grey, O red, N blue, and H white. The reference potential energy profile (DFT) is shown in black. Each profile is shifted such that each model's lowest energy is zero. Shaded areas denote standard deviations across five independent runs.
  • Figure A1: Potential energy profiles for three cuts through the 3BPA molecule's potential energy surface (results for $N_\mathrm{train}=450$). All models are trained using 450 configurations, and the remaining 50 are used for early stopping. The 3BPA molecule, including the three dihedral angles ($\alpha$, $\beta$, and $\gamma$), provided in degrees $^\circ$, is shown as an inset. The color code of the inset molecule is C grey, O red, N blue, and H white. The reference potential energy profile (DFT) is shown in black. Each profile is shifted such that each model's lowest energy is zero. Shaded areas denote standard deviations across five independent runs.
  • Figure A2: Potential energy profiles of (a) the dihedral angle describing the rotation around the C-C bond and (b) hydrogen transfer between two oxygen atoms (results for $N_\mathrm{train}=450$). All models are trained using 450 molecules, and the remaining 50 are used for early stopping. The acetylacetone molecule, including the dihedral angle in degrees $^\circ$ describing the rotation around the C-C bond ($\alpha$), is shown as an inset in (a). The color code of the inset molecule is C grey, O red, and H white. The reference potential energy profile (DFT) is shown in black. Each profile is shifted such that each model's lowest energy is zero. The histograms demonstrate the distribution of dihedral angles and O-H distances in the training data. Shaded areas denote standard deviations across five independent runs.
  • ...and 1 more figure
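
To make the rank selection rule in the Figure 1 caption concrete, the sketch below works through the simplest case, $l_1 = l_2 = 1$, where the product of two vectors splits into parts of rank $l_3 \in \{0, 1, 2\}$. This uses the textbook decomposition of an outer product; the function name `cartesian_product_rank1` is hypothetical, and the normalization factors of the paper's own product equations are not reproduced here.

```python
import numpy as np


def cartesian_product_rank1(a, b):
    """Split the outer product of two vectors into irreducible parts.

    Returns the rank-0, rank-1, and rank-2 components. The parity of
    l1 + l2 - l3 is even for l3 = 0 and l3 = 2, and odd for l3 = 1.
    """
    scalar = np.dot(a, b)      # l3 = 0 (even product)
    vector = np.cross(a, b)    # l3 = 1 (odd product)
    outer = np.outer(a, b)
    # l3 = 2 (even product): symmetric part with the trace removed.
    sym_traceless = 0.5 * (outer + outer.T) - np.eye(3) * scalar / 3.0
    return scalar, vector, sym_traceless


# Levi-Civita symbol, used to map the rank-1 part back to a 3x3 matrix.
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0

rng = np.random.default_rng(1)
a, b = rng.normal(size=3), rng.normal(size=3)
scalar, vector, sym_traceless = cartesian_product_rank1(a, b)

# The three parts together reproduce the full 9-component outer product:
# 1 (rank 0) + 3 (rank 1) + 5 (rank 2) independent components.
antisym = 0.5 * np.einsum("ijk,k->ij", eps, vector)
reconstructed = sym_traceless + antisym + np.eye(3) * scalar / 3.0
assert np.allclose(reconstructed, np.outer(a, b))

# The rank-2 part satisfies the irreducibility conditions.
assert np.allclose(sym_traceless, sym_traceless.T)
assert abs(np.trace(sym_traceless)) < 1e-12
```

The component count 1 + 3 + 5 = 9 matches the $2l + 1$ independent elements of a symmetric, traceless rank-$l$ tensor, which is why the caption describes the remaining elements as linearly dependent.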

Theorems & Definitions (16)

  • Proposition 4.1
  • Proposition 4.2
  • Lemma C.1
  • proof
  • Proposition C.2
  • proof
  • Proposition C.3
  • proof
  • proof: Proof of Proposition \ref{prop:ictp_equivariance}
  • Proposition D.1
  • ...and 6 more