FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Fields

Shihao Shao, Haoran Geng, Zun Wang, Qinghua Cui

TL;DR

The paper tackles the limited design space and computational burden of Clebsch-Gordan transforms in permutation-equivariant ML force fields. It introduces FreeCG, which performs CG transforms on permutation-invariant abstract edges and augments them with group CG transforms, sparse paths, abstract-edge shuffling, and an attention enhancer to achieve high expressivity with efficiency. Empirically, FreeCG sets state-of-the-art results on MD17, rMD17, and MD22 for force prediction and improves QM9 property predictions by substantial margins, while maintaining practical speed and memory usage. The work proposes a general paradigm shift for CG-transform design in geometric neural networks and demonstrates its applicability beyond FreeCG by enhancing QuinNet, underscoring broad potential for future geometric model development.

Abstract

Machine Learning Force Fields (MLFFs) are of great importance for chemistry, physics, materials science, and many other related fields. The Clebsch-Gordan Transform (CG transform) effectively encodes many-body interactions and is thus an important building block for many MLFF models. However, the permutation-equivariance requirement of MLFFs limits the design space of the CG transform: an intensive CG transform has to be conducted for each neighboring edge, and the operations must be performed in the same manner for all edges. This constraint reduces the expressiveness of the model while simultaneously increasing computational demands. To overcome this challenge, we first implement the CG transform layer on permutation-invariant abstract edges generated from real edge information. We show that this approach allows complete freedom in the design of the layer without compromising the crucial symmetry. Building on this free design space, we further propose group CG transform with sparse path, abstract edges shuffling, and attention enhancer to form a powerful and efficient CG transform layer. Our method, known as FreeCG, achieves state-of-the-art (SOTA) results in force prediction on MD17, rMD17, and MD22, and extends well to property prediction on the QM9 dataset, with several improvements greater than 15% and the maximum beyond 20%. Extensive real-world applications showcase its high practicality. FreeCG introduces a novel paradigm for carrying out efficient and expressive CG transforms in future geometric neural network designs. To demonstrate this, the recent SOTA model, QuinNet, is also enhanced under our paradigm. Code will be publicly available.
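The permutation-invariance claim for abstract edges can be illustrated with a small NumPy sketch. This is not the FreeCG layer itself: it assumes a fixed set of query vectors attending over the real edges of one atom, with invariant scalar features as keys and edge vectors as values. The names `abstract_edges`, `queries`, `neighbor_scalars`, and `neighbor_vectors` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def abstract_edges(queries, neighbor_scalars, neighbor_vectors):
    """Pool N real edges into M abstract edges with cross-attention.

    Assumption (not taken from the paper): queries are fixed per layer,
    keys are invariant scalar edge features, values are 3D edge vectors.
    """
    d = queries.shape[1]
    scores = queries @ neighbor_scalars.T / np.sqrt(d)  # shape (M, N)
    weights = softmax(scores, axis=1)                   # normalize over neighbors
    return weights @ neighbor_vectors                   # shape (M, 3)

rng = np.random.default_rng(0)
num_neighbors, num_abstract, dim = 5, 4, 8
queries = rng.normal(size=(num_abstract, dim))
scalars = rng.normal(size=(num_neighbors, dim))
vectors = rng.normal(size=(num_neighbors, 3))

edges = abstract_edges(queries, scalars, vectors)

# Reorder the neighbors: keys and values are permuted together.
perm = rng.permutation(num_neighbors)
edges_permuted = abstract_edges(queries, scalars[perm], vectors[perm])

permutation_invariant = np.allclose(edges, edges_permuted)
print(permutation_invariant)
```

Because each abstract edge is a weighted sum over all neighbors, reordering the neighbors does not change it. Any operation applied afterwards to the abstract edges, including one that treats each abstract edge differently, therefore leaves the permutation symmetry intact.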

Paper Structure (20 sections, 27 equations, 11 figures, 8 tables)

This paper contains 20 sections, 27 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: The architecture of a single layer of FreeCG. The cross-attention mechanism generates abstract edges through a permutation-invariant process. The abstract edges are also used to improve the attention scores, a component denoted the Attention Enhancer. On the right, the group CG transform organizes abstract edges into groups and performs the CG transform on each group. We adopt a sparse path for the CG transform, which lowers computational demands while maintaining O(3) equivariance. Abstract edges shuffling improves the information exchange between different irreps. Details of the sparse path and abstract edges shuffling are given in Fig. 2.
  • Figure 2: Details on sparse path and abstract edges shuffling. Left: The sparse path has two useful properties: 1) The number of paths is smaller than under the weaker SO(3) equivariance (4 vs. 8). 2) Each output irrep contains information from input irreps of both degree $l=1$ and $l=2$. Right: The shuffling strategy adds a constant $k$ to the index of each abstract edge. The shuffled result is then added to $\hat{E}^L_i$ to give the final added value $d\overline{E}^{L+1}_i$.
  • Figure 3: The speed and memory usage of FreeCG compared with other SOTA models. Numbers are reported for a single chignolin molecule. The three models on the right are based on high-order irreps and the CG transform.
  • Figure 4: Efficiency analysis of group CG transform. Left: The number of paths for the CG transform under different group numbers, with the number of irreps held fixed. Right: The actual running time of the CG transform for different group numbers. Here we adopt the sparse path strategy for computing 512 irreps (before grouping) for each $l$. "Full CG transform" denotes not using the sparse path.
  • Figure 5: Applications for the 166-atom mini-protein, Chignolin. a. The energy landscape of Chignolin was sampled using Replica Exchange Molecular Dynamics (REMD). This landscape is characterized by two key distance parameters: the x-axis represents the distance between the carbonyl oxygen on the D3 backbone and the nitrogen on the G7 backbone, while the y-axis depicts the distance between the carbonyl oxygen on the E5 backbone and the nitrogen on the T8 backbone. These two distance metrics collectively illustrate the conformational states of Chignolin across its energy landscape. The left and right energy basins correspond to the folded and unfolded states, respectively. b. The six conformations are sampled at the locations highlighted in the energy landscape. The force and energy performances (kcal/mol) are reported, with a comparison made to ViSNet. These six conformations cover both folded and unfolded states. c. The RMSD (Å) during the molecular dynamics simulation. The shaded area denotes the standard deviation. The RMSD values are obtained by averaging over 10 trajectories.
  • ...and 6 more figures
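
The index shift described in the Figure 2 caption can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: indices are taken to wrap around modulo the number of abstract edges, each abstract edge is represented by a single 3D vector, and the array names `cg_output` and `e_hat` are hypothetical stand-ins for the CG transform output and $\hat{E}^L_i$.

```python
import numpy as np

def shuffle_abstract_edges(cg_output, k):
    """Move the abstract edge at index i to index (i + k) mod M.

    The modular wrap-around is an assumption; the caption only states
    that a constant k is added to the index of each abstract edge.
    """
    return np.roll(cg_output, shift=k, axis=0)

num_abstract, k = 4, 1
# Toy data: abstract edge i holds the value i in all three components.
cg_output = np.arange(num_abstract, dtype=float)[:, None] * np.ones((1, 3))
e_hat = np.full((num_abstract, 3), 10.0)  # hypothetical stand-in for E-hat

shuffled = shuffle_abstract_edges(cg_output, k)
d_e = shuffled + e_hat  # shuffled result added to E-hat, as in the caption

print(shuffled[:, 0])
print(d_e[:, 0])
```

With this convention, position `i + k` receives what was computed at position `i`, so the values that get summed at each index come from different abstract edges.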