Comparing the latent features of universal machine-learning interatomic potentials

Sofiia Chorna, Davide Tisi, Cesare Malosso, Wei Bin How, Michele Ceriotti, Sanggyu Chong

TL;DR

The paper systematically compares the latent representations of four universal MLIPs by quantifying how well their atom-level features can be linearly reconstructed from one another, across models and across variants of the same model. Using global and local feature reconstruction errors (GFRE/LFRE) and PCovR projections, it reveals substantial cross-model divergence in latent spaces, with finer similarities among variants of the same architecture and a strong pre-training bias that survives fine-tuning. The work also demonstrates a principled way to combine local atomic features into informative structure-level descriptors via higher-order cumulants, and highlights that richer latent information emerges from large, diverse training data and from multi-task or MoLE architectures. Overall, accuracy alone does not capture the diversity of learned representations, which has implications for model design, transfer learning, and robust deployment of uMLIPs.

Abstract

The past few years have seen the development of "universal" machine-learning interatomic potentials (uMLIPs) capable of approximating the ground-state potential energy surface across a wide range of chemical structures and compositions with reasonable accuracy. While these models differ in the architecture and the dataset used, they share the ability to compress a staggering amount of chemical information into descriptive latent features. Herein, we systematically analyze what the different uMLIPs have learned by quantitatively assessing the relative information content of their latent features with feature reconstruction errors as metrics, and observing how the trends are affected by the choice of training set and training protocol. We find that the uMLIPs encode chemical space in significantly distinct ways, with substantial cross-model feature reconstruction errors. When variants of the same model architecture are considered, trends become dependent on the dataset, target, and training protocol of choice. We also observe that fine-tuning of a uMLIP retains a strong pre-training bias in the latent features. Finally, we discuss how atom-level features, which are directly output by MLIPs, can be compressed into global structure-level features via concatenation of progressive cumulants, each adding significantly new information about the variability across the atomic environments within a given system.
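The cumulant-based compression mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes that cumulants are taken independently for each feature dimension over the atoms of one structure, and that orders up to four are enough for illustration; the function name `cumulant_pool` and the `max_order` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def cumulant_pool(atom_features, max_order=3):
    """Concatenate per-feature cumulants of atom-level features.

    atom_features: array of shape (n_atoms, n_features) for ONE structure.
    Returns a vector of length max_order * n_features.
    Assumption: cumulants are computed per feature dimension, ignoring
    cross-feature terms.
    """
    x = np.asarray(atom_features, dtype=float)
    if x.ndim != 2:
        raise ValueError("expected an array of shape (n_atoms, n_features)")
    if not 1 <= max_order <= 4:
        raise ValueError("this sketch supports orders 1 to 4 only")

    mean = x.mean(axis=0)               # 1st cumulant: mean over atoms
    centered = x - mean
    m2 = (centered ** 2).mean(axis=0)   # 2nd cumulant: variance
    m3 = (centered ** 3).mean(axis=0)   # 3rd cumulant: third central moment
    m4 = (centered ** 4).mean(axis=0)
    k4 = m4 - 3.0 * m2 ** 2             # 4th cumulant: excess fourth moment

    return np.concatenate([mean, m2, m3, k4][:max_order])


# Two toy "structures" with the same mean feature but different spread:
uniform = np.array([[1.0], [1.0]])
mixed = np.array([[0.0], [2.0]])
print(cumulant_pool(uniform, max_order=2))
print(cumulant_pool(mixed, max_order=2))
```

In this toy case the first cumulant (the mean) cannot tell the two structures apart, while the second one can, which is the sense in which each additional order adds information about the variability across atomic environments.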

Paper Structure

This paper contains 30 sections, 16 equations, 18 figures, 8 tables.

Figures (18)

  • Figure 1: Heat maps of GFRE (left) and LFRE (right) for atomic last-layer latent features of MACE-MP-0b3, PET-MAD, DPA-3.1, and UMA-S-1P1, computed with the atomic environments on the test split of the MAD dataset. Each cell represents the reconstruction error when mapping latent features of the "source" (row) to "target" (column).
  • Figure 2: Reconstruction errors among single-task MACE models trained on MPtrj (MACE-MP-0b3), OMat24, and MatPES (PBE and r2SCAN) datasets, evaluated on the MAD test set.
  • Figure 3: Reconstruction errors among different input tasks of UMA-S-1P1 trained on OMat24, OMOL, OC20, ODAC, and OMC, evaluated on the MAD test set.
  • Figure 4: Schematic overview of fine-tuning strategies for the PET model architecture: full fine-tuning, head fine-tuning, full transfer learning, and head transfer learning. GNN based on the Point Edge Transformer (PET) computes the backbone (BB) features (general, transferable node and edge representations). For a given prediction target, these BB features are fed into the readout head (MLPs) to generate the last-layer (LL) features (target-specific node and edge representations). The four illustrated strategies involve training different combinations of these components. The trainable parts corresponding to each strategy are colored.
  • Figure 5: Reconstruction errors (GFRE and LFRE) of last-layer atom-level features between PET checkpoints on the LPS dataset.
  • ...and 13 more figures
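
The source-to-target reconstruction error shown in the Figure 1 heat maps can be sketched as follows. This is a minimal illustration of a global feature reconstruction error, assuming a ridge-regularized linear map fitted on a training split and features scaled so that their total training variance is one; it is not the authors' implementation, and the names `standardize`, `reconstruction_error`, and `reg` are hypothetical. The local variant (LFRE) is not covered here.

```python
import numpy as np


def standardize(train, test):
    """Center with the training mean and scale so that the total
    variance of the training features equals one."""
    mean = train.mean(axis=0)
    centered = train - mean
    scale = np.sqrt((centered ** 2).sum() / len(train))
    return centered / scale, (test - mean) / scale


def reconstruction_error(src_train, tgt_train, src_test, tgt_test, reg=1e-8):
    """Error of linearly reconstructing target features from source features.

    A value near 0 means the source features linearly contain the target
    information; larger values mean information is missing from the source.
    """
    xs_train, xs_test = standardize(src_train, src_test)
    xt_train, xt_test = standardize(tgt_train, tgt_test)

    # Ridge regression weights mapping source features to target features.
    gram = xs_train.T @ xs_train + reg * np.eye(xs_train.shape[1])
    weights = np.linalg.solve(gram, xs_train.T @ xt_train)

    # Root-mean-square residual on held-out environments.
    residual = xt_test - xs_test @ weights
    return float(np.sqrt((residual ** 2).sum() / len(xt_test)))


# Toy example: "rich" features contain everything in "poor", not vice versa.
rng = np.random.default_rng(0)
rich = rng.normal(size=(200, 4))
poor = rich[:, :2]
print("rich -> poor:", reconstruction_error(rich[:150], poor[:150], rich[150:], poor[150:]))
print("poor -> rich:", reconstruction_error(poor[:150], rich[:150], poor[150:], rich[150:]))
```

The measure is asymmetric by construction, which is why a heat map with separate source rows and target columns carries more information than a single pairwise similarity score.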