Information Routing in Atomistic Foundation Models: How Equivariance Creates Linearly Disentangled Representations

Joshua Steier

TL;DR

This work introduces Composition Projection Decomposition (CPD), which uses QR projection to linearly remove composition signal from learned representations and then probes the geometric residual. It recommends linear probes as the primary metric.

Abstract

What do atomistic foundation models encode in their intermediate representations, and how is that information organized? We introduce Composition Projection Decomposition (CPD), which uses QR projection to linearly remove composition signal from learned representations and probes the geometric residual. Across eight models from five architectural families on QM9 molecules and Materials Project crystals, we find a disentanglement gradient: tensor product equivariant architectures (MACE) produce representations where geometry is almost fully linearly accessible after composition removal ($R^2_{\text{geom}} = 0.782$ for HOMO-LUMO gap), while handcrafted descriptors (ANI-2x) entangle the same information nonlinearly ($R^2_{\text{geom}} = -0.792$ under Ridge; $R^2 = +0.784$ under MLP). MACE routes target-specific signal through irreducible representation channels -- dipole to $L = 1$, HOMO-LUMO gap to $L = 0$ -- a pattern not observed in ViSNet's vector-scalar architecture under the same probe. We show that gradient boosted tree probes on projected residuals are systematically inflated, recovering $R^2 = 0.68$--$0.95$ on a purely compositional target, and recommend linear probes as the primary metric. Linearly disentangled representations are more sample-efficient under linear probing, suggesting a practical advantage for equivariant architectures beyond raw prediction accuracy.
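The projection step described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the authors' implementation: the function name `cpd_split`, the array shapes, and the random "composition" and "geometry" factors are all assumptions made for the example, and $\mathbf{Z}$ is assumed to have full column rank.

```python
import numpy as np

def cpd_split(X, Z):
    """Split X into a composition part and a geometric residual.

    Hypothetical helper: Q is an orthonormal basis for the column space
    of the composition features Z (reduced QR), so Q @ Q.T projects onto
    that space. Assumes Z has full column rank.
    """
    Q, _ = np.linalg.qr(Z, mode="reduced")
    X_comp = Q @ (Q.T @ X)   # part of X linearly explained by composition
    X_geom = X - X_comp      # residual, orthogonal to every column of Z
    return X_comp, X_geom

# Synthetic stand-ins (assumptions, not data from the paper).
rng = np.random.default_rng(0)
n = 200
Z = rng.random((n, 4))          # hypothetical composition features
G = rng.normal(size=(n, 3))     # hypothetical geometric factors
X = Z @ rng.normal(size=(4, 16)) + G @ rng.normal(size=(3, 16))

X_comp, X_geom = cpd_split(X, Z)
```

By construction the two parts sum back to `X`, and the residual has no linear overlap with `Z` on the samples used to fit the projection. Nonlinear functions of composition can still survive in the residual, which is one reason the choice of probe matters.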

Paper Structure (24 sections, 9 figures, 3 tables)

1 Introduction
2 Related Work
3 Methods
4 Results
5 Deep Analysis
6 Discussion
7 Conclusion
Reproducibility Statement
Compute Resources
Broader Impact

Figures (9)

Figure 1: Composition Projection Decomposition (CPD). A foundation model produces representation $\mathbf{X}$; QR projection on composition features $\mathbf{Z}$ splits $\mathbf{X}$ into a composition component $\mathbf{X}_{\text{comp}} = \mathbf{QQ}^{\top}\mathbf{X}$ and a geometric residual $\mathbf{X}_{\text{geom}} = (\mathbf{I} - \mathbf{QQ}^{\top})\mathbf{X}$. Each component is probed separately with Ridge regression. A positive control (probing for average atomic mass on the residual) validates that composition signal has been removed.
Figure 2: Disentanglement gradient across five QM9 models. $R^2_{\text{geom}}$ (Ridge on CPD residual) for HOMO-LUMO gap. Three tiers emerge: tensor product equivariant (MACE), intermediate (ViSNet, SchNet), and entangled (DimeNet++, ANI-2x). Error bars show standard deviation over 30 repeated 5-fold CV runs.
Figure 3: DimeNet++ depth profile. $R^2_{\text{geom}}$ starts strongly negative at the embedding layer and climbs through each interaction block, crossing into positive territory only at the final block. Main split only.
Figure 4: MACE depth profile. $R^2_{\text{geom}}$ is non-monotonic: scalar projection layers (out1) reduce access while tensor product layers partially restore it. Main split only.
Figure 5: ANI-2x encodes the same information in a nonlinearly entangled form. Ridge (linear) probe on the CPD residual collapses to strongly negative $R^2$ for all four properties, while an MLP (nonlinear) probe recovers strong positive scores. Main split.
...and 4 more figures
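
The probing protocol in the Figure 1 and Figure 2 captions (Ridge on the residual, a compositional control target, 5-fold cross-validation) could look roughly like the sketch below. Everything here is an assumption for illustration: the closed-form Ridge helper `ridge_cv_r2`, the regularization strength, the synthetic data, and the use of Dirichlet-sampled element fractions for H, C, N, O as composition features. It runs a single 5-fold pass, whereas the paper reports 30 repeats.

```python
import numpy as np

def ridge_cv_r2(X, y, alpha=1.0, k=5, seed=0):
    """Mean out-of-fold R^2 of a closed-form Ridge probe (hypothetical helper).

    The intercept is handled by centering with training-fold means.
    """
    idx = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        mu_x, mu_y = X[train].mean(axis=0), y[train].mean()
        Xc, yc = X[train] - mu_x, y[train] - mu_y
        w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
        pred = (X[test] - mu_x) @ w + mu_y
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))

# Synthetic stand-ins (assumptions, not data from the paper).
rng = np.random.default_rng(0)
n = 400
Z = rng.dirichlet(np.ones(4), size=n)    # element fractions, rows sum to 1
G = rng.normal(size=(n, 3))              # hypothetical geometric factors
X = (Z @ rng.normal(size=(4, 16))
     + G @ rng.normal(size=(3, 16))
     + 0.1 * rng.normal(size=(n, 16)))   # small noise keeps X full rank

# CPD residual: remove the span of Z from X via reduced QR.
Q, _ = np.linalg.qr(Z)
X_geom = X - Q @ (Q.T @ X)

y_geom = G @ np.array([1.0, -2.0, 0.5])                 # geometry-driven target
y_ctrl = Z @ np.array([1.008, 12.011, 14.007, 15.999])  # average atomic mass

r2_geom = ridge_cv_r2(X_geom, y_geom)   # should stay high
r2_ctrl = ridge_cv_r2(X_geom, y_ctrl)   # should be near zero or negative
```

In this toy setup the control target is an exact linear function of `Z`, so a linear probe on the residual has nothing to fit. A clearly positive control score would indicate that composition signal leaked through the projection or that the probe is flexible enough to recover it nonlinearly.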
