InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames

Haorui Li, Weitao Du, Yuqiang Li, Hongyu Guo, Shengchao Liu

TL;DR

InertialAR reframes 3D molecule generation as a transformer-based autoregressive task by introducing a canonical tokenization that aligns molecules to an inertial frame and deterministically orders atoms, thereby achieving $SE(3)$ and permutation invariance. It further equips the Transformer with GeoRoPE, a geometry-aware attention mechanism that combines RoPE-3D with Nyström distance encoding to capture relative geometry and pairwise distances. A hierarchical autoregressive architecture then predicts atom types and coordinates, using cross-entropy for discrete types and diffusion loss for continuous coordinates, with classifier-free guidance enabling controllable generation. Empirically, InertialAR achieves state-of-the-art results on multiple unconditional benchmarks (including QM9, GEOM-Drugs, and B3LYP) and demonstrates strong controllable generation and editing capabilities, highlighting its potential as a scalable foundation model for 3D molecular design and beyond.

Abstract

Transformer-based autoregressive models have emerged as a unifying paradigm across modalities such as text and images, but their extension to 3D molecule generation remains underexplored. The gap stems from two fundamental challenges: (1) tokenizing molecules into a canonical 1D sequence of tokens that is invariant to both SE(3) transformations and atom index permutations, and (2) designing an architecture capable of modeling hybrid atom-based tokens that couple discrete atom types with continuous 3D coordinates. To address these challenges, we introduce InertialAR. InertialAR devises a canonical tokenization that aligns molecules to their inertial frames and reorders atoms to ensure SE(3) and permutation invariance. It also equips the attention mechanism with geometric awareness via geometric rotary positional encoding (GeoRoPE). Finally, it uses a hierarchical autoregressive paradigm to predict the next atom-based token: the atom type is predicted first, followed by its 3D coordinates, which are trained with a diffusion loss. Experimentally, InertialAR achieves state-of-the-art performance on 7 of the 10 evaluation metrics for unconditional molecule generation across QM9, GEOM-Drugs, and B3LYP. It also significantly outperforms strong baselines in controllable generation for targeted chemical functionality, attaining state-of-the-art results across all 5 metrics.
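To make the rotary idea behind GeoRoPE more concrete, the following is a minimal NumPy sketch of a rotary encoding driven by 3D positions. It is not the paper's implementation: the function name `rope_3d`, the split of the feature vector into three equal per-axis groups, and the frequency schedule are all assumptions chosen for illustration, and the Nyström distance encoding mentioned in the TL;DR is not covered. The sketch only shows the standard rotary property that the query-key score depends on the offset between two positions rather than on the positions themselves.

```python
import numpy as np

def rope_3d(vec, pos, base=10000.0):
    """Rotate feature pairs of `vec` by angles proportional to the 3D position `pos`.

    Assumption (not from the paper): the feature dimension is split into three
    equal groups, one per spatial axis, and each group is rotated pairwise.
    """
    vec = np.asarray(vec, dtype=float)
    pos = np.asarray(pos, dtype=float)
    d = vec.shape[-1]
    if d % 6 != 0:
        raise ValueError("feature dimension must be divisible by 6")
    per_axis = d // 3
    n_pairs = per_axis // 2
    # Hypothetical frequency schedule, borrowed from the usual 1D RoPE convention.
    freqs = base ** (-np.arange(n_pairs) / n_pairs)

    out = np.empty_like(vec)
    for axis in range(3):
        lo, hi = axis * per_axis, (axis + 1) * per_axis
        chunk = vec[lo:hi]
        a, b = chunk[0::2], chunk[1::2]
        theta = pos[axis] * freqs
        cos, sin = np.cos(theta), np.sin(theta)
        rotated = np.empty_like(chunk)
        rotated[0::2] = a * cos - b * sin  # 2D rotation of each (a, b) pair
        rotated[1::2] = a * sin + b * cos
        out[lo:hi] = rotated
    return out

# Usage: the attention score is unchanged when both positions are shifted equally.
rng = np.random.default_rng(1)
q, k = rng.normal(size=12), rng.normal(size=12)
p_q, p_k, shift = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
score = rope_3d(q, p_q) @ rope_3d(k, p_k)
score_shifted = rope_3d(q, p_q + shift) @ rope_3d(k, p_k + shift)
```

Offsets between positions are unchanged by translation but not by rotation, so a sketch like this assumes the coordinates have already been expressed in a canonical frame.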

Paper Structure

This paper contains 37 sections, 2 theorems, 64 equations, 8 figures, 3 tables.

Key Result

Theorem 1

For an inertial frame $F$, we construct the corresponding right-handed axes as coordinate systems $Q$. To uniquely determine the directions of the coordinate system with a single rotation matrix, we need to incorporate a fourth point that lies on neither the y-z plane nor the x-z plane.
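The sign ambiguity that Theorem 1 resolves can be illustrated with a short NumPy sketch. This is a hedged illustration, not the authors' code: the function name `inertial_frame_align`, the default of unit masses, and in particular the rule for choosing the anchor (the atom farthest from the center of mass among those with nonzero x and y components) are assumptions made for this example. The sketch also assumes the three principal moments of inertia are distinct, so that the principal axes are defined up to sign.

```python
import numpy as np

def inertial_frame_align(coords, masses=None, tol=1e-8):
    """Express coordinates in a sign-fixed, right-handed principal-axis frame."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    masses = np.ones(n) if masses is None else np.asarray(masses, dtype=float)

    # Remove translation: move the center of mass to the origin.
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    x = coords - com

    # Inertia tensor I = sum_i m_i (|x_i|^2 Id - x_i x_i^T).
    sq = (x ** 2).sum(axis=1)
    inertia = (masses * sq).sum() * np.eye(3) - (masses[:, None] * x).T @ x

    # Principal axes: eigenvector columns, ordered by ascending eigenvalue.
    _, axes = np.linalg.eigh(inertia)
    proj = x @ axes  # coordinates in the still sign-ambiguous frame

    # Anchor point (hypothetical rule): farthest atom off the y-z and x-z planes.
    off_planes = (np.abs(proj[:, 0]) > tol) & (np.abs(proj[:, 1]) > tol)
    if not off_planes.any():
        raise ValueError("no atom lies off both the y-z and x-z planes")
    candidates = np.flatnonzero(off_planes)
    anchor = candidates[np.argmax(sq[candidates])]

    # Flip x and y so the anchor has positive x and y components,
    # then let right-handedness fix z.
    axes[:, 0] *= np.sign(proj[anchor, 0])
    axes[:, 1] *= np.sign(proj[anchor, 1])
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])
    return x @ axes

# Usage: a rotated and translated copy maps to the same canonical coordinates.
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 3))
rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(rot) < 0:
    rot[:, 0] *= -1  # make it a proper rotation
moved = pts @ rot.T + rng.normal(size=3)
canon_a = inertial_frame_align(pts)
canon_b = inertial_frame_align(moved)
```

Fixing the signs of x and y leaves exactly four candidate frames before the anchor is applied, which matches the four cases shown in Figure 2; the right-handedness constraint then determines z.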

Figures (8)

  • Figure 1: Overview of InertialAR: (a) canonical tokenization, (b) geometric rotary positional encoding (GeoRoPE), and (c) hierarchical autoregressive paradigm.
  • Figure 2: Illustration of introducing a fourth node as the anchor node. We define the signs of the x, y, and z axes so that ${\bm{x}}_4$ lies in the first quadrant; the four possible cases are illustrated in the four subfigures.
  • Figure 3: Visualization of molecule editing by tuning the CFG guidance scale $s$.
  • Figure 4: Overview of mapping 3D molecules to their Molecule Class IDs.
  • Figure 5: Comparison of existing SE(3)-equivariant graph neural networks and InertialAR.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • proof
  • proof