GeoRecon: Graph-Level Representation Learning for 3D Molecules via Reconstruction-Based Pretraining

Shaoheng Yan, Zian Li, Muhan Zhang

TL;DR

GeoRecon introduces a graph-level reconstruction objective for 3D molecular pretraining, conditioning geometry reconstruction on a global graph embedding to capture emergent molecular structure. The method yields significantly smoother latent spaces and improves downstream graph-level predictions on QM9, MD17, and MD22, with notable gains in energy and force estimates and robustness to out-of-distribution data. The authors provide Lipschitz-based theory and extensive ablations showing the importance of reconstruction noise scale and decoder depth. GeoRecon is compatible with SE(3)-equivariant backbones, relies only on 3D coordinates, and demonstrates transferability to existing models like UniMol, suggesting broad applicability for sample-efficient, geometry-aware molecular learning.

Abstract

The pretraining-finetuning paradigm has powered major advances in domains such as natural language processing and computer vision, with representative examples including masked language modeling and next-token prediction. In molecular representation learning, however, pretraining tasks remain largely restricted to node-level denoising, which effectively captures local atomic environments but is often insufficient for encoding the global molecular structure critical to graph-level property prediction tasks such as energy estimation and molecular regression. To address this gap, we introduce GeoRecon, a graph-level pretraining framework that shifts the focus from individual atoms to the molecule as an integrated whole. GeoRecon formulates a graph-level reconstruction task: during pretraining, the model is trained to produce an informative graph representation that guides geometry reconstruction while inducing smoother and more transferable latent spaces. This encourages the learning of coherent, global structural features beyond isolated atomic details. Without relying on external supervision, GeoRecon generally improves over backbone baselines on multiple molecular benchmarks including QM9, MD17, MD22, and 3BPA, demonstrating the effectiveness of graph-level reconstruction for holistic and geometry-aware molecular embeddings.

Paper Structure

This paper contains 38 sections, 4 theorems, 17 equations, 3 figures, 13 tables.

Key Result

Theorem 1

$\bm{r}$ denotes the equilibrium molecular structure, $\widetilde{\bm{r}}$ is its perturbed version obtained by Gaussian corruption, and $\theta$ represents the parameters of the denoising network. $q_\sigma(\widetilde{\bm{r}} \mid \bm{r})$ is defined as a Gaussian centered at $\bm{r}$, serving as a

Figures (3)

  • Figure 1: Overview of the GeoRecon framework. Given a molecular structure with atom types and 3D coordinates, the model encodes it using SE(3)-equivariant attention. Besides the standard node-level denoising objective, GeoRecon feeds a pooled graph-level representation concatenated with node embeddings derived from noisy coordinates into a lightweight decoder to reconstruct the scaled noise. The pretrained encoder is then finetuned for downstream molecular property prediction tasks.
  • Figure 2: Representation stability of GeoRecon (upper row) vs. Coord (lower row) near equilibrium conformations ($\|\delta \bm{x}\| \leq 1$ Å). Left: averages over multiple molecules. Middle & Right: 2D perturbation heatmaps for a randomly selected PCQM4Mv2 sample, where the horizontal and vertical axes correspond to the magnitudes of two coordinate perturbations applied to a random atom, and the color encodes the norm change of the representation ($\|\delta\bm{h}\|$). Larger blue regions indicate higher stability of the representation near the equilibrium conformation. The red curve in the middle column highlights $\delta y=0$ ($\|\delta\bm{h}\|$ along the $x$-axis). Scale bars: shared on the left (left column) and at the bottom (middle).
  • Figure 3: MAE ($\downarrow$) loss curves from linear probing experiments on multiple QM9 tasks. A cosine–warmup learning rate schedule is applied, and both models are trained for 14.7k steps.
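The data flow described in the Figure 1 caption can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `toy_encoder` is a hypothetical distance-based stand-in for the SE(3)-equivariant attention encoder, mean pooling over the clean structure is assumed for the graph-level representation, the decoder is assumed to be a two-layer MLP, and all parameter names and shapes are invented for the example.

```python
import numpy as np

def toy_encoder(atom_types, coords, W_enc):
    """Hypothetical stand-in encoder (NOT SE(3)-equivariant attention).

    Builds three rotation-invariant features per atom (atom type, mean and
    max distance to the other atoms) and projects them to d dimensions.
    """
    diffs = coords[:, None, :] - coords[None, :, :]        # (N, N, 3)
    dists = np.linalg.norm(diffs, axis=-1)                 # (N, N)
    feats = np.stack(
        [atom_types.astype(float), dists.mean(axis=1), dists.max(axis=1)],
        axis=1,
    )                                                      # (N, 3)
    return np.tanh(feats @ W_enc)                          # (N, d)

def graph_level_reconstruction_loss(atom_types, coords, params, sigma, rng):
    """Graph-conditioned noise reconstruction loss for one molecule (sketch)."""
    # Corrupt the coordinates with Gaussian noise of scale sigma.
    eps = rng.standard_normal(coords.shape)                # (N, 3)
    noisy_coords = coords + sigma * eps

    # Assumption: the graph embedding is a mean pool over node embeddings
    # of the clean structure.
    h_clean = toy_encoder(atom_types, coords, params["W_enc"])
    g = h_clean.mean(axis=0)                               # (d,)

    # Node embeddings derived from the noisy coordinates.
    h_noisy = toy_encoder(atom_types, noisy_coords, params["W_enc"])

    # Concatenate the broadcast graph embedding with each node embedding.
    z = np.concatenate(
        [np.broadcast_to(g, h_noisy.shape), h_noisy], axis=1
    )                                                      # (N, 2d)

    # Assumed lightweight decoder: two-layer MLP predicting the scaled noise.
    hidden = np.tanh(z @ params["W1"])
    pred = hidden @ params["W2"]                           # (N, 3)

    loss = np.mean((pred - eps) ** 2)
    return loss, pred, eps
```

Because every node sees the same pooled vector `g`, the decoder can only reduce this loss if `g` carries information about the whole molecule, which is the intuition the caption points to. In the actual framework this term is used alongside the standard node-level denoising objective.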

Theorems & Definitions (7)

  • Theorem 1: Equivalence between denoising and force field prediction (Zaidi et al., 2023)
  • Theorem 2: Smoother representations tighten generalization for linear probes
  • Proof (sketch)
  • Lemma 1: Noise robustness scaling with $L_f$ under a linear probe
  • Proof (sketch)
  • Proposition 1: Generalization and robustness with spectrally-controlled readouts
  • Proof (sketch)
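The exact statements of these results are not reproduced on this page. As a rough illustration of why a smaller Lipschitz constant $L_f$ helps a linear probe, the sketch below checks the standard chain $|w^\top f(x+\delta) - w^\top f(x)| \leq \|w\|\,\|f(x+\delta) - f(x)\| \leq \|w\|\,L_f\,\|\delta\|$, which follows from Cauchy-Schwarz and the definition of Lipschitz continuity. The representation map `representation` (a single `tanh` layer with matrix `A`) and the probe weights `w` are hypothetical choices made only so that an upper bound on $L_f$ (the spectral norm of `A`) is known in closed form.

```python
import numpy as np

def representation(x, A):
    """Hypothetical representation map f(x) = tanh(A x).

    tanh is 1-Lipschitz elementwise, so f is Lipschitz with constant at most
    the spectral norm of A.
    """
    return np.tanh(A @ x)

def probe_sensitivity(x, delta, A, w):
    """Return (||dh||, |dy|, bound) for a perturbation delta of the input.

    dh is the change in the representation, dy the change in the linear
    probe output w^T f(x), and bound = ||w|| * L_f * ||delta|| with L_f
    taken as the spectral norm of A.
    """
    h0 = representation(x, A)
    h1 = representation(x + delta, A)
    dh = np.linalg.norm(h1 - h0)
    dy = abs(w @ h1 - w @ h0)
    lipschitz_bound = np.linalg.norm(A, 2)   # largest singular value of A
    bound = np.linalg.norm(w) * lipschitz_bound * np.linalg.norm(delta)
    return dh, dy, bound
```

Scaling `A` down (a smoother representation) shrinks `bound` proportionally, so the same input noise perturbs the probe output less. The heatmaps in Figure 2 measure the analogous quantity $\|\delta\bm{h}\|$ for the pretrained encoders.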