Large Language-Geometry Model: When LLM meets Equivariance

Zongzhao Li, Jiacheng Cen, Bing Su, Wenbing Huang, Tingyang Xu, Yu Rong, Deli Zhao

TL;DR

This work addresses the prediction of 3D structures and dynamics under $\mathrm{E}(3)$-equivariance by combining Large Language Models with geometry-aware, equivariant graph processing in a framework called EquiLLM. Geometry-aware prompting guides a frozen LLM that acts as an invariant feature processor, while a lightweight Equivariant Encoder and Equivariant Adaptor handle all directional 3D reasoning, so the overall model remains $\mathrm{E}(3)$-equivariant. Key contributions include an architecture that pairs geometry-aware prompting with an equivariant encoder, an LLM, and an equivariant adaptor; task-specific geometry-aware prompts (task description, object features, statistics); and validation on molecular dynamics (MD17), human motion, and antibody design (RAbD), with state-of-the-art results across several metrics. Because the LLM is not fine-tuned, training cost is reduced, and the modular design makes the approach applicable to a broader range of 3D physical tasks.

Abstract

Accurately predicting 3D structures and dynamics of physical systems is crucial in scientific applications. Existing approaches that rely on geometric Graph Neural Networks (GNNs) effectively enforce $\mathrm{E}(3)$-equivariance, but they often fall short in leveraging broader external information. Directly applied Large Language Models (LLMs) can incorporate external knowledge, but they lack the capability for spatial reasoning with guaranteed equivariance. In this paper, we propose EquiLLM, a novel framework for representing 3D physical systems that seamlessly integrates $\mathrm{E}(3)$-equivariance with LLM capabilities. Specifically, EquiLLM comprises four key components: geometry-aware prompting, an equivariant encoder, an LLM, and an equivariant adaptor. Essentially, the LLM, guided by the instructive prompt, serves as a sophisticated invariant feature processor, while 3D directional information is handled exclusively by the equivariant encoder and adaptor modules. Experimental results demonstrate that EquiLLM delivers significant improvements over previous methods across molecular dynamics simulation, human motion simulation, and antibody design, highlighting its promising generalizability.
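
For reference, the two symmetry properties the abstract relies on can be written out as below. This is the standard definition of $\mathrm{E}(3)$-equivariance and invariance; the symbols $\phi_{\text{equiv}}$, $\phi_{\text{inv}}$, $\vec{x}_i$, $\vec{y}_i$, $O$, and $\vec{t}$ are notation chosen here for illustration and are not taken from the paper.

$$
\phi_{\text{equiv}}\big(\{O\vec{x}_i + \vec{t}\}_{i=1}^{N}\big) = \{O\vec{y}_i + \vec{t}\}_{i=1}^{N}
\quad\text{where}\quad \{\vec{y}_i\}_{i=1}^{N} = \phi_{\text{equiv}}\big(\{\vec{x}_i\}_{i=1}^{N}\big),
$$

$$
\phi_{\text{inv}}\big(\{O\vec{x}_i + \vec{t}\}_{i=1}^{N}\big) = \phi_{\text{inv}}\big(\{\vec{x}_i\}_{i=1}^{N}\big),
\qquad \text{for all } O \in \mathbb{R}^{3\times 3} \text{ with } O^{\top}O = I,\ \vec{t} \in \mathbb{R}^{3}.
$$

Under this reading, the LLM only needs to be a map of the second kind (it sees invariant features and returns invariant features), while the encoder and adaptor must be maps of the first kind on the coordinate channel.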

Paper Structure

This paper contains 15 sections, 8 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: The overall framework of EquiLLM. Given a geometric graph ${\mathcal{G}} = ({\mathcal{V}},{\mathcal{E}})$ as input, EquiLLM first applies an Equivariant Encoder to obtain processed features $\Vec{{\bm{X}}}^{'}$ and ${\bm{H}}^{'}$. The features ${\bm{H}}^{'}$ are passed through a projector and then concatenated with the prompt features ${\bm{P}}$ in a task-specific manner. The concatenated sequence is fed into an LLM. The LLM output features ${\bm{H}}^{\text{llm}}$, together with the processed features $\Vec{{\bm{X}}}^{'}$ and ${\bm{H}}^{'}$, are passed into an Equivariant Adaptor, which generates the final outputs: the vector $\Vec{{\bm{X}}}^{\text{out}}$ for equivariant tasks and the feature ${\bm{H}}^{\text{out}}$ for invariant tasks. Blue modules are invariant; purple modules are equivariant.
  • Figure 2: Visualization of the structures predicted by various methods on Toluene from the MD17 dataset; pink denotes the ground-truth structure.
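
The data flow in the Figure 1 caption can be sketched in a few lines of NumPy. This is only a toy illustration under stated assumptions, not the authors' implementation: `equivariant_layer` is a hypothetical EGNN-style layer standing in for both the Equivariant Encoder and the Equivariant Adaptor, `frozen_llm_stub` is a fixed random map standing in for the frozen LLM, the prompt features `P` are random placeholders for embedded prompt text, and all dimensions and parameter names are made up for the example. The point it shows is that coordinates never enter the LLM stub, so the whole pipeline stays $\mathrm{E}(3)$-equivariant.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, D_LLM, T = 5, 8, 16, 4  # nodes, feature dim, stub-LLM dim, prompt tokens


def equivariant_layer(X, H, W_msg, W_gate):
    """Toy EGNN-style layer (an assumption, not the paper's encoder/adaptor)."""
    diff = X[:, None, :] - X[None, :, :]           # (N, N, 3), rotates with the input
    d2 = (diff ** 2).sum(-1, keepdims=True)        # (N, N, 1), E(3)-invariant
    n = X.shape[0]
    Hi = np.broadcast_to(H[:, None, :], (n, n, H.shape[1]))
    Hj = np.broadcast_to(H[None, :, :], (n, n, H.shape[1]))
    msg = np.tanh(np.concatenate([Hi, Hj, d2], axis=-1) @ W_msg)  # invariant messages
    gate = np.tanh(msg @ W_gate)                   # (N, N, 1) invariant scalars
    X_new = X + (diff * gate).mean(axis=1)         # equivariant coordinate update
    H_new = H + msg.mean(axis=1)                   # invariant feature update
    return X_new, H_new


def frozen_llm_stub(tokens, W_llm):
    """Stand-in for the frozen LLM: mixes tokens, sees invariant inputs only."""
    context = tokens.mean(axis=0, keepdims=True)   # crude token mixing
    return np.tanh((tokens + context) @ W_llm)


def equillm_sketch(X, H, P, params):
    # 1) Equivariant Encoder -> X', H'
    X1, H1 = equivariant_layer(X, H, params["enc_msg"], params["enc_gate"])
    # 2) Project H' and concatenate with the prompt features P
    tokens = np.concatenate([P, H1 @ params["proj"]], axis=0)
    # 3) Frozen LLM stub; keep only the node positions -> H^llm
    H_llm = frozen_llm_stub(tokens, params["llm"])[P.shape[0]:]
    # 4) Equivariant Adaptor consumes X', H', and H^llm -> X^out, H^out
    H_fused = H1 + H_llm @ params["back"]
    return equivariant_layer(X1, H_fused, params["ada_msg"], params["ada_gate"])


def init(shape):
    return 0.1 * rng.standard_normal(shape)


params = {
    "enc_msg": init((2 * D + 1, D)), "enc_gate": init((D, 1)),
    "proj": init((D, D_LLM)), "llm": init((D_LLM, D_LLM)), "back": init((D_LLM, D)),
    "ada_msg": init((2 * D + 1, D)), "ada_gate": init((D, 1)),
}
X = rng.standard_normal((N, 3))      # node coordinates
H = rng.standard_normal((N, D))      # invariant node features
P = rng.standard_normal((T, D_LLM))  # placeholder for embedded prompt text


def check_equivariance():
    """Apply a random orthogonal map and translation, then compare outputs."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal matrix
    t = rng.standard_normal(3)
    X_out, H_out = equillm_sketch(X, H, P, params)
    X_out_g, H_out_g = equillm_sketch(X @ Q.T + t, H, P, params)
    return bool(np.allclose(X_out_g, X_out @ Q.T + t) and np.allclose(H_out_g, H_out))


print(check_equivariance())
```

Because the stub only ever receives `H1` and `P`, it can be swapped for any other function of invariant inputs without affecting the check; that is the design property the caption attributes to the blue (invariant) and purple (equivariant) modules.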