
nach0-pc: Multi-task Language Model with Molecular Point Cloud Encoder

Maksim Kuznetsov, Airat Valiev, Alex Aliper, Daniil Polykovskiy, Elena Tutubalina, Rim Shayakhmetov, Zulfat Miftahutdinov

TL;DR

nach0-pc tackles the challenge of generating 3D molecular structures by fusing a domain-specific molecular point cloud encoder with an encoder-decoder language model. It represents spatial atom arrangements both as point clouds and in a SMILES+XYZ textual format, enabling end-to-end conditioning on text and spatial inputs. A BRICS-based, whole-fragment dropout pre-training scheme distills knowledge from unlabeled 3D structures, improving downstream distribution learning and conformation generation. Across six spatial molecular generation tasks, nach0-pc achieves results competitive with diffusion baselines while offering multi-task capability and reduced training and inference time. This framework advances efficient, geometry-aware drug design by unifying 3D structure generation with language-model conditioning.
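
The whole-fragment dropout idea can be illustrated with a minimal sketch. It assumes the molecule has already been split into fragments (for example by a BRICS decomposition, not shown) and that each atom carries a hypothetical `fragment` index; `Atom`, `fragment_dropout`, and `drop_prob` are illustrative names, not the paper's code. Entire fragments, never single atoms, are removed from the input and become the reconstruction target.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Atom:
    element: str
    xyz: Tuple[float, float, float]
    fragment: int  # hypothetical: index of the BRICS fragment this atom belongs to


def fragment_dropout(
    atoms: List[Atom], drop_prob: float = 0.3, rng: Optional[random.Random] = None
) -> Tuple[List[Atom], List[Atom]]:
    """Drop whole fragments (never single atoms) from a 3D structure.

    Returns (visible, target): visible atoms form the model input, and the
    dropped fragments are what the model must reconstruct.
    """
    rng = rng or random.Random()
    fragment_ids = sorted({a.fragment for a in atoms})
    dropped = {f for f in fragment_ids if rng.random() < drop_prob}
    if len(fragment_ids) > 1:
        if not dropped:  # ensure there is something to reconstruct
            dropped.add(rng.choice(fragment_ids))
        if len(dropped) == len(fragment_ids):  # ensure some context stays visible
            dropped.discard(rng.choice(fragment_ids))
    visible = [a for a in atoms if a.fragment not in dropped]
    target = [a for a in atoms if a.fragment in dropped]
    return visible, target


# Toy structure with three hypothetical fragments (coordinates are made up).
atoms = [
    Atom("C", (0.0, 0.0, 0.0), 0),
    Atom("C", (1.5, 0.0, 0.0), 0),
    Atom("N", (2.2, 1.2, 0.0), 1),
    Atom("O", (3.0, -1.0, 0.0), 2),
]
visible, target = fragment_dropout(atoms, drop_prob=0.5, rng=random.Random(0))
```

Dropping complete fragments presumably forces the model to generate chemically coherent pieces rather than filling in isolated atoms from their neighbors. The sketch stops at splitting the atoms; it does not serialize the target into the SMILES, attachment points, and coordinates that the pre-training objective reconstructs.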

Abstract

Recent advancements have integrated Language Models (LMs) into drug discovery pipelines. However, existing models mostly work with SMILES and SELFIES chemical string representations, which lack spatial features vital for drug discovery. Additionally, attempts to translate 3D chemical structures into a text format encounter issues such as excessive sequence length and insufficient atom connectivity information. To address these issues, we introduce nach0-pc, a model that combines a domain-specific encoder with a textual representation to handle the spatial arrangement of atoms effectively. Our approach uses a molecular point cloud encoder for a concise, order-invariant structure representation. We also introduce a novel pre-training scheme for molecular point clouds to distill knowledge from datasets of spatial molecular structures. After fine-tuning in both single-task and multi-task settings, nach0-pc performs comparably to other diffusion models in terms of generated-sample quality across several established spatial molecular generation tasks. Notably, our model is multi-task, whereas diffusion models are limited to single tasks. Additionally, it can process point cloud data that language models cannot handle due to memory limitations. As a result, our model has reduced training and inference time while maintaining on-par performance.
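
The "concise and order-invariant" property can be sketched under simple assumptions. Each atom's feature tokens are sum-pooled into one vector, so sequence length equals the number of atoms rather than the number of tokens, and summing per-atom vectors gives a toy readout that does not depend on atom order. The embedding table, token names, and functions below are hypothetical; the real encoder is a learned network that also uses coordinates, which this sketch ignores.

```python
import math
import random

EMBED_DIM = 4
# Hypothetical token embedding table (random values, for illustration only).
_rng = random.Random(0)
EMBEDDING = {
    tok: [_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    for tok in ["C", "N", "O", "aromatic", "charge+1"]
}


def embed_point(tokens):
    """Sum-pool the embeddings of all feature tokens at one spatial position.

    Yields one vector per atom, however many feature tokens the atom has.
    math.fsum is correctly rounded, so the result does not depend on token order.
    """
    return [math.fsum(EMBEDDING[t][i] for t in tokens) for i in range(EMBED_DIM)]


def pooled_cloud_vector(points):
    """points: list of (xyz, tokens) pairs.

    Sums the per-atom vectors into one vector: a toy permutation-invariant
    readout (the xyz coordinates are ignored in this sketch).
    """
    per_atom = [embed_point(tokens) for _, tokens in points]
    return [math.fsum(v[i] for v in per_atom) for i in range(EMBED_DIM)]


cloud = [
    ((0.0, 0.0, 0.0), ["C", "aromatic"]),
    ((1.4, 0.0, 0.0), ["N", "aromatic"]),
    ((2.1, 1.2, 0.0), ["O"]),
]
shuffled = [cloud[2], cloud[0], cloud[1]]
```

Listing the same atoms in a different order (`shuffled`) yields the same pooled vector, whereas a SMILES+XYZ string changes whenever the atom order changes.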

Paper Structure

This paper contains 52 sections, 3 equations, 10 figures, 11 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Two diagrams of tasks and nach0-pc. (a) Three types of tasks are considered: text→text tasks in yellow; molecular point cloud + text→text in green; and molecular/protein point cloud + text→text in blue. (b) Every spatial molecular generation task we consider is cast as feeding our model text and/or a molecular point cloud as input and training it to generate spatial molecular structures as output text.
  • Figure 2: An overview of the encoder architecture adapted for point cloud data and standard text input. For point clouds, tokens represent features at specific spatial positions. Tokens are embedded via a token embedding layer, followed by summation pooling to optimize memory and processing efficiency. Scalar Sinusoidal Embeddings (SSE) integrate continuous spatial coordinates and relative pairwise distances.
  • Figure 3: The pre-training scheme for 3D molecular structure datasets. The model learns to reconstruct blurred or masked molecular fragments. During pre-training, the model generates the missing parts, including their SMILES representations, attachment points, and atom coordinates.
  • Figure 4: Samples for (a) spatial distribution learning, (b) conformation generation and (c) linker design tasks.
  • Figure 5: (left) Generated molecular structures and (right) structural/shape similarity trade-off for various noise parameters on shape-conditioned generation task.
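
Figure 2's caption says Scalar Sinusoidal Embeddings (SSE) bring continuous coordinates and relative pairwise distances into the encoder. The exact SSE formula is not given here, so the following sketch assumes the Transformer-style form, with sin/cos features at geometrically spaced frequencies applied to a real-valued scalar instead of a token index. `scalar_sinusoidal_embedding`, `embed_coordinates`, `pairwise_distance_embeddings`, `dim`, and `max_period` are hypothetical names and defaults.

```python
import math


def scalar_sinusoidal_embedding(x, dim=8, max_period=10000.0):
    """Sin/cos features of a continuous scalar x (assumed form, may differ from the paper):

    emb[2i] = sin(x * w_i), emb[2i+1] = cos(x * w_i), w_i = max_period ** (-2i / dim)
    """
    if dim % 2:
        raise ValueError("dim must be even")
    emb = []
    for i in range(dim // 2):
        w = max_period ** (-2 * i / dim)
        emb.extend([math.sin(x * w), math.cos(x * w)])
    return emb


def embed_coordinates(xyz, dim=8):
    """Concatenate the SSE of each Cartesian coordinate (3 * dim values)."""
    return [v for c in xyz for v in scalar_sinusoidal_embedding(c, dim)]


def pairwise_distance_embeddings(coords, dim=8):
    """SSE of every relative distance |r_a - r_b|, keyed by atom pair (a, b) with a < b."""
    out = {}
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):
            out[(a, b)] = scalar_sinusoidal_embedding(math.dist(coords[a], coords[b]), dim)
    return out
```

Coordinate embeddings change when the whole molecule is translated or rotated, while distance embeddings do not. Feeding both presumably gives the encoder absolute positions plus geometry-invariant relations between atoms.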