Superior Molecular Representations from Intermediate Encoder Layers

Luis Pinto

TL;DR

This work questions the default reliance on final-layer embeddings from pretrained molecular encoders and demonstrates, across five architectures and 22 ADMET tasks, that intermediate layers often provide more transferable features. It introduces two label-free probes to map information flow and evaluates frozen embeddings with a lightweight surrogate model, complemented by finetuning experiments with layer-restricted encoders. The results show intermediate representations yield average improvements of 5.4% for frozen embeddings and 8.5% when finetuned to intermediate depths, with several tasks achieving new state-of-the-art performance. The findings advocate for a two-stage, depth-wise exploration of molecular encoders to improve predictive accuracy and computational efficiency in cheminformatics applications.

Abstract

Pretrained molecular encoders have become indispensable in computational chemistry for tasks such as property prediction and molecular generation. However, the standard practice of relying solely on final-layer embeddings for downstream tasks may discard valuable information. In this work, we first analyze the information flow in five diverse molecular encoders and find that intermediate layers retain more general-purpose features, whereas the final layer specializes and compresses information. We then perform an empirical layer-wise evaluation across 22 property prediction tasks. We find that using frozen embeddings from optimal intermediate layers improves downstream performance by an average of 5.4%, and by up to 28.6%, compared to the final layer. Furthermore, finetuning encoders truncated at intermediate depths achieves even greater average improvements of 8.5%, with increases as high as 40.8%, obtaining new state-of-the-art results on several benchmarks. These findings highlight the importance of exploring the full representational depth of molecular encoders to achieve substantial performance improvements and computational efficiency. The code will be made publicly available.
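
The relative improvements quoted in the abstract compare the best non-final layer against the final layer. The sketch below shows one plausible way to compute such a figure from per-layer test scores. The function name `best_intermediate_gain`, its arguments, and the exact normalization are assumptions made for illustration; the paper's own definition may differ. Metrics where lower is better (such as MAE) are sign-flipped so that a positive result always means the intermediate layer wins.

```python
from typing import Sequence, Tuple


def best_intermediate_gain(
    scores: Sequence[float], higher_is_better: bool = True
) -> Tuple[int, float]:
    """Return (index of best non-final layer, % change vs. the final layer).

    `scores[i]` is the test metric obtained from layer i; the last entry is
    the final layer. Hypothetical helper, not the authors' code.
    """
    if len(scores) < 2:
        raise ValueError("need at least one intermediate layer and a final layer")
    final = scores[-1]
    if final == 0:
        raise ValueError("final-layer score is zero; relative change is undefined")

    interior = scores[:-1]  # every layer except the final one
    pick = max if higher_is_better else min
    best_idx = pick(range(len(interior)), key=lambda i: interior[i])
    best = interior[best_idx]

    # Orient the difference so that positive always means "intermediate is better".
    delta = (best - final) if higher_is_better else (final - best)
    return best_idx, 100.0 * delta / abs(final)
```

A call such as `best_intermediate_gain(layer_maes, higher_is_better=False)` would cover the MAE-based tasks. A negative return value corresponds to the case where the final layer outperforms every intermediate layer.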

Paper Structure

This paper contains 27 sections, 3 equations, 82 figures, 1 table.

Figures (82)

  • Figure 1: Left: Tokenized-molecule entropy rises with depth but falls at the final block for MolFormer, Uni-Mol and PosEGNN, indicating compression; Orb models maintain higher spread. Right: Adjacent-layer CKA shows small changes across interior layers and a pronounced last-step change for most models. Depth is normalized from the first encoder block (0%) to the last (100%).
  • Figure 2: Percentage improvement in test metric of the best intermediate layer relative to the final layer when evaluating frozen embeddings. Positive values mean the best non-final layer improves over the final layer. Negative values mean the final layer outperforms the best non-final layer.
  • Figure 3: Percentage change in test metric achieved by finetuning up to the best intermediate layer compared to finetuning up to the final layer. Positive values mean the best non-final layer improves over the final layer. Negative values mean the final layer outperforms the best non-final layer.
  • Figure 4: Left: Example of scatter plot of frozen embedding AUCPR vs. finetuned AUCPR for each MolFormer layer on task cyp2c9-veith. Each point is annotated with its corresponding layer number. Right: Histogram of embedding-to-finetuned correlations for all 110 model-task combinations.
  • Figure 5: Comparison of CatBoost and TabPFN across a suite of ADMET benchmarks. The vertical dashed line separates the MAE-based tasks on the left (lower is better) from the remaining tasks on the right (higher is better). To display ppbr-az on a 0–1 scale, its MAE was divided by 10.
  • ...and 77 more figures
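
The adjacent-layer CKA probe mentioned in the Figure 1 caption can be illustrated with a short sketch. It assumes the linear variant of centered kernel alignment and assumes that each layer's embeddings are available as a NumPy array of shape (molecules, features). The source does not state which CKA kernel or pooling the authors use, so the function names and these choices are illustrative only.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two embedding matrices with matching row counts.

    Rows are molecules, columns are features; the two layers may have
    different widths. Assumption: linear kernel, not necessarily the paper's.
    """
    # Center each feature so the similarity ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return float(cross / (norm_x * norm_y))


def adjacent_layer_cka(layers: list) -> list:
    """CKA between each layer and the next; needs no labels."""
    return [linear_cka(a, b) for a, b in zip(layers[:-1], layers[1:])]
```

Under these assumptions, values near 1 between neighbouring layers indicate that the representation barely changes, while a sharp drop at the last pair would match the pronounced last-step change described in the caption.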