Superior Molecular Representations from Intermediate Encoder Layers
Luis Pinto
TL;DR
This work questions the default reliance on final-layer embeddings from pretrained molecular encoders and shows, across five architectures and 22 ADMET tasks, that intermediate layers often provide more transferable features. It introduces two label-free probes to map information flow, evaluates frozen layer-wise embeddings with a lightweight surrogate model, and complements this with finetuning experiments on encoders truncated at intermediate depths. Relative to the final layer, intermediate representations yield average improvements of 5.4% for frozen embeddings and 8.5% for encoders finetuned at intermediate depths, with several tasks reaching new state-of-the-art performance. The findings advocate a two-stage, depth-wise exploration of molecular encoders to improve predictive accuracy and computational efficiency in cheminformatics applications.
Abstract
Pretrained molecular encoders have become indispensable in computational chemistry for tasks such as property prediction and molecular generation. However, the standard practice of relying solely on final-layer embeddings for downstream tasks may discard valuable information. In this work, we first analyze the information flow in five diverse molecular encoders and find that intermediate layers retain more general-purpose features, whereas the final layer specializes and compresses information. We then perform an empirical layer-wise evaluation across 22 property prediction tasks. We find that using frozen embeddings from the optimal intermediate layer improves downstream performance by an average of 5.4%, and by up to 28.6%, compared to the final layer. Furthermore, finetuning encoders truncated at intermediate depths achieves even greater average improvements of 8.5%, with increases as high as 40.8%, obtaining new state-of-the-art results on several benchmarks. These findings highlight the importance of exploring the full representational depth of molecular encoders to achieve substantial gains in both performance and computational efficiency. The code will be made publicly available.
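
To make the layer-wise evaluation of frozen embeddings concrete, the following is a minimal sketch and not the authors' code. It assumes that per-layer embeddings have already been extracted as arrays, uses a closed-form ridge regression as a stand-in for the lightweight surrogate model (the abstract does not name the surrogate), scores each layer with a lower-is-better validation error, and reports the relative improvement of the best layer over the final one. All function names, the synthetic data, and the noise levels are hypothetical.

```python
import numpy as np


def _add_bias(X):
    # Append a constant column so the linear probe can learn an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def ridge_fit(X, y, alpha=1.0):
    # Closed-form ridge regression; the intercept term is left unpenalized.
    Xb = _add_bias(X)
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def ridge_predict(w, X):
    return _add_bias(X) @ w


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def select_layer(train_layers, y_train, val_layers, y_val, alpha=1.0):
    # Fit one probe per layer on frozen embeddings and score it on validation data.
    errors = []
    for X_tr, X_va in zip(train_layers, val_layers):
        w = ridge_fit(X_tr, y_train, alpha)
        errors.append(mae(y_val, ridge_predict(w, X_va)))
    return int(np.argmin(errors)), errors


def relative_improvement(final_err, best_err):
    # Percentage reduction in a lower-is-better metric relative to the final layer.
    return 100.0 * (final_err - best_err) / final_err


def make_synthetic_layers(seed=0, n_train=200, n_val=100, dim=8,
                          noise_levels=(1.0, 0.5, 0.1, 0.8)):
    # Hypothetical stand-in for real encoder outputs: every "layer" is a noisy
    # view of the same latent features, and the third layer is the cleanest.
    rng = np.random.default_rng(seed)
    n = n_train + n_val
    z = rng.normal(size=(n, dim))
    y = z @ rng.normal(size=dim)
    layers = [z + s * rng.normal(size=(n, dim)) for s in noise_levels]
    train = [X[:n_train] for X in layers]
    val = [X[n_train:] for X in layers]
    return train, y[:n_train], val, y[n_train:]


if __name__ == "__main__":
    train, y_tr, val, y_va = make_synthetic_layers()
    best, errors = select_layer(train, y_tr, val, y_va)
    print("validation MAE per layer:", [round(e, 3) for e in errors])
    print("selected layer index:", best)
    print("relative improvement over final layer (%):",
          round(relative_improvement(errors[-1], errors[best]), 1))
```

In practice the layer would be chosen on a validation split, as here, and the reported comparison against the final layer would be made on a held-out test split. For higher-is-better metrics the sign of the relative improvement would need to be flipped.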
