Exploiting Latent Linearity in LLMs Improves Explainable Molecular Representation Learning
Zhuoran Li, Xu Sun, Wanyu Lin, Jiannong Cao
TL;DR
This work tackles the explainability gap in LLM-driven molecular property prediction by revealing latent linearity in LLM representations. It introduces MoleX, which decomposes embeddings into a concept space rooted in functional groups and learns a linear predictor augmented by residual calibration, yielding accurate and chemically faithful explanations. The approach achieves state-of-the-art performance across multiple benchmarks while enabling fast CPU inference and dramatically reducing parameter counts. These findings demonstrate that aligning foundation models with domain concepts can enhance both predictive power and mechanistic interpretability for scientific applications.
Abstract
Large language models (LLMs) have demonstrated broad utility across molecular domains, including drug discovery and materials design. Analyzing LLMs' latent representations is crucial for elucidating their underlying mechanisms, improving explainability, and ultimately advancing downstream performance. We propose MoleX, a simple yet effective framework that decomposes molecular embeddings within LLM representations into a concept-aligned space for explainable molecular representation learning. We further show that these high-dimensional embeddings admit a linear mapping onto chemically consistent concepts. Our analysis suggests that the uncovered linearity aligns with established chemical principles, indicating a mechanistically explainable latent structure in LLM representations for scientific applications. When applied to downstream tasks, this latent linearity improves both predictive and explanatory performance. Extensive experiments demonstrate that MoleX outperforms existing approaches in accuracy, explainability, and efficiency: on large-scale datasets it runs inference on CPU 300 times faster than LLMs, with 100,000 fewer parameters.
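The pipeline described above (project embeddings into a low-dimensional space, fit a linear predictor, then calibrate its residuals) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes synthetic random vectors in place of real LLM embeddings, uses a PCA-style projection as a stand-in for the functional-group concept space, and uses a ridge model on the discarded directions as a stand-in for residual calibration. All names (`fit_linear`, `project`, `n_concepts`) are hypothetical.

```python
import numpy as np

def fit_linear(features, targets, ridge=1e-8):
    """Ridge least-squares fit with an intercept; returns (weights, bias)."""
    mean_f = features.mean(axis=0)
    mean_t = targets.mean()
    centered = features - mean_f
    gram = centered.T @ centered + ridge * np.eye(centered.shape[1])
    weights = np.linalg.solve(gram, centered.T @ (targets - mean_t))
    bias = mean_t - mean_f @ weights
    return weights, bias

def project(embeddings, n_concepts):
    """Split centered embeddings into top-k principal directions and the rest.

    Assumption: PCA stands in for the concept decomposition here.
    """
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    concept_scores = centered @ vt[:n_concepts].T
    leftover_scores = centered @ vt[n_concepts:].T
    return concept_scores, leftover_scores

# Synthetic stand-in for LLM embeddings: a few latent factors mixed linearly.
rng = np.random.default_rng(0)
n_samples, dim, n_concepts = 200, 32, 4
latent = rng.normal(size=(n_samples, n_concepts))
embeddings = latent @ rng.normal(size=(n_concepts, dim))
embeddings += 0.1 * rng.normal(size=(n_samples, dim))
targets = latent @ np.array([1.0, -2.0, 0.5, 0.0])
targets += 0.1 * rng.normal(size=n_samples)

# Step 1: project into the low-dimensional "concept" space.
concept_scores, leftover_scores = project(embeddings, n_concepts)

# Step 2: linear predictor on the concept scores.
w_base, b_base = fit_linear(concept_scores, targets)
base_pred = concept_scores @ w_base + b_base
base_mse = float(np.mean((targets - base_pred) ** 2))

# Step 3: a second linear model fitted to the residuals of the first
# (assumption: trained on the directions the projection discarded).
residuals = targets - base_pred
w_res, b_res = fit_linear(leftover_scores, residuals, ridge=1.0)
calibrated_pred = base_pred + leftover_scores @ w_res + b_res
calibrated_mse = float(np.mean((targets - calibrated_pred) ** 2))

print(f"base MSE: {base_mse:.4f}, calibrated MSE: {calibrated_mse:.4f}")
```

Because both stages are linear, the final prediction remains a weighted sum of projected features, so each weight can be read as that feature's contribution. On the training data the calibrated error cannot exceed the base error; whether that holds on held-out molecules depends on the data.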
