Deep peak property learning for efficient chiral molecules ECD spectra prediction
Hao Li, Da Long, Li Yuan, Yonghong Tian, Xinchang Wang, Fanyang Mo
TL;DR
This work tackles the costly prediction of electronic circular dichroism spectra for chiral molecules by introducing CMCDS, a large-scale dataset of computed ECD spectra for 22,190 molecules, and ECDFormer, a Transformer-based model that predicts peak properties (number, position, symbol) from a GeoGNN-derived molecular representation and renders the full ECD spectrum from those peaks. The peak-focused approach, with a dedicated loss and peak-specific metrics, yields superior accuracy on peak-number, peak-position, and peak-symbol predictions compared with traditional machine-learning and deep-learning baselines, while dramatically accelerating spectrum generation. The method enables rapid chiral-molecule assignation and high-throughput screening, with potential impact on asymmetric synthesis and pharmaceutical development; limitations include bypassing conformational searches and focusing on single-chiral-center molecules, suggesting directions to handle conformational ensembles and multi-center chirality in future work.
Abstract
Chiral molecule assignation is crucial for asymmetric catalysis, functional materials, and the drug industry. The conventional approach requires theoretical calculations of electronic circular dichroism (ECD) spectra, which are time-consuming and costly. To speed up this process, we apply deep learning techniques to ECD prediction. We first build a large-scale dataset of Chiral Molecular ECD spectra (CMCDS) with calculated ECD spectra. We then develop ECDFormer, a Transformer-based model that learns chiral molecular representations and predicts the corresponding ECD spectra with improved efficiency and accuracy. Unlike other spectrum-prediction models, ECDFormer focuses on peak properties rather than the whole spectrum sequence, a choice inspired by the scenario of chiral molecule assignation. Specifically, ECDFormer predicts the peak properties, including number, position, and symbol, and then renders the ECD spectra from these peak properties. This approach significantly outperforms other models in ECD prediction. ECDFormer reduces the time needed to acquire an ECD spectrum from 1-100 hours per molecule to 1.5 s.
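To make the rendering step concrete, the following is a minimal sketch of how a spectrum could be drawn from peak properties. It is not the ECDFormer implementation: the abstract does not specify the rendering procedure, so the Gaussian line shape, the unit peak height, the fixed width `width_nm`, and the function name `render_ecd` are all assumptions made for illustration. The peak "symbol" is taken here to mean the sign of the peak (+1 or -1), and the peak number is simply the length of the input lists.

```python
import math


def render_ecd(positions_nm, signs, wavelengths_nm, width_nm=10.0):
    """Render a spectrum as a sum of signed Gaussian bands.

    Assumptions (not from the paper): Gaussian line shape, unit height,
    and a single fixed width shared by all peaks.
    """
    if len(positions_nm) != len(signs):
        raise ValueError("each peak needs one position and one sign")
    spectrum = []
    for wl in wavelengths_nm:
        value = 0.0
        for pos, sign in zip(positions_nm, signs):
            # Each peak contributes a band centred at its predicted position.
            value += sign * math.exp(-((wl - pos) ** 2) / (2.0 * width_nm ** 2))
        spectrum.append(value)
    return spectrum


# Hypothetical prediction: two peaks, negative at 220 nm and positive at 260 nm.
wavelengths = list(range(200, 301))  # 200..300 nm in 1 nm steps
spectrum = render_ecd([220.0, 260.0], [-1, +1], wavelengths)
```

Flipping every sign in this sketch yields the mirror-image curve, which matches the expected relationship between the ECD spectra of two enantiomers and illustrates why peak signs and positions are the quantities that matter for assignation.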
