Ensemble Model with BERT, RoBERTa and XLNet for Molecular Property Prediction
Junling Hu
TL;DR
The paper tackles molecular property prediction under resource constraints by proposing an AIS-based representation and an ensemble of Transformer models (BERT, RoBERTa, XLNet) trained by supervised fine-tuning from random initialization, without large-scale pretraining. The AIS tokenization provides a richer molecular encoding, which is fed to a BiLSTM base predictor and a BaggingRegressor meta-learner. On Zinc250k and Zinc310k, the approach reports state-of-the-art MAE and R² for properties such as QED, logP, and MolWt, often outperforming strong baselines such as ASVAE, GROVER, CHEM-BERT, and D-MPNN. Ablation studies show that AIS inputs generally improve performance and that BERT-based variants are particularly effective. The work indicates that high-accuracy molecular property prediction is feasible without large-scale pretraining, which makes it practical for chemical discovery in resource-limited settings.
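As a rough illustration of the two ideas named above (a bagged meta-learner over base-model predictions, scored with MAE and R²), here is a minimal standard-library sketch. It is not the paper's implementation. The data are synthetic, all function names are hypothetical, and a bagged one-feature least-squares line stands in for scikit-learn's BaggingRegressor. The three noisy columns only imitate the outputs of BERT-, RoBERTa-, and XLNet-based predictors.

```python
import random
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample: fall back to a constant predictor
        return 0.0, y_bar
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return a, y_bar - a * x_bar


def fit_bagged_meta(base_preds, y, n_estimators=25, seed=0):
    """Bagging: fit one line per bootstrap resample of the training rows.

    Assumption: the meta-feature is the mean of the base models' outputs.
    """
    rng = random.Random(seed)
    xs = [mean(row) for row in base_preds]
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_line([xs[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagged_meta(models, base_preds):
    """Average the predictions of all bagged lines."""
    xs = [mean(row) for row in base_preds]
    return [mean(a * x + b for a, b in models) for x in xs]


# Synthetic stand-in data (NOT from the paper): a target in [0, 1] and three
# noisy "base model" predictions per molecule.
rng = random.Random(42)
y_true = [rng.uniform(0.0, 1.0) for _ in range(200)]
base_preds = [[t + rng.gauss(0.0, 0.05) for _ in range(3)] for t in y_true]

models = fit_bagged_meta(base_preds[:150], y_true[:150])
y_hat = predict_bagged_meta(models, base_preds[150:])
print("MAE:", mae(y_true[150:], y_hat))
print("R2:", r2(y_true[150:], y_hat))
```

Averaging many estimators fitted on bootstrap resamples reduces the variance of the combined prediction, which is the usual motivation for a bagging meta-learner. In a real pipeline the synthetic columns would be replaced by held-out predictions from the trained base models.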
Abstract
This paper presents a novel approach for predicting molecular properties with high accuracy without the need for extensive pre-training. Employing ensemble learning and supervised fine-tuning of BERT, RoBERTa, and XLNet, our method demonstrates significant effectiveness compared to existing advanced models. Crucially, it addresses the issue of limited computational resources faced by experimental groups, enabling them to accurately predict molecular properties. This innovation provides a cost-effective and resource-efficient solution, potentially advancing further research in the molecular domain.
