Socrates-Mol: Self-Oriented Cognitive Reasoning through Autonomous Trial-and-Error with Empirical-Bayesian Screening for Molecules
Xiangru Wang, Zekun Jiang, Heng Yang, Cheng Tan, Xingying Lan, Chunming Xu, Tianhang Zhou
TL;DR
Socrates-Mol reframes molecular property prediction as an empirical Bayesian reasoning process driven by context engineering. The framework runs a reflection–prediction loop across five LLMs, retrieves structurally similar molecular cases as evidence, and aggregates the resulting posteriors through task-adaptive self-consistency. This design targets cold-start and data-scarcity problems and is evaluated on LogP prediction for amine solvents, where it is most effective on regression tasks. Regression gains are substantial (a 72% MAE reduction and a 112% $R^2$ improvement), whereas ranking benefits are mixed because the models share systematic biases and retrieval supplies limited evidence for boundary cases. Because it avoids full fine-tuning, the approach cuts deployment costs by over 70% and offers a scalable, interpretable pipeline with potential uses in solvent screening and material design; the results also highlight the need for task-aware aggregation and richer retrieval in ranking tasks. Future work includes enhanced retrieval, multi-property case libraries, and integration into high-throughput screening workflows to generalize across chemistry domains.
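As a minimal sketch of what task-adaptive self-consistency could look like, the block below assumes that regression outputs from the five models are combined by their median and that ranking outputs are combined by mean rank position (a Borda-style count). Both aggregation rules, the function names `aggregate_regression` and `aggregate_ranking`, and every value shown are hypothetical illustrations, not the authors' implementation.

```python
from statistics import median
from typing import Dict, List, Sequence


def aggregate_regression(predictions: Sequence[float]) -> float:
    """Combine per-model LogP predictions with the median (assumed rule),
    which limits the pull of a single outlying model."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return float(median(predictions))


def aggregate_ranking(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model rankings (best first) by mean position,
    a Borda-style consensus (assumed rule)."""
    if not rankings:
        raise ValueError("need at least one ranking")
    candidates = set(rankings[0])
    positions: Dict[str, List[int]] = {m: [] for m in candidates}
    for ranking in rankings:
        if set(ranking) != candidates or len(ranking) != len(candidates):
            raise ValueError("every model must rank the same candidates exactly once")
        for pos, mol in enumerate(ranking):
            positions[mol].append(pos)
    # Lower mean position ranks first; ties are broken by name for determinism.
    return sorted(candidates, key=lambda m: (sum(positions[m]) / len(positions[m]), m))


# Hypothetical outputs from five models for one amine (values invented).
print(aggregate_regression([0.41, 0.38, 0.45, 1.90, 0.40]))  # 0.41

# Hypothetical best-first rankings from three models.
print(aggregate_ranking([
    ["MEA", "DEA", "MDEA"],
    ["MEA", "MDEA", "DEA"],
    ["DEA", "MEA", "MDEA"],
]))  # ['MEA', 'DEA', 'MDEA']
```

The median discards the pull of the one model that outputs 1.90, but if all models share the same error, any consensus inherits it. That is one plausible reading of why ranking gains stay mixed when the models carry systematic biases.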
Abstract
Molecular property prediction is fundamental to chemical engineering applications such as solvent screening. We present Socrates-Mol, a framework that uses context engineering to turn language models into empirical Bayesian reasoners, addressing cold-start problems without model fine-tuning. The system implements a reflection–prediction cycle in which initial outputs serve as priors, retrieved molecular cases provide evidence, and refined predictions form posteriors; this cycle extracts reusable chemical rules from sparse data. We introduce ranking tasks aligned with industrial screening priorities and employ cross-model self-consistency across five language models to reduce variance. Experiments on LogP prediction for amine solvents reveal task-dependent patterns: with self-consistency, regression achieves a 72% reduction in MAE and a 112% improvement in R-squared, whereas ranking tasks show limited gains because of systematic biases shared across models. The framework reduces deployment costs by over 70% compared with full fine-tuning, offering a scalable solution for molecular property prediction and clarifying the task-adaptive nature of self-consistency mechanisms.
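The prior, evidence, and posterior roles in the reflection–prediction cycle can be pictured numerically. The sketch below is an analogy only: the framework performs this update through context engineering rather than an explicit formula, so the Gaussian conjugate update, the Tanimoto retrieval over sets of fingerprint on-bits, the similarity-weighted precision, the `noise_var` parameter, and all names and values are hypothetical assumptions. It treats a model's initial LogP output as a Gaussian prior and each retrieved case as one noisy observation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Case:
    """A retrieved molecular case: hypothetical fingerprint on-bits plus a known LogP."""
    name: str
    bits: FrozenSet[int]
    logp: float


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def retrieve(query: FrozenSet[int], library: Sequence[Case], k: int = 3) -> List[Tuple[float, Case]]:
    """Return the k most similar cases, most similar first."""
    scored = [(tanimoto(query, case.bits), case) for case in library]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def posterior(prior_mean: float, prior_var: float,
              evidence: Sequence[Tuple[float, Case]],
              noise_var: float = 0.25) -> Tuple[float, float]:
    """Gaussian prior-to-posterior update. Each retrieved case acts as one noisy
    observation whose precision is scaled by its similarity (an assumption)."""
    precision = 1.0 / prior_var
    weighted_sum = prior_mean / prior_var
    for sim, case in evidence:
        w = sim / noise_var  # similarity-weighted observation precision
        precision += w
        weighted_sum += w * case.logp
    return weighted_sum / precision, 1.0 / precision


# Invented case library and query, for illustration only.
library = [
    Case("A", frozenset({1, 2, 3, 4}), 0.5),
    Case("B", frozenset({1, 2, 3, 5}), 0.7),
    Case("C", frozenset({7, 8, 9}), 3.0),
]
query = frozenset({1, 2, 3, 4})
evidence = retrieve(query, library, k=2)  # A (similarity 1.0), B (similarity 0.6)
mean, var = posterior(prior_mean=1.5, prior_var=1.0, evidence=evidence)
print(round(mean, 3), round(var, 3))  # 0.7 0.135
```

In this toy case the prior of 1.5 is pulled to 0.7 by the two retrieved neighbors (LogP 0.5 and 0.7), and the variance shrinks from 1.0 to about 0.135. A case sharing no bits with the query would carry zero weight, so when retrieval finds few close analogues the posterior stays near the prior.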
