Towards Environment-Sensitive Molecular Inference via Mixed Integer Linear Programming
Jianshen Zhu, Mao Takekida, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu
TL;DR
The paper addresses a limitation of traditional QSAR/QSPR, which treats properties as functions of a single molecule, by introducing an environment-sensitive framework that combines multiple molecules and environmental factors in a single descriptor and optimization formulation. It extends the mol-infer approach with a two-phase MILP design to enable both accurate prediction and exact inverse design for environment-dependent properties, using the Flory-Huggins $χ$-parameter as a case study. Key contributions include a concatenated, environment-aware feature function, Phase 1/Phase 2 procedures, an extended framework capable of inferring polymers with up to 50 non-hydrogen atoms, and graph-enumeration-based candidate expansion. Results on $χ$-parameter data sets show competitive learning performance, and comparisons with J-OCTA indicate that the inferred polymers are of high quality. This work advances molecular design under specified environmental conditions, with potential impact on polymer science and materials informatics, where intermolecular interactions and environmental factors critically shape properties.
Abstract
Traditional QSAR/QSPR and inverse QSAR/QSPR methods often assume that chemical properties are determined by single molecules, overlooking the influence of molecular interactions and environmental factors. In this paper, we introduce a novel QSAR/QSPR framework that captures the combined effects of multiple molecules (e.g., small molecules or polymers) and experimental conditions on property values. We design a feature function that integrates information about multiple molecules and the environment. As a case study, we consider the Flory-Huggins $χ$-parameter, which characterizes the thermodynamic interaction between the solute and the solvent and varies with temperature. Computational experiments show that our approach achieves learning performance competitive with existing work on predicting $χ$-parameter values, while inferring solute polymers with up to 50 non-hydrogen atoms in their monomer forms in a relatively short time. A comparison study with the simulation software J-OCTA demonstrates that the polymers inferred by our methods are of high quality.
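The abstract does not spell out the feature function, so the following is only a minimal sketch of what a concatenated, environment-aware feature vector might look like. It assumes that each molecule has already been mapped to a numeric descriptor vector and that the environment is a single temperature value. The function name `concat_features` and all descriptor values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def concat_features(
    solute: Sequence[float],
    solvent: Sequence[float],
    temperature_k: float,
) -> List[float]:
    """Join per-molecule descriptors and one environment value.

    Hypothetical helper: the ordering (solute, solvent, temperature)
    is an assumption made for illustration only.
    """
    return [*solute, *solvent, float(temperature_k)]


# Made-up descriptor values, for illustration only.
solute_descriptors = [12.0, 0.25, 3.0]
solvent_descriptors = [6.0, 0.5]

x = concat_features(solute_descriptors, solvent_descriptors, 298.15)
print(len(x))  # one entry per solute and solvent descriptor, plus temperature
```

Under this assumption, a single regression model sees the solute, the solvent, and the temperature together, so the same solute can receive different predicted $χ$ values in different solvents or at different temperatures.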
