Towards Environment-Sensitive Molecular Inference via Mixed Integer Linear Programming

Jianshen Zhu, Mao Takekida, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu

TL;DR

The paper addresses a limitation of traditional QSAR/QSPR, which treats a property as a function of a single molecule, by introducing an environment-sensitive framework that integrates multiple molecules and environmental factors into a single descriptor and optimization paradigm. It extends the mol-infer approach with a two-phase MILP design to enable both accurate prediction and exact inverse design for environment-dependent properties, using the Flory-Huggins $\chi$-parameter as a case study. Key contributions include a concatenated, environment-aware feature function, Phase 1/Phase 2 procedures, an extended framework capable of inferring polymers with up to 50 non-hydrogen atoms in their monomer forms, and graph-enumeration-based candidate expansion. Results on $\chi$-parameter data sets show competitive learning performance, and a comparison with J-OCTA indicates that the inferred polymers are of high quality. This work advances molecular design under specified environmental conditions, with potential impact on polymer science and materials informatics, where inter-molecular interactions and environmental factors critically shape properties.

Abstract

Traditional QSAR/QSPR and inverse QSAR/QSPR methods often assume that chemical properties are dictated by single molecules, overlooking the influence of molecular interactions and environmental factors. In this paper, we introduce a novel QSAR/QSPR framework that can capture the combined effects of multiple molecules (e.g., small molecules or polymers) and experimental conditions on property values. We design a feature function to integrate the information of multiple molecules and the environment. Specifically, for the Flory-Huggins $\chi$-parameter, which characterizes the thermodynamic interaction between the solute and the solvent and varies with temperature, we demonstrate through computational experiments that our approach achieves competitive learning performance compared to existing work on predicting $\chi$-parameter values, while inferring solute polymers with up to 50 non-hydrogen atoms in their monomer forms in a relatively short time. A comparison study with the simulation software J-OCTA demonstrates that the polymers inferred by our methods are of high quality.
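
The idea of a feature function that integrates several molecules and the environment can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `concat_features`, the descriptor values, and the choice of temperature in kelvin as the only environmental factor are assumptions made purely for illustration.

```python
from typing import Sequence


def concat_features(
    solute: Sequence[float],
    solvent: Sequence[float],
    temperature_k: float,
) -> list[float]:
    """Hypothetical helper: join per-molecule descriptors and one
    environmental value into a single feature vector."""
    # Order is fixed: solute descriptors, solvent descriptors, environment.
    return [*solute, *solvent, temperature_k]


# Made-up descriptor values, only to show the shape of the result.
x = concat_features([12.0, 3.0], [5.0, 1.0], 298.15)
print(len(x))  # length = len(solute) + len(solvent) + 1
```

Under these assumptions, a predictor trained on such vectors sees the solute, the solvent, and the temperature together, so the same solute paired with a different solvent or temperature yields a different input.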

Paper Structure

This paper contains 20 sections, 6 equations, 16 figures, 8 tables.

Figures (16)

  • Figure 1: An illustration of the two-phase framework mol-infer.
  • Figure 2: An illustration of the two-layered model. The interior region is represented by the shaded area enclosed by black dashed lines, while the remaining parts form the exterior. $T_u$ is the chemical tree rooted at $u$ and is outlined by a thin gray line.
  • Figure 3: (i) A seed graph $G_{\mathrm{C}}$ for $I_a$; (ii) A set $\mathcal{F}$ of chemical rooted trees. The figure is adapted from Ido:2024aa.
  • Figure 4: (a) The repeating unit of the polymer thioBis(4-phenyl)carbonate, where $v^*_1$ and $v^*_2$ are the connecting-vertices and $e^*_0$ and $e^*_1$ are the connecting-edges; (b) The monomer representation of the polymer in (a), where $v^*_1$ and $v^*_2$ are the connecting-vertices and the link-edges are depicted with thick lines. The figure is adapted from Ido:2024aa.
  • Figure 5: An illustration of Phase 1 of the extended framework.
  • ...and 11 more figures