Cycle-Configuration: A Novel Graph-theoretic Descriptor Set for Molecular Inference
Bowen Song, Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu
TL;DR
The paper introduces cycle-configuration (CC), a family of chemical-graph descriptors that extends the standard two-layered (2L) model of mol-infer so that it can distinguish ortho/meta/para patterns on aromatic rings. The resulting 2L+CC model comes with a corresponding MILP formulation, which supports both ML property prediction and inverse design of chemical graphs with up to 50 non-hydrogen atoms. In computational experiments, prediction functions built with CC descriptors perform similarly to or better than those of the 2L model on all 27 tested chemical properties, and the MILP infers feasible chemical graphs in practical time. This work broadens the applicability of MILP-based molecular inference and sets the stage for extensions to polymers and multi-objective designs.
Abstract
In this paper, we propose a novel family of descriptors of chemical graphs, named cycle-configuration (CC), that can be used in the standard "two-layered (2L) model" of mol-infer, a molecular inference framework based on mixed integer linear programming (MILP) and machine learning (ML). The proposed descriptors capture the ortho/meta/para patterns that appear in aromatic rings, which the framework has so far been unable to represent. Computational experiments show that, when the new descriptors are supplied, we can construct prediction functions with similar or better performance for all 27 tested chemical properties. We also provide an MILP formulation that asks for a chemical graph with desired properties under the 2L model with CC descriptors (the 2L+CC model). We show that a chemical graph with up to 50 non-hydrogen vertices can be inferred in practical time.
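To make the ortho/meta/para notion concrete, the following minimal Python sketch classifies two substituents on a six-membered ring by their cyclic distance. It is an illustration only and is not the paper's definition of CC descriptors: the ring indexing (atoms 0 to 5), the function names `substitution_pattern` and `local_signature`, and the toy "position-blind" signature are all assumptions introduced here. The toy signature merely suggests why a descriptor that counts substituted atoms without regard to their relative positions cannot separate the three patterns.

```python
from collections import Counter

RING_SIZE = 6  # assumption: a benzene-like ring with atoms indexed 0..5


def local_signature(substituted):
    """Hypothetical position-blind feature: how many ring atoms carry a
    substituent and how many do not. It ignores where they sit."""
    return Counter(i in substituted for i in range(RING_SIZE))


def substitution_pattern(pos_a, pos_b):
    """Classify two distinct ring positions as ortho, meta, or para
    using the shortest distance between them around the ring."""
    for p in (pos_a, pos_b):
        if not 0 <= p < RING_SIZE:
            raise ValueError(f"position {p} is outside 0..{RING_SIZE - 1}")
    if pos_a == pos_b:
        raise ValueError("substituents must sit on different ring atoms")
    d = abs(pos_a - pos_b)
    d = min(d, RING_SIZE - d)  # walk the shorter way around the ring
    return {1: "ortho", 2: "meta", 3: "para"}[d]


if __name__ == "__main__":
    for pair in [(0, 1), (0, 2), (0, 3)]:
        print(pair, substitution_pattern(*pair), dict(local_signature(set(pair))))
```

In this toy setting, the three disubstituted rings share the same position-blind signature, while the cyclic distance separates them. Any descriptor intended to capture such patterns needs some information about relative positions on the cycle.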
