T-Hop: A framework for studying the importance of path information in molecular graphs for chemical property prediction
Abdulrahman Ibraheem, Narsis Kiani, Jesper Tegner
TL;DR
This work investigates whether incorporating path information in molecular graphs improves QSAR predictions. The authors introduce T-Hop, a GNN-like framework with two modes: a non-degenerate mode that leverages path information via a tensor-based path representation $T^L$ and a learnable matrix $M$, and a degenerate mode that relies only on the adjacency matrix $A$. Empirical results on six MoleculeNet datasets show that the benefit of path information is dataset-dependent, with the degenerate variant sometimes outperforming state-of-the-art methods despite using only 2-D information. The authors also take a first step toward predicting upfront whether path information will help on a given dataset, using 36 graph-derived features per dataset and achieving 66.7% accuracy on a small test set. Overall, the paper highlights both the value and the cost of path information, offers a practical way to apply it selectively, and shows that simple models can rival more complex ones in 2-D settings.
Abstract
This paper studies the usefulness of incorporating path information when predicting chemical properties from molecular graphs, in the domain of QSAR (Quantitative Structure-Activity Relationship) modeling. To this end, we developed a GNN-style model that can be toggled to operate in one of two modes: a non-degenerate mode, which incorporates path information, and a degenerate mode, which leaves it out. By comparing the performance of the non-degenerate mode against the degenerate mode on relevant QSAR datasets, we were able to directly assess the significance of path information on those datasets. Our results corroborate previous work in suggesting that the usefulness of path information is dataset-dependent. Unlike previous studies, however, we took the very first steps towards building a model that can predict upfront whether path information will be useful for a given dataset. We also found that, despite its simplicity, the degenerate mode of our model yielded surprisingly strong results, outperforming more sophisticated SOTA models in certain cases.
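The abstract does not spell out how $T^L$ and $M$ are constructed, so the following is only a minimal, hypothetical sketch of the general idea of toggling path information on and off, not the authors' implementation. It assumes that the path representation stacks, for each length $l = 1, \dots, L$, the number of simple paths with $l$ edges between every pair of atoms, and that the non-degenerate mode mixes these per-length slices with a learnable weight vector (`hop_weights`, a simplified, hypothetical stand-in for the learnable matrix $M$). The degenerate mode propagates node features through the adjacency matrix $A$ alone. The function names `simple_path_counts` and `propagate` are illustrative only.

```python
import numpy as np


def simple_path_counts(A: np.ndarray, L: int) -> np.ndarray:
    """Hypothetical path tensor (an assumption, not the paper's T^L):
    T[l - 1, i, j] counts simple paths with exactly l edges from atom i to atom j."""
    n = A.shape[0]
    T = np.zeros((L, n, n))
    # Python-int neighbor lists so set membership checks are unambiguous.
    neighbors = [np.flatnonzero(A[i]).tolist() for i in range(n)]

    def dfs(start: int, node: int, visited: set, length: int) -> None:
        # Count the path start -> ... -> node once it has at least one edge.
        if length > 0:
            T[length - 1, start, node] += 1
        if length == L:
            return
        for nxt in neighbors[node]:
            if nxt not in visited:  # simple paths never revisit an atom
                visited.add(nxt)
                dfs(start, nxt, visited, length + 1)
                visited.remove(nxt)

    for s in range(n):
        dfs(s, s, {s}, 0)
    return T


def propagate(A, H, W, hop_weights=None, L=3, degenerate=True):
    """One hypothetical message-passing step, H' = ReLU(S @ H @ W).

    degenerate=True:  S = A (adjacency only, no path information).
    degenerate=False: S = sum_l hop_weights[l] * T[l] (paths up to length L).
    """
    if degenerate:
        S = A
    else:
        T = simple_path_counts(A, L)
        # Weighted sum over the path-length axis -> an (n, n) matrix.
        S = np.tensordot(hop_weights, T, axes=1)
    return np.maximum(S @ H @ W, 0.0)


# Toy example: three heavy atoms in a chain (e.g. the carbon skeleton of propane), 0-1-2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
T = simple_path_counts(A, L=3)
H = np.eye(3)  # one-hot atom features
W = np.eye(3)  # identity weights, for illustration only
out_deg = propagate(A, H, W, degenerate=True)
out_path = propagate(A, H, W, hop_weights=[1.0, 0.5, 0.0], L=3, degenerate=False)
```

In this sketch, setting `hop_weights` to `[1, 0, 0]` makes the non-degenerate mode reduce exactly to the degenerate one, because the length-1 slice of the tensor is the adjacency matrix itself; comparing the two modes on the same data is the kind of ablation the abstract describes. Enumerating simple paths by depth-first search grows quickly with $L$ and node degree, which is one concrete source of the extra cost that path information can bring.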
