A Metric for the Balance of Information in Graph Learning
Alex O. Davies, Nirav S. Ajmeri, Telmo de Menezes e Silva Filho
TL;DR
This paper addresses the problem of determining whether graph learning on molecules primarily uses structural information or features. It introduces Noise-Noise Ratio Difference (NNRD), a metric computed by applying noise to structure and to features independently and measuring the resulting degradation in performance, summarized as a single score: $\mathrm{NNRD} = \log\left(\frac{1}{|T|} \sum_{t \in T} \frac{h_X(t)}{h_E(t)}\right)$. The authors validate NNRD on Open Graph Benchmark molecular datasets using a 3-layer GIN with noise applied at ten levels, showing that NNRD aligns with intuitive information balance and can reveal biases that simple performance aggregates miss. They discuss limitations such as model-dependence and outlier datasets, and suggest reporting NNRD for fixed models to guide dataset design and learning strategy. Overall, NNRD provides an interpretable, domain-agnostic tool for quantifying the balance of information sources in graph learning.
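As an illustration of how the summary score could be evaluated, the following is a minimal sketch of the formula above, not the authors' implementation. It assumes that $h_X(t)$ and $h_E(t)$ have already been measured at each noise level $t \in T$ and are strictly positive; the function name `nnrd` and the input format (two equal-length sequences) are hypothetical choices made for this example.

```python
import math
from typing import Sequence


def nnrd(h_x: Sequence[float], h_e: Sequence[float]) -> float:
    """Log of the mean ratio h_X(t) / h_E(t) over the noise levels t in T.

    h_x[i] and h_e[i] are assumed to be the values of h_X and h_E at the
    same noise level; how they are measured is outside this sketch.
    """
    if len(h_x) != len(h_e) or len(h_x) == 0:
        raise ValueError("h_x and h_e must be non-empty and of equal length")
    # Assumption: strictly positive values, so every ratio and the log are defined.
    if any(v <= 0 for v in h_x) or any(v <= 0 for v in h_e):
        raise ValueError("all values must be strictly positive")
    ratios = [x / e for x, e in zip(h_x, h_e)]
    return math.log(sum(ratios) / len(ratios))
```

Under this reading, the score is zero when the ratios average to one, and its sign indicates which of the two quantities is larger on average across noise levels.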
Abstract
Graph learning on molecules makes use of information from both the molecular structure and the features attached to that structure. Much work has been conducted on biasing models towards either structure or features, with the aim that this bias improves performance. Identifying which information source a dataset favours, and therefore how to approach learning on that dataset, is an open issue. Here we propose Noise-Noise Ratio Difference (NNRD), a quantitative metric for whether there is more useful information in structure or features. By applying iterative noising to features and structure independently, leaving the other intact, NNRD measures the degradation of information in each. We employ NNRD over a range of molecular tasks, and show that it corresponds well to a loss of information, with intuitive results that are more expressive than simple performance aggregates. Our future work will focus on expanding data domains, tasks and types, as well as refining our choice of baseline model.
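The independent noising described above can be sketched as follows. This is a hedged illustration only: the abstract does not specify the noise model, so the two schemes used here (replacing a node's feature vector with that of a randomly chosen node, and replacing an edge with one between two randomly chosen nodes, each with probability $t$) are assumptions, as are the function names `noise_features` and `noise_structure`.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def noise_features(features: List[List[float]], t: float,
                   rng: random.Random) -> List[List[float]]:
    """With probability t, replace a node's feature vector by that of a random node.

    The graph structure is not touched. The noise scheme is an assumption.
    """
    n = len(features)
    return [
        list(features[rng.randrange(n)]) if rng.random() < t else list(row)
        for row in features
    ]


def noise_structure(edges: List[Edge], num_nodes: int, t: float,
                    rng: random.Random) -> List[Edge]:
    """With probability t, replace an edge by one between two random nodes.

    Node features are not touched. Random edges may include self-loops or
    duplicates; a fuller implementation might filter these out.
    """
    noisy = []
    for u, v in edges:
        if rng.random() < t:
            u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        noisy.append((u, v))
    return noisy
```

In such a setup, a fixed model would be evaluated once on graphs passed through `noise_features` and once on graphs passed through `noise_structure` at each noise level, so that the two degradation curves can be compared.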
