Adaptive Node Feature Selection For Graph Neural Networks

Ali Azizpour, Madeline Navarro, Santiago Segarra

TL;DR

This work tackles interpretability and efficiency in graph neural networks by introducing permutation-based node feature importance (NPT) and an adaptive node feature selection algorithm (ANFS) that prunes uninformative features during training. The method theoretically links node features and graph structure to GCN performance, and empirically demonstrates that NPT provides meaningful importance scores across diverse graph datasets with varying homophily, while ANFS maintains or closely matches full-feature accuracy with fewer attributes. The approach is model- and task-agnostic, enables dynamic monitoring of feature relevance, and reduces dimensionality without sacrificing predictive power. Overall, it advances explainability and computational efficiency in GNNs, with potential extensions to other graph tasks and future work on more robust permutation schemes.

Abstract

We propose an adaptive node feature selection approach for graph neural networks (GNNs) that identifies and removes unnecessary features during training. The ability to measure how features contribute to model output is key for interpreting decisions, reducing dimensionality, and even improving performance by eliminating unhelpful variables. However, graph-structured data introduces complex dependencies that may not be amenable to classical feature importance metrics. Inspired by this challenge, we present a model- and task-agnostic method that determines relevant features during training based on changes in validation performance upon permuting feature values. We theoretically motivate our intervention-based approach by characterizing how GNN performance depends on the relationships between node data and graph structure. Not only do we return feature importance scores once training concludes, we also track how relevance evolves as features are successively dropped. We can therefore monitor if features are eliminated effectively and also evaluate other metrics with this technique. Our empirical results verify the flexibility of our approach to different graph architectures as well as its adaptability to more challenging graph learning settings.
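The permutation idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `permutation_importance`, the toy two-class graph, the single mean-aggregation layer, and the fixed weights `w` (standing in for a trained model) are all assumptions made for illustration. The sketch shuffles one feature column across nodes at a time and records the resulting drop in validation accuracy.

```python
import numpy as np

def permutation_importance(predict, X, y, val_idx, n_repeats=5, seed=0):
    """Score each feature by the mean drop in validation accuracy
    observed when that feature's column is shuffled across nodes."""
    rng = np.random.default_rng(seed)
    base = np.mean(predict(X)[val_idx] == y[val_idx])
    scores = np.zeros(X.shape[1])
    for m in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, m] = rng.permutation(X[:, m])  # break the feature-node link
            acc = np.mean(predict(X_perm)[val_idx] == y[val_idx])
            drops.append(base - acc)
        scores[m] = np.mean(drops)
    return scores

# Toy homophilic graph (hypothetical): two classes, each forming a ring.
n = 40
y = np.repeat([0, 1], n // 2)
A = np.eye(n)  # self-loops
for start in (0, n // 2):
    for i in range(n // 2):
        u, v = start + i, start + (i + 1) % (n // 2)
        A[u, v] = A[v, u] = 1.0
A_hat = A / A.sum(axis=1, keepdims=True)  # row-normalized aggregation

# Feature 0 carries the label signal; feature 1 is pure noise.
data_rng = np.random.default_rng(1)
X = np.column_stack([
    (2 * y - 1) + 0.1 * data_rng.standard_normal(n),
    data_rng.standard_normal(n),
])

# Fixed weights stand in for a trained model that ignores the noise feature.
w = np.array([1.0, 0.0])

def predict(features):
    """One mean-aggregation layer followed by a sign threshold."""
    return (A_hat @ features @ w > 0).astype(int)

val_idx = np.arange(0, n, 2)  # every other node serves as validation
scores = permutation_importance(predict, X, y, val_idx)
print(scores)
```

Because `predict` is passed in as a black box, the same scoring loop applies to any architecture or metric, which mirrors the model- and task-agnostic framing of the abstract. An adaptive selection loop would recompute such scores at checkpoints during training and drop the lowest-ranked features; the schedule and the number of features dropped are not specified here and would have to be taken from the paper.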

Paper Structure

This paper contains 19 sections, 28 equations, 11 figures, 4 tables, 1 algorithm.

Figures (11)

  • Figure 1: Example graphs for which graph structure can alter how node features affect node classification. Class labels are denoted by "0" or "1". Node features are represented by color, where red and blue indicate features from different distributions, and brightness indicates different magnitudes. (a) Edges directly imply similarity of node labels and features. (b) While most connected nodes belong to the same class, edges also tend to indicate distribution shifts in node features. (c) Both node labels and features are homophilic, but the high variance of node feature distributions may render classification more challenging.
  • Figure 2: Node classification accuracy during training for a GCN on Cora using Algorithm 1 with different feature importance metrics. (a) Validation accuracy comparing a model trained using all features versus NPT, TFI, and MI. (b) Test accuracy comparing a model trained using all features versus NPT, TFI, and MI. (c) The difference in test accuracy between the full model and the model trained with Algorithm 1.
  • Figure 3: Heatmaps of feature importance scores $\delta_m$ (high $\delta_m$ is red and low $\delta_m$ is blue). Checkpoints occur every $50$ epochs, at which $\delta_m$ is computed to determine importance scores for the current model. The $y$-axis sorts bins of features by their relevance according to the final checkpoint. (a) Feature importance for a GCN trained on Cora. (b) Feature importance for a GCN trained on PubMed.
  • Figure 4: Node classification accuracy during training for a GCN on CiteSeer using Algorithm 1 with different feature importance metrics. (a) Validation accuracy comparing a model trained using all features versus NPT, TFI, and MI. (b) Test accuracy comparing a model trained using all features versus NPT, TFI, and MI. (c) The difference in test accuracy between the full model and the model trained with Algorithm 1.
  • Figure 5: Node classification accuracy during training for a GCN on PubMed using Algorithm 1 with different feature importance metrics. (a) Validation accuracy comparing a model trained using all features versus NPT, TFI, and MI. (b) Test accuracy comparing a model trained using all features versus NPT, TFI, and MI. (c) The difference in test accuracy between the full model and the model trained with Algorithm 1.
  • ...and 6 more figures