The Importance of Model Inspection for Better Understanding Performance Characteristics of Graph Neural Networks

Nairouz Shehata, Carolina Piçarra, Anees Kazi, Ben Glocker

TL;DR

The paper addresses the risk that final test accuracy alone can hide biases in feature learning when graph neural networks are applied to brain-shape data. It proposes and applies a model-inspection framework that extracts embeddings from the GCN submodel and the classifier layers for two architecture variants (shared vs. structure-specific submodels), each with and without mesh registration. The authors show that while ROC-AUC differences are modest, the learned feature spaces differ in how strongly they encode the data source and how well they separate the target classes, and that these differences depend on architectural choices and preprocessing steps, underscoring the need for inspection beyond accuracy. This approach improves understanding of what drives predictions, informs model selection, and has practical implications for transfer learning and domain adaptation in biomedical imaging.
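One way to put a number on what the inspection framework visualises is to score how well the same embeddings separate by target label versus by data source. This section does not state which projection or metric the authors use, so the sketch below is an illustration under assumptions: it projects features to 2D with PCA (computed with NumPy's SVD) for plotting, and computes a trace-ratio, Fisher-style separability score. The function names `pca_2d` and `separability_score` are hypothetical, and the embeddings are synthetic.

```python
import numpy as np

def pca_2d(features: np.ndarray) -> np.ndarray:
    """Project embeddings (n_samples x dim) onto their first two principal components."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T

def separability_score(features: np.ndarray, groups: np.ndarray) -> float:
    """Trace ratio of between-group to within-group scatter (Fisher-style, assumed metric).

    Higher values mean the groups (e.g. target labels or data sources) occupy
    more distinct regions of the embedding space.
    """
    overall_mean = features.mean(axis=0)
    between = 0.0
    within = 0.0
    for g in np.unique(groups):
        members = features[groups == g]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return between / within

# Usage with synthetic embeddings: the label shifts the features, the data source does not.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)    # e.g. a binary target label
sources = rng.integers(0, 4, size=200)   # e.g. four datasets
embeddings = rng.normal(size=(200, 16))
embeddings[:, 0] += 4.0 * labels         # only the label is encoded
label_score = separability_score(embeddings, labels)
source_score = separability_score(embeddings, sources)
projected = pca_2d(embeddings)
```

Applied per layer (GCN, FC1, FC2) and per grouping (target label, data source), a score like this would summarise each panel row of Figure 3 with a single number, complementing the visual inspection rather than replacing it.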

Abstract

This study highlights the importance of conducting comprehensive model inspection as part of comparative performance analyses. Here, we investigate the effect of modelling choices on the feature learning characteristics of graph neural networks applied to a brain shape classification task. Specifically, we analyse the effect of using parameter-efficient, shared graph convolutional submodels compared to structure-specific, non-shared submodels. Further, we assess the effect of mesh registration as part of the data harmonisation pipeline. We find substantial differences in the feature embeddings at different layers of the models. Our results highlight that test accuracy alone is insufficient to identify important model characteristics such as encoded biases related to data source or potentially non-discriminative features learned in submodels. Our model inspection framework offers a valuable tool for practitioners to better understand performance characteristics of deep learning models in medical imaging.
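The abstract names mesh registration as part of the data harmonisation pipeline, but this section does not specify the registration method. Purely as an illustration, the sketch below assumes rigid alignment of each subject mesh to a template with known one-to-one vertex correspondence, solved in closed form with the Kabsch (orthogonal Procrustes) algorithm. The function `rigid_register` and the toy data are hypothetical.

```python
import numpy as np

def rigid_register(source: np.ndarray, target: np.ndarray):
    """Rigidly align source vertices (V x 3) to target vertices (V x 3).

    Assumes row i of source corresponds to row i of target.
    Returns the aligned source vertices, the rotation r and the translation t.
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    src_c = source - src_mean
    tgt_c = target - tgt_mean
    # Cross-covariance matrix and its SVD (Kabsch algorithm).
    h = src_c.T @ tgt_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tgt_mean - r @ src_mean
    aligned = source @ r.T + t
    return aligned, r, t

# Usage on a toy mesh: rotate and translate a random template, then recover it.
rng = np.random.default_rng(0)
template = rng.normal(size=(50, 3))
q, upper = np.linalg.qr(rng.normal(size=(3, 3)))
q = q @ np.diag(np.sign(np.diag(upper)))   # make the QR factor unique
if np.linalg.det(q) < 0:
    q[:, 0] *= -1.0                        # ensure a rotation, not a reflection
subject = template @ q.T + np.array([5.0, -2.0, 1.0])
aligned, r, t = rigid_register(subject, template)
```

Registering every subject to a common template removes differences in position and orientation from the vertex coordinates before they reach the GCN; whether the paper's pipeline also normalises scale or uses a non-rigid method is not stated here.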

Paper Structure (15 sections, 3 figures)

Figures (3)

  • Figure 1: Model architecture consisting of a graph convolutional network (GCN) submodel feeding graph embeddings into a classification head with two fully connected layers (FC1 and FC2). Here, N = 15 is the number of brain substructures. For our model inspection, we read out the feature vectors from the GCN submodel, FC1, and FC2.
  • Figure 2: Sex classification performance for four models: (a) shared and (b) non-shared submodel without mesh registration, (c) shared and (d) non-shared submodel with mesh registration. We observe that the generalisation gap between the in-distribution test data (UKBB) and the external test data (CamCAN, IXI, OASIS3) closes with mesh registration. Overall, there are only small differences in performance, illustrating that test accuracy alone is insufficient to identify variations in model characteristics.
  • Figure 3: Effect of modelling choices on feature separability for four different models at the GCN layer (left), the first fully connected layer FC1 (middle), and the output layer FC2 (right). Models: (a,c) shared and (b,d) non-shared GCN submodel, and (a,b) without and (c,d) with mesh registration. For each model, we show the separation by target label in the top row, and the separation by dataset in the bottom row. Effect of submodel: The models in (a,c) with a shared submodel are unable to learn discriminative GCN features for the prediction task, while the models in (b,d) with a non-shared submodel show much better task-related separability in the GCN features. Effect of registration: The models in (a,b) without registration strongly encode information about the data source in the GCN layer. This is much reduced for the models in (c,d) with mesh registration.
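Figure 1 describes a GCN submodel whose graph embeddings feed a two-layer classification head, with features read out at the GCN, FC1 and FC2 layers. The NumPy sketch below illustrates that structure under assumptions not given in the caption: a single GCN layer with symmetric normalisation, mean pooling over vertices, concatenation of the N = 15 substructure embeddings, no bias terms, random untrained weights, and hypothetical layer sizes. The class `BrainShapeGNN` and its `shared` flag are illustrative names; the flag switches between one GCN weight matrix reused for all substructures and one matrix per substructure, which makes the parameter-efficiency trade-off explicit.

```python
import numpy as np

N_STRUCTURES = 15  # number of brain substructures (Figure 1)

def normalised_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetrically normalised adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

class BrainShapeGNN:
    """Hypothetical GCN submodel + FC1/FC2 head; weights are random, not trained."""

    def __init__(self, in_dim, hid_dim, fc1_dim, n_classes=2, shared=True, seed=0):
        rng = np.random.default_rng(seed)
        self.shared = shared
        n_submodels = 1 if shared else N_STRUCTURES
        self.gcn_weights = [rng.normal(0.0, 0.1, (in_dim, hid_dim)) for _ in range(n_submodels)]
        self.w_fc1 = rng.normal(0.0, 0.1, (N_STRUCTURES * hid_dim, fc1_dim))
        self.w_fc2 = rng.normal(0.0, 0.1, (fc1_dim, n_classes))

    def n_params(self) -> int:
        return sum(w.size for w in self.gcn_weights) + self.w_fc1.size + self.w_fc2.size

    def forward(self, graphs):
        """graphs: list of N_STRUCTURES (adjacency, vertex_features) pairs.

        Returns the read-out features at the GCN, FC1 and FC2 layers.
        """
        embeddings = []
        for i, (adj, x) in enumerate(graphs):
            w = self.gcn_weights[0 if self.shared else i]
            h = relu(normalised_adjacency(adj) @ x @ w)  # one graph convolution
            embeddings.append(h.mean(axis=0))            # mean-pool over vertices
        gcn = np.concatenate(embeddings)
        fc1 = relu(gcn @ self.w_fc1)
        fc2 = fc1 @ self.w_fc2                           # class logits
        return {"gcn": gcn, "fc1": fc1, "fc2": fc2}

def ring_adjacency(n: int) -> np.ndarray:
    """Toy mesh connectivity: a closed ring of n vertices."""
    adj = np.zeros((n, n))
    idx = np.arange(n)
    adj[idx, (idx + 1) % n] = 1.0
    adj[(idx + 1) % n, idx] = 1.0
    return adj

# Usage: one toy subject with 15 ring-shaped "substructures" of 10 vertices (xyz features).
rng = np.random.default_rng(1)
subject = [(ring_adjacency(10), rng.normal(size=(10, 3))) for _ in range(N_STRUCTURES)]
shared_model = BrainShapeGNN(in_dim=3, hid_dim=8, fc1_dim=16, shared=True)
nonshared_model = BrainShapeGNN(in_dim=3, hid_dim=8, fc1_dim=16, shared=False)
features = shared_model.forward(subject)
```

With these toy sizes the shared variant has 1,976 parameters versus 2,312 for the non-shared one; the gap grows with the width of the GCN submodel, because the non-shared variant replicates its weights N times. The dictionary returned by `forward` corresponds to the three read-out points used for model inspection.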