Interpretability of Graph Neural Networks to Assess Effects of Global Change Drivers on Ecological Networks

Emre Anakok, Pierre Barbillon, Colin Fontaine, Elisa Thebault

TL;DR

This paper examines how global change drivers influence plant–pollinator networks and addresses the interpretability of graph neural networks (GNNs) in this ecological context. It develops and applies a bipartite variational graph auto-encoder (BVGAE) and its fair variant to Spipoll data, enabling connectivity prediction and covariate attribution while accounting for sampling bias. Through extensive simulations and a real-data application, the study shows that attribution methods can detect single-effect covariates and their sign, but struggle when covariate effects interact with plant genera or are inflated by observer bias. The findings emphasize the potential of BVGAE-based interpretability to uncover how land use and climate covariates shape pollination network connectivity, while highlighting the need for cautious interpretation of citizen-science datasets and complex interaction effects.
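The TL;DR describes a bipartite variational graph auto-encoder that embeds both node sets and predicts connectivity. The sketch below is a minimal NumPy forward pass under stated assumptions: one message-passing layer per node set with a GCN-style degree normalization, Gaussian latent embeddings drawn with the reparameterization trick, and an inner-product decoder as in the original VGAE. The helper names (`normalize`, `encode`, `init_params`, `forward`, `neg_elbo`), the layer sizes and the use of row sums of edge probabilities as a node's expected connectivity are hypothetical; the paper's architecture and training may differ, and no training loop is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def normalize(B):
    """GCN-style normalization D1^{-1/2} B D2^{-1/2}; isolated nodes get zero weight."""
    B = np.asarray(B, dtype=float)
    d1, d2 = B.sum(axis=1), B.sum(axis=0)
    s1 = np.divide(1.0, np.sqrt(d1), out=np.zeros_like(d1), where=d1 > 0)
    s2 = np.divide(1.0, np.sqrt(d2), out=np.zeros_like(d2), where=d2 > 0)
    return s1[:, None] * B * s2[None, :]

def init_params(p_own, p_other, hidden, latent, rng):
    # Hypothetical small random weights for one node set's encoder.
    shapes = {"W_self": (p_own, hidden), "W_nbr": (p_other, hidden),
              "W_mu": (hidden, latent), "W_logvar": (hidden, latent)}
    return {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}

def encode(X_own, X_other, Bn, params, rng):
    # One layer: a node's own features plus normalized aggregated neighbour features.
    H = np.tanh(X_own @ params["W_self"] + Bn @ (X_other @ params["W_nbr"]))
    mu, logvar = H @ params["W_mu"], H @ params["W_logvar"]
    # Reparameterization trick: Z = mu + sigma * eps with eps ~ N(0, I).
    Z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    return Z, mu, logvar

def forward(B, X1, X2, params1, params2, rng):
    Bn = normalize(B)
    Z1, mu1, lv1 = encode(X1, X2, Bn, params1, rng)    # first node set (n1 nodes)
    Z2, mu2, lv2 = encode(X2, X1, Bn.T, params2, rng)  # second node set (n2 nodes)
    P = sigmoid(Z1 @ Z2.T)                              # inner-product decoder, shape (n1, n2)
    return P, (mu1, lv1), (mu2, lv2)

def neg_elbo(B, P, gaussians, eps=1e-9):
    # Bernoulli reconstruction loss plus KL(q(Z) || N(0, I)) for each node set.
    recon = -np.sum(B * np.log(P + eps) + (1 - B) * np.log(1 - P + eps))
    kl = sum(-0.5 * np.sum(1 + lv - mu**2 - np.exp(lv)) for mu, lv in gaussians)
    return recon + kl

rng = np.random.default_rng(0)
n1, n2, p1, p2 = 30, 20, 4, 3
B = (rng.random((n1, n2)) < 0.2).astype(float)         # toy biadjacency matrix
X1, X2 = rng.standard_normal((n1, p1)), rng.standard_normal((n2, p2))
params1 = init_params(p1, p2, 16, 2, rng)
params2 = init_params(p2, p1, 16, 2, rng)
P, g1, g2 = forward(B, X1, X2, params1, params2, rng)
expected_degree = P.sum(axis=1)   # hypothetical connectivity of each first-set node
loss = neg_elbo(B, P, [g1, g2])
print(P.shape, round(float(loss), 2))
```

Training would minimize `neg_elbo` with respect to the weights; a real implementation would use an autodiff framework rather than plain NumPy.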

Abstract

Pollinators play a crucial role in plant reproduction, in both natural ecosystems and human-modified landscapes. Global change drivers, including climate change and land-use modifications, can alter plant-pollinator interactions. To assess the potential influence of global change drivers on pollination, large-scale interaction, climate and land-use data are required. While recent machine learning methods, such as graph neural networks (GNNs), allow the analysis of such datasets, interpreting their results can be challenging. We explore existing methods for interpreting GNNs in order to highlight the effects of various environmental covariates on pollination network connectivity. An extensive simulation study is performed to assess whether these methods can detect the interactive effect between a covariate and a plant genus on connectivity, and whether applying debiasing techniques influences the estimation of these effects. An application to the Spipoll dataset, with and without accounting for sampling effects, highlights the potential impact of land use on network connectivity and shows that accounting for sampling effects partially alters the estimation of these effects.
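The debiasing mentioned above relies on a dependence penalty; the caption of Figure 4 refers to variables "penalized by the HSIC" (Hilbert-Schmidt Independence Criterion). Below is a minimal sketch of the standard biased HSIC estimator, trace(KHLH)/(n-1)^2, with Gaussian kernels. The median-heuristic bandwidth, the toy sampling covariate `S` and the penalty weight `lam` in the closing comment are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def gaussian_gram(X, bandwidth=None):
    """Gaussian-kernel Gram matrix; bandwidth from the median heuristic if not given."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    sq = np.sum(X**2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    if bandwidth is None:
        positive = D[D > 0]
        # Median heuristic: 2 * bandwidth^2 = median squared distance (an assumption).
        bandwidth = np.sqrt(0.5 * np.median(positive)) if positive.size else 1.0
    return np.exp(-D / (2.0 * bandwidth**2))

def hsic(X, Y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, with H the centering matrix."""
    K, L = gaussian_gram(X), gaussian_gram(Y)
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(2)
n = 200
S = rng.standard_normal(n)                       # hypothetical sampling covariate
Z_dep = np.column_stack([S, S**2]) + 0.1 * rng.standard_normal((n, 2))  # embeddings encoding S
Z_indep = rng.standard_normal((n, 2))            # embeddings independent of S
hsic_dep, hsic_indep = hsic(Z_dep, S), hsic(Z_indep, S)
print(round(hsic_dep, 4), round(hsic_indep, 4))

# In a fair variant, such a term would be added to the training objective, e.g.
# reconstruction_loss + lam * hsic(Z, S), with lam a hypothetical tuning weight.
```

Penalizing HSIC pushes the learned embeddings towards independence from the sampling covariates, which is the intuition behind accounting for observer effects.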

Paper Structure

This paper contains 26 sections, 16 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Summary of the model used for the training of the Spipoll data set.
  • Figure 2: Insight into the behaviour we aim to detect with attribution scores. BVGAE is trained on a simulated bipartite network using features $X_1$. The resulting node embeddings are shown with different colors for the two sets of nodes. The true feature contributions to the expected connectivity $f_{\widehat{B'}}(X_1)$ are indicated on the left as "Positive contribution", "Negative contribution" and "No contribution". By perturbing each feature with a constant $\delta$, we observe how both the resulting embeddings and the expected connectivity change. The simulation settings are described in the simulation study section.
  • Figure 3: Estimated feature importance in Simulation 1.A for a single run. The dashed line is positioned at zero. The black dot represents the estimated score for $\mathbf{1}_{n_1}$. The green (resp. red) dots represent the estimated score for features where positive (resp. negative) values were expected. The blue dots are scores attributed to noise.
  • Figure 4: Estimated feature importance in Simulation 1.D for a single run. The dashed line is positioned at zero. The black dot represents the estimated score for $\mathbf{1}_{n_1}$. The green (resp. red) dots represent the estimated score for features where positive (resp. negative) values were expected. The blue dots are scores attributed to noise. Variables penalized by the HSIC are represented with a cross.
  • Figure 5: Estimated feature importance in Simulation 1.C for a single run. In all plots, each row represents a group and each column represents a feature. The top six panels display the estimated scores for all features; features to the right of the red lines are noise. The bottom six panels zoom in on the left portion of the top six panels. In each cell, the border frame represents the expected value and the interior represents the estimated value. Black frames represent the estimated score for $\mathbf{1}_{n_1}$, green (resp. red) frames represent the score for features where positive (resp. negative) values were expected, and blue frames are scores attributed to noise. The sign "+" or "-" denotes the sign of the estimated score within each cell.
  • ...and 1 more figure
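The Figure 2 caption describes the attribution idea: perturb each feature by a constant $\delta$ and observe how the expected connectivity changes. A minimal finite-difference sketch of that idea follows. Two toy logistic models stand in for a trained BVGAE, and the functions `perturbation_scores` and `group_means`, the value δ = 1e-3 and the per-group averaging (mirroring the group-by-feature layout of Figure 5) are illustrative assumptions rather than the authors' attribution method.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def perturbation_scores(f, X, delta=1e-3):
    """Per-node, per-feature scores (f(X + delta * e_k) - f(X)) / delta.

    f maps an (n, p) feature matrix to an (n,) vector, e.g. a hypothetical
    expected connectivity of each node. Returns an (n, p) array."""
    X = np.asarray(X, dtype=float)
    base = f(X)
    scores = np.empty(X.shape)
    for k in range(X.shape[1]):
        X_shift = X.copy()
        X_shift[:, k] += delta          # perturb one feature by a constant
        scores[:, k] = (f(X_shift) - base) / delta
    return scores

def group_means(scores, groups):
    """Average node-level scores within each group (e.g. plant genus)."""
    labels = np.unique(groups)
    return labels, np.vstack([scores[groups == g].mean(axis=0) for g in labels])

rng = np.random.default_rng(1)
n = 500
X = rng.standard_normal((n, 4))
groups = rng.integers(0, 2, size=n)

# Toy model 1: feature 0 raises, feature 1 lowers connectivity; features 2-3 are noise.
beta = np.array([1.5, -1.0, 0.0, 0.0])
f_main = lambda M: sigmoid(M @ beta)
global_scores = perturbation_scores(f_main, X).mean(axis=0)

# Toy model 2: feature 0 has opposite effects in the two groups (an interaction).
sign = np.where(groups == 0, 1.0, -1.0)
f_inter = lambda M: sigmoid(sign * M[:, 0])
labels, per_group = group_means(perturbation_scores(f_inter, X), groups)

print(global_scores.round(3))
print(per_group.round(3))       # rows: groups, columns: features
```

In the second toy model the per-group scores for feature 0 have opposite signs, so pooling all nodes can dilute or cancel the effect; this is one way to see why covariate-by-genus interactions are harder to recover than single effects.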