From Text to Graph: Leveraging Graph Neural Networks for Enhanced Explainability in NLP
Fabio Yáñez-Romero, Andrés Montoyo, Armando Suárez, Yoan Gutiérrez, Ruslan Mitkov
TL;DR
This work addresses the explainability gap in large transformer-based NLP models by translating sentences into graph-structured representations that preserve semantic meaning. A complete pipeline converts text to constituency-tree graphs, converts those trees to directed graphs, fine-tunes a language model, and distills its behavior into a graph neural network for explainable predictions, augmented by post-hoc analyses. Using SubgraphX and AutoGOAL, the approach yields high-fidelity, sparse explanations that pinpoint critical textual components and provide semantic and structural interpretations, enabling model diagnostics and knowledge extraction. The methodology demonstrates promising results on AG News and SST-2, with the GNN closely mirroring LM classifications and offering a tractable, interpretable alternative for explainability and downstream knowledge-base construction.
Abstract
Researchers have relegated natural language processing tasks to Transformer-type models, particularly generative models, because these models exhibit high versatility when performing generation and classification tasks. As the size of these models increases, they achieve outstanding results. Given their widespread use, many explainability techniques are developed based on these models. However, this process becomes computationally expensive due to the large size of the models. Additionally, transformers interpret input information through tokens that fragment input words into sequences lacking inherent semantic meaning, complicating the explanation of the model from the very beginning. This study proposes a novel methodology to achieve explainability in natural language processing tasks by automatically converting sentences into graphs and maintaining semantics through nodes and relations that express fundamental linguistic concepts. It also allows the subsequent exploitation of this knowledge in subsequent tasks, making it possible to obtain trends and understand how the model associates the different elements inside the text with the explained task. The experiments delivered promising results in determining the most critical components within the text structure for a given classification.
