Combining Transformers with Natural Language Explanations

Federico Ruggeri, Marco Lippi, Paolo Torroni

TL;DR

The paper tackles transformer interpretability by augmenting models with an external memory of natural language explanations that ground predictions. It introduces memBERT and memDistilBERT, which use sampling strategies and optional strong supervision to scale memory usage while preserving performance. In experiments on unfairness detection in Terms of Service (ToS) clauses and claim detection on IBM2015, the approach yields meaningful explanations and often improves classification metrics, showing that interpretability need not cost predictive performance. The work points to practical pathways for scalable explanations grounded in natural language and outlines future directions: input-aware memory retrieval and generation tasks.

Abstract

Many NLP applications require models to be interpretable. However, many successful neural architectures, including transformers, still lack effective interpretation methods. A possible solution could rely on building explanations from domain knowledge, which is often available as plain, natural language text. We thus propose an extension to transformer models that makes use of external memories to store natural language explanations and use them to explain classification outputs. We conduct an experimental evaluation on two domains, legal text analysis and argument mining, to show that our approach can produce relevant explanations while retaining or even improving classification performance.

Paper Structure

This paper contains 23 sections, 7 equations, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: Information flow in informed machine learning. From von Rueden et al. (2021).
  • Figure 2: Memory-augmented transformer architecture. $M$ denotes the whole memory, while $\bar{M}$ is a memory subset after a possible sampling step. $\vert \cdot \vert$ denotes cardinality.
  • Figure 3: MemDistilBERT analysis on IBM2015, 1-Topic. (a) P@K for increasing $K$ values and $\delta = 0.25$; (b) P@3 for increasing $\delta$ values. Metrics for sampling-based models are averaged across three distinct inferences on the test set.
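
The memory mechanism named in the Figure 2 caption (a full memory $M$, an optional sampling step producing $\bar{M}$, and a lookup whose highest-weighted slots serve as explanations) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, uniform sampling, dot-product attention, and hand-written two-dimensional embeddings are all hypothetical stand-ins for what memBERT and memDistilBERT actually do.

```python
import math
import random


def sample_memory(memory, max_size, rng):
    """Return a subset M_bar of at most max_size slots (uniform sampling is an assumption)."""
    if len(memory) <= max_size:
        return list(memory)
    return rng.sample(list(memory), max_size)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_lookup(query, memory_bar):
    """Attend over memory slots with dot-product scores.

    memory_bar is a list of (explanation_text, embedding) pairs.
    Returns the attention weights and the weighted sum of slot embeddings.
    """
    scores = [sum(q * v for q, v in zip(query, emb)) for _, emb in memory_bar]
    weights = softmax(scores)
    summary = [
        sum(w * emb[i] for w, (_, emb) in zip(weights, memory_bar))
        for i in range(len(query))
    ]
    return weights, summary


def explain(memory_bar, weights, k=1):
    """Return the texts of the k slots with the highest attention weight."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [memory_bar[i][0] for i in order[:k]]


# Hypothetical toy memory: explanation texts with made-up 2-d embeddings.
TOY_MEMORY = [
    ("explanation A", [1.0, 0.0]),
    ("explanation B", [0.0, 1.0]),
    ("explanation C", [-1.0, 0.0]),
]
```

In a real model the query and slot embeddings would come from the transformer encoder, and the memory summary would feed the classifier alongside the input representation; the sketch only shows how attention weights over $\bar{M}$ can double as a ranking of candidate explanations.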