Exploring the Influence of Relevant Knowledge for Natural Language Generation Interpretability

Iván Martínez-Murillo, Paloma Moreda, Elena Lloret

TL;DR

The paper examines how external knowledge influences interpretability in natural language generation, focusing on a commonsense generation task. It introduces KITGI, a benchmark that extends CommonGen by pairing concept sets with relations retrieved from ConceptNet and adding manually annotated outputs, and uses it to study reasoning in generation with the T5-Large model. A three-stage interpretability framework removes key knowledge, regenerates the outputs, and manually evaluates them for commonsense plausibility and concept coverage. The share of sentences judged correct on both criteria drops from 91% to 6% when relevant external knowledge is filtered out, underscoring the critical role of knowledge for coherent, comprehensive NLG and motivating interpretable evaluation frameworks beyond surface metrics.

Abstract

This paper explores the influence of external knowledge integration in Natural Language Generation (NLG), focusing on a commonsense generation task. We extend the CommonGen dataset by creating KITGI, a benchmark that pairs input concept sets with retrieved semantic relations from ConceptNet and includes manually annotated outputs. Using the T5-Large model, we compare sentence generation under two conditions: with full external knowledge and with filtered knowledge where highly relevant relations were deliberately removed. Our interpretability benchmark follows a three-stage method: (1) identifying and removing key knowledge, (2) regenerating sentences, and (3) manually assessing outputs for commonsense plausibility and concept coverage. Results show that sentences generated with full knowledge achieved 91% correctness across both criteria, while filtering reduced performance drastically to 6%. These findings demonstrate that relevant external knowledge is critical for maintaining both coherence and concept coverage in NLG. This work highlights the importance of designing interpretable, knowledge-enhanced NLG systems and calls for evaluation frameworks that capture the underlying reasoning beyond surface-level metrics.
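
The two generation conditions described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the `Relation` type, its `relevant` flag, the prompt format in `build_input`, and the helper names are hypothetical. The `generate` argument stands in for any text generator (the paper uses T5-Large), and `covers_all` is only a crude automatic proxy for concept coverage, which the paper assesses manually.

```python
from typing import Callable, List, NamedTuple, Tuple


class Relation(NamedTuple):
    head: str
    rel: str        # ConceptNet-style relation label, e.g. "UsedFor"
    tail: str
    relevant: bool  # hypothetical flag: relation was judged key to the output


def build_input(concepts: List[str], relations: List[Relation]) -> str:
    """Serialise concepts and relations into one prompt (format is an assumption)."""
    facts = "; ".join(f"{r.head} {r.rel} {r.tail}" for r in relations)
    return f"concepts: {', '.join(concepts)} | knowledge: {facts}"


def filter_relevant(relations: List[Relation]) -> List[Relation]:
    """Stage 1: drop the relations flagged as highly relevant."""
    return [r for r in relations if not r.relevant]


def covers_all(concepts: List[str], sentence: str) -> bool:
    """Crude coverage proxy: each concept is a prefix of some token in the sentence."""
    tokens = sentence.lower().split()
    return all(any(t.startswith(c.lower()) for t in tokens) for c in concepts)


def ablate(
    concepts: List[str],
    relations: List[Relation],
    generate: Callable[[str], str],
) -> Tuple[str, str]:
    """Stages 1-2: generate once with full knowledge and once with filtered knowledge."""
    full = generate(build_input(concepts, relations))
    filtered = generate(build_input(concepts, filter_relevant(relations)))
    return full, filtered
```

Passing the generator in as a callable keeps the sketch independent of any particular model library; the two returned sentences would then go to the manual assessment of stage 3.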

Paper Structure

This paper contains 9 sections and 4 figures.

Figures (4)

  • Figure 1: Samples from the crafted dataset.
  • Figure 2: Comparison of relation type distributions.
  • Figure 3: Representative samples of the criteria applied during evaluation.
  • Figure 4: Manual analysis results.