PropNet: a White-Box and Human-Like Network for Sentence Representation

Fei Yang

TL;DR

PropNet tackles the interpretability gap of transformer-based sentence embeddings by introducing a white-box, proposition-based network that encodes sentence meaning through a six-level hierarchical structure built from parsed propositions. It combines splitting, parsing, representing, and merging steps to construct interpretable networks using Dependency Grammar, with a formal framework to compare PropNets via difference vectors and CART-based prediction. Although PropNet currently lags behind state-of-the-art embedding models on semantic textual similarity tasks, it yields valuable cognitive insights into how humans evaluate sentence similarity and demonstrates potential for multimodal extension, long-text understanding, and cognitive-process analysis. The work lays a foundation for transparent linguistic representations that can illuminate human reasoning in NLP benchmarks and beyond.

Abstract

Transformer-based embedding methods have dominated the field of sentence representation in recent years. Although they have achieved remarkable performance on NLP tasks such as semantic textual similarity (STS), their black-box nature and large-data-driven training style have raised concerns, including issues related to bias, trust, and safety. Many efforts have been made to improve the interpretability of embedding models, but these problems have not been fundamentally resolved. To achieve inherent interpretability, we propose a purely white-box and human-like sentence representation network, PropNet. Inspired by findings from cognitive science, PropNet constructs a hierarchical network based on the propositions contained in a sentence. While experiments indicate that PropNet lags significantly behind state-of-the-art (SOTA) embedding models on STS tasks, case studies reveal substantial room for improvement. Additionally, PropNet enables us to analyze and understand the human cognitive processes underlying STS benchmarks.

Paper Structure

This paper contains 33 sections, 10 figures, and 15 tables.

Figures (10)

  • Figure 1: The framework of building PropNet. The type of the input sentence is calculated before splitting. After merging, all networks are integrated into one network as the PropNet of the input sentence. Note that for types Prop0 and Prop1 the splitting and merging phases are omitted.
  • Figure 2: Splitting a Prop3+ sentence and merging its proposition representations by backtracking. At each splitting step, the method for splitting Prop2 is called. The extracted proposition invokes the representation component to build a hierarchical network. During the upward traversal, the same strategies used for merging P2 (denoted by green, grey, and purple arrows) are employed to integrate all networks into a unified network. Note that identifier_advcl is a placeholder for an adverbial clause.
  • Figure 3: (a) Six-level hierarchical structure of PropNet. (b) An example of representing the sentence "A young man is playing an instrument in the garden". Evolutionary nodes, developmental nodes, instance nodes, and stamp nodes are colored red, orange, green, and grey, respectively.
  • Figure 4: Framework of the comparison module. This module consists of two parts: difference vector computation and CART prediction. It is exemplified by the pair "The tall man is playing the delicate piano" and "The short man is playing the delicate guitar". They differ at #action|#subject|#attr and #action|#object, which are marked with elements 2 and 1 at the corresponding positions in the difference vector. The codes 0, 1, 2 represent "identical", "similar" and "different", respectively. Note that the difference vector for P1- is padded with 0, doubling its size, before it enters the corresponding CART model.
  • Figure 5: (a) Mean Ground Scores. For main-captions, disparities in #action, #subject or #object tend to make the two sentences appear more dissimilar than #where and #other. Additionally, a difference in #action, #subject or #object is a sufficient condition for the non-equivalence of two propositions in human perception of sentence similarity. (b) Standard Deviation.
  • ...and 5 more figures
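
The difference-vector step described in the Figure 4 caption can be illustrated with a minimal Python sketch. It is not the paper's implementation. The slot list and its order, the `SIMILAR_PAIRS` lookup used to decide "similar", the dictionary encoding of a proposition, and the choice to append (rather than prepend) the zero padding are all assumptions made for illustration. Only the codes 0/1/2, the example sentence pair, and the size-doubling zero padding come from the caption. The CART prediction step is not shown.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# Codes taken from the Figure 4 caption.
IDENTICAL, SIMILAR, DIFFERENT = 0, 1, 2

# Hypothetical slot order; the actual vector layout is not given in this text.
SLOTS: List[str] = [
    "#action",
    "#action|#subject",
    "#action|#subject|#attr",
    "#action|#object",
    "#action|#object|#attr",
]

# Hypothetical stand-in for whatever similarity judgment PropNet uses.
SIMILAR_PAIRS: Set[FrozenSet[str]] = {frozenset({"piano", "guitar"})}


def compare_slot(a: Optional[str], b: Optional[str]) -> int:
    """Return 0 (identical), 1 (similar), or 2 (different) for one slot."""
    if a == b:
        return IDENTICAL
    if a is not None and b is not None and frozenset({a, b}) in SIMILAR_PAIRS:
        return SIMILAR
    return DIFFERENT


def difference_vector(left: Dict[str, str], right: Dict[str, str]) -> List[int]:
    """Compare two propositions slot by slot, in the fixed SLOTS order."""
    return [compare_slot(left.get(s), right.get(s)) for s in SLOTS]


def pad_to_double(vec: List[int]) -> List[int]:
    """Double the vector's size with zeros (appending is an assumption)."""
    return vec + [IDENTICAL] * len(vec)


# "The tall man is playing the delicate piano"
left = {
    "#action": "play",
    "#action|#subject": "man",
    "#action|#subject|#attr": "tall",
    "#action|#object": "piano",
    "#action|#object|#attr": "delicate",
}
# "The short man is playing the delicate guitar"
right = {
    "#action": "play",
    "#action|#subject": "man",
    "#action|#subject|#attr": "short",
    "#action|#object": "guitar",
    "#action|#object|#attr": "delicate",
}

vec = difference_vector(left, right)
padded = pad_to_double(vec)
print(vec)     # [0, 0, 2, 1, 0]
print(padded)  # [0, 0, 2, 1, 0, 0, 0, 0, 0, 0]
```

Under these assumptions the sketch reproduces the caption's example: a 2 at #action|#subject|#attr (tall vs. short) and a 1 at #action|#object (piano vs. guitar). The resulting vector is what would be passed to the corresponding CART model.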