PropNet: a White-Box and Human-Like Network for Sentence Representation
Fei Yang
TL;DR
PropNet tackles the interpretability gap of transformer-based sentence embeddings by introducing a white-box, proposition-based network that encodes sentence meaning through a six-level hierarchical structure built from parsed propositions. It combines splitting, parsing, representing, and merging steps to construct interpretable networks using Dependency Grammar, with a formal framework to compare PropNets via difference vectors and CART-based prediction. Although PropNet currently lags behind state-of-the-art embedding models on semantic textual similarity tasks, it yields valuable cognitive insights into how humans evaluate sentence similarity and demonstrates potential for multimodal extension, long-text understanding, and cognitive-process analysis. The work lays a foundation for transparent linguistic representations that can illuminate human reasoning in NLP benchmarks and beyond.
Abstract
Transformer-based embedding methods have dominated the field of sentence representation in recent years. Although they have achieved remarkable performance on NLP tasks such as semantic textual similarity (STS), their black-box nature and reliance on large-scale training data have raised concerns about bias, trust, and safety. Many efforts have been made to improve the interpretability of embedding models, but these problems have not been fundamentally resolved. To achieve inherent interpretability, we propose PropNet, a purely white-box and human-like sentence representation network. Inspired by findings from cognitive science, PropNet constructs a hierarchical network based on the propositions contained in a sentence. Although experiments indicate that PropNet falls significantly behind state-of-the-art (SOTA) embedding models on STS tasks, case studies reveal substantial room for improvement. Additionally, PropNet enables us to analyze and understand the human cognitive processes underlying STS benchmarks.
