Hierarchical Ranking Neural Network for Long Document Readability Assessment
Yurui Zheng, Yijun Chen, Shaohong Zhang
TL;DR
This work addresses long-document readability assessment, where both document length and ordinal labels pose challenges, with a hierarchical, context-aware framework. It introduces HHNN-MDEM, a three-layer architecture that combines explicit linguistic features with multidimensional context weights, a Bi-LSTM, and an Inter-section R-Transformer to predict sentence- and document-level readability in a bidirectional, semi-supervised setting. A forward-text component (DSDR) and a pairwise Ranking Model exploit sentence-level supervision and the ordinal relationships among labels to improve accuracy and ordinal alignment on English and Chinese datasets, with strong gains on several corpora. The results show the value of integrating explicit features, hierarchical modeling, and ranking-based learning for robust, fine-grained readability assessment of long-form multilingual texts.
Abstract
Readability assessment aims to evaluate the reading difficulty of a text. Although deep learning has increasingly been applied to readability assessment in recent years, most approaches fail to account for either the length of the text or the ordinal relationship among readability labels. This paper proposes a bidirectional readability assessment mechanism that captures contextual information to identify semantically rich regions of the text and uses them to predict the readability level of individual sentences. These sentence-level labels are then used to assist in predicting the overall readability level of the document. Additionally, a pairwise sorting algorithm is introduced to model the ordinal relationship between readability levels through label subtraction. Experimental results on Chinese and English datasets demonstrate that the proposed model achieves competitive performance and outperforms the baseline models.
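To make the idea of modeling ordinal relationships through label subtraction concrete, the following is a minimal illustrative sketch rather than the paper's actual ranking model. It assumes readability levels are integers and that a model emits one scalar score per document; the function names `pairwise_targets` and `ordinal_agreement` are hypothetical and introduced only for this example.

```python
from itertools import combinations


def pairwise_targets(labels):
    """Return (i, j, diff) triples with diff = labels[i] - labels[j].

    The sign of diff encodes which item is harder to read; its magnitude
    encodes how many readability levels separate the two items.
    (Hypothetical helper, not taken from the paper.)
    """
    return [
        (i, j, labels[i] - labels[j])
        for i, j in combinations(range(len(labels)), 2)
    ]


def ordinal_agreement(scores, labels):
    """Fraction of differently-labeled pairs that the scores order correctly.

    A pair counts as correct when the predicted score difference has the
    same sign as the label difference. Pairs with equal labels are skipped.
    (Hypothetical helper, not taken from the paper.)
    """
    pairs = [(i, j, d) for i, j, d in pairwise_targets(labels) if d != 0]
    if not pairs:
        return 0.0
    correct = sum(1 for i, j, d in pairs if (scores[i] - scores[j]) * d > 0)
    return correct / len(pairs)
```

In a training setup of this kind, the label differences would serve as targets for a pairwise loss, while a measure such as `ordinal_agreement` could be used to check how well predicted scores respect the label ordering.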
