What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics
Jordan J. Bird
TL;DR
The study tackles the lack of scalable tools for evaluating literary texts and mapping them to UK Key Stages. It proposes a novel multimodal framework that fuses transformer-based text classification with extensive linguistic feature analysis, delivered through a stakeholder-facing web application. The approach draws on public-domain texts, computes a rich set of linguistic features, and searches across 500 neural network architectures to optimize the linguistic classifier. Late fusion of the two modalities yields state-of-the-art results (ELECTRA+ANN F1 = 0.996). The work gives educators a practical, data-driven toolkit for assessing text complexity and curriculum alignment in real time, and its public dataset and web interface support broader adoption and impact.
Abstract
The integration of new literature into the English curriculum remains a challenge because educators often lack scalable tools to rapidly evaluate readability and adapt texts for diverse classroom needs. This study addresses this gap through a multimodal approach that combines transformer-based text classification with linguistic feature analysis to align texts with UK Key Stages. Eight state-of-the-art Transformers were fine-tuned on segmented text data, with BERT achieving the highest unimodal F1 score of 0.75. In parallel, 500 deep neural network topologies were searched for classifying texts by their linguistic characteristics, achieving an F1 score of 0.392. Fusing these modalities yields a significant improvement: every multimodal approach outperforms all unimodal models. In particular, the ELECTRA Transformer fused with the neural network achieved an F1 score of 0.996. Unimodal and multimodal approaches show statistically significant differences in all validation metrics (accuracy, precision, recall, F1 score) except inference time. Finally, the proposed approach is encapsulated in a stakeholder-facing web application that gives non-technical stakeholders real-time insights on text complexity, reading difficulty, curriculum alignment, and recommended learning age ranges. The application supports data-driven decision-making and reduces manual workload by integrating AI-based recommendations into lesson planning for English literature.
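The abstract does not spell out how the Transformer and the linguistic-feature network are combined beyond the TL;DR's mention of late fusion. As a minimal sketch, assuming decision-level fusion by a weighted average of each model's per-class probabilities (one common form of late fusion, not necessarily the author's method), the combination step and a macro-averaged F1 score could look like the code below. The four-class Key Stage label set, the `text_weight` parameter, the helper names, and the probability values are all hypothetical; the abstract also does not state which F1 averaging was used, so macro averaging is an assumption.

```python
from typing import Sequence

# Hypothetical label set for illustration only; the paper's exact classes are not given here.
KEY_STAGES = ["KS1", "KS2", "KS3", "KS4"]


def late_fuse(text_probs: Sequence[float],
              ling_probs: Sequence[float],
              text_weight: float = 0.5) -> list[float]:
    """Decision-level fusion: weighted average of two per-class probability vectors.

    `text_weight` is a hypothetical parameter weighting the Transformer's output;
    the linguistic-feature model receives the remaining weight.
    """
    if len(text_probs) != len(ling_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    return [text_weight * t + (1.0 - text_weight) * l
            for t, l in zip(text_probs, ling_probs)]


def predict(probs: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(probs)), key=lambda i: probs[i])


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1, using F1 = 2TP / (2TP + FP + FN).

    A class with no true or predicted instances contributes 0.0.
    """
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


if __name__ == "__main__":
    # Made-up outputs for one text segment: the text model leans KS2,
    # the linguistic model leans KS3.
    text_probs = [0.10, 0.45, 0.40, 0.05]
    ling_probs = [0.05, 0.15, 0.70, 0.10]
    fused = late_fuse(text_probs, ling_probs)
    print(KEY_STAGES[predict(text_probs)], "->", KEY_STAGES[predict(fused)])  # KS2 -> KS3
```

Averaging probabilities is the simplest late-fusion rule because it needs no extra training; a learned fusion layer or meta-classifier over the concatenated outputs is a common alternative when the two modalities differ greatly in reliability, as the unimodal F1 scores above suggest they do.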
