What Differentiates Educational Literature? A Multimodal Fusion Approach of Transformers and Computational Linguistics

Jordan J. Bird

TL;DR

The study addresses the lack of scalable tools for evaluating literary texts and mapping them to UK Key Stages. It proposes a multimodal framework that fuses transformer-based text classification with linguistic feature analysis and delivers it through a stakeholder-facing web application. The approach draws on public-domain texts, computes a set of linguistic features for each, and searches 500 neural network architectures to optimise the linguistic classifier. Late fusion of the two modalities gives the strongest results reported (ELECTRA+ANN, F1 = 0.996). The result is a practical, data-driven toolkit that lets educators assess text complexity and curriculum alignment in real time, supported by a public dataset and web interface.
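To make "linguistic features" concrete, the following is a minimal sketch of how a few surface-level features might be computed for one text segment. The function name `linguistic_features` and the choice of features (sentence count, average sentence length, average word length, type-token ratio) are illustrative assumptions; they are not the paper's feature set, which is not listed in this summary.

```python
import re


def linguistic_features(text: str) -> dict:
    """Compute a few illustrative surface features of a text segment.

    Hypothetical helper: the features here are common readability proxies,
    not the feature set used in the paper.
    """
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    words = re.findall(r"[a-z']+", text.lower())

    n_sents = len(sentences)
    n_words = len(words)
    return {
        "n_sentences": n_sents,
        "n_words": n_words,
        "avg_sentence_length": n_words / n_sents if n_sents else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
    }


features = linguistic_features("The cat sat. The cat ran away!")
print(features)
```

A vector of such values, one per segment, is the kind of input a feed-forward network could classify; a real pipeline would typically add syntactic and lexical measures from an NLP library.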

Abstract

Integrating new literature into the English curriculum remains a challenge, since educators often lack scalable tools to rapidly evaluate readability and adapt texts for diverse classroom needs. This study addresses the gap with a multimodal approach that combines transformer-based text classification with linguistic feature analysis to align texts with UK Key Stages. Eight state-of-the-art Transformers were fine-tuned on segmented text data, with BERT achieving the highest unimodal F1 score of 0.75. In parallel, 500 deep neural network topologies were searched for the classification of linguistic characteristics, achieving an F1 score of 0.392. Fusing these modalities yields a significant improvement, with every multimodal approach outperforming all unimodal models. In particular, the ELECTRA Transformer fused with the neural network achieved an F1 score of 0.996. Unimodal and multimodal approaches differ with statistical significance on every validation metric (accuracy, precision, recall, F1 score); only inference time shows no significant difference. Finally, the proposed approach is encapsulated in a stakeholder-facing web application that gives non-technical stakeholders real-time insights into text complexity, reading difficulty, and curriculum alignment, along with recommendations for learning age range. The application supports data-driven decision making and reduces manual workload by integrating AI-based recommendations into lesson planning for English literature.
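As an illustration of what late fusion of two classifiers can look like, the sketch below combines the per-class outputs of a text model and a feature model by weighted averaging of their softmax probabilities. This is one common late-fusion scheme, assumed here for illustration; the abstract does not specify the fusion mechanism used in the paper. The function names, the `text_weight` parameter, and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(text_logits, feature_logits, text_weight=0.5):
    """Fuse two models' class scores by weighted probability averaging.

    Assumption: both models score the same classes in the same order.
    text_weight is a hypothetical mixing parameter in [0, 1].
    """
    if len(text_logits) != len(feature_logits):
        raise ValueError("both models must score the same number of classes")
    p_text = softmax(text_logits)
    p_feat = softmax(feature_logits)
    return [
        text_weight * a + (1.0 - text_weight) * b
        for a, b in zip(p_text, p_feat)
    ]


# Made-up scores for three classes from each modality.
fused = late_fusion([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```

Averaging probabilities keeps each model independent until the final decision, which is what distinguishes late fusion from early fusion, where raw inputs or features are combined before classification. A learned fusion layer trained on the concatenated outputs is another option.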

Paper Structure

This paper contains 11 sections, 16 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Overview of the pre-balanced dataset given Lexile scores and UK Key Stage categorisation.
  • Figure 2: General diagram of the data generation and model training approaches followed in this study.
  • Figure 3: Flow Diagram for the web application which enables educators to utilise the machine learning and computational linguistics approaches.
  • Figure 4: The container for educators to input text and run inference. Options include free text input, file upload, or demonstration excerpts.
  • Figure 5: A visualisation provided to educators of the overall distribution of UK key stages detected in the provided text. Hovering the cursor over each bar provides a granular measure.
  • ...and 8 more figures