Optimizing Readability Using Genetic Algorithms

Jorge Martinez-Gil

TL;DR

This work tackles the problem of automatically adjusting text readability without distorting meaning or form. It introduces ORUGA, a genetic-algorithm-based framework that replaces candidate words with synonyms to optimize readability metrics such as FKGL, SMOG, DCRF, and ARI, while preserving content via multi-objective optimization. The paper advances the state of the art by (i) offering automatic readability optimization, (ii) enabling control over the number of word substitutions, and (iii) adding semantic-distance constraints using Word Mover’s Distance with word embeddings, including extensions to NSGA-II for Pareto-front decision making. Empirical studies across diverse texts demonstrate consistent readability improvements and illustrate trade-offs between readability, form preservation, and semantic fidelity. The work provides open-source code and outlines future directions, notably incorporating contextual embeddings to further reduce semantic drift while maintaining unsupervised operation.
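The Pareto-front idea mentioned above can be illustrated with a minimal sketch. It assumes every objective is minimized and that each candidate rewrite is summarized as a tuple (readability score, words replaced, semantic distance). The candidate values and the helper names `dominates` and `pareto_front` are hypothetical and are not taken from the ORUGA code base. NSGA-II itself adds ranking and crowding-distance steps that are not shown here.

```python
from typing import List, Tuple

# One point per candidate rewrite; every objective is to be minimized.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (readability score, words replaced, semantic distance).
candidates = [
    (9.1, 0, 0.00),  # original text, nothing replaced
    (7.4, 3, 0.12),
    (7.9, 4, 0.20),  # worse than the previous candidate on all three objectives
    (6.2, 6, 0.35),
]

front = pareto_front(candidates)
```

The surviving points are the trade-offs a decision maker would choose among: no remaining candidate is better on one objective without being worse on another.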

Abstract

This research presents ORUGA, a method that tries to automatically optimize the readability of any text in English. The core idea behind the method is that certain factors affect the readability of a text, some of which are quantifiable (number of words, syllables, presence or absence of adverbs, and so on). The nature of these factors allows us to implement a genetic learning strategy to replace some existing words with their most suitable synonyms to facilitate optimization. In addition, this research seeks to preserve both the original text's content and form through multi-objective optimization techniques. In this way, neither the text's syntactic structure nor the semantic content of the original message is significantly distorted. An exhaustive study on a substantial number and diversity of texts confirms that our method was able to optimize the degree of readability in all cases without significantly altering their form or meaning. The source code of this approach is available at https://github.com/jorge-martinez-gil/oruga.
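The strategy described in the abstract can be sketched as a toy single-objective genetic algorithm. This is an illustration under stated assumptions, not the authors' implementation: the synonym table `SYNONYMS` is hypothetical, the syllable counter is a crude vowel-group heuristic, and the selection, crossover, and mutation operators are generic choices. Only the Flesch-Kincaid Grade Level formula is the standard one.

```python
import random
import re

# Hypothetical synonym table; a real system would query a lexical resource.
SYNONYMS = {
    "utilize": ["use"],
    "numerous": ["many"],
    "demonstrate": ["show"],
}


def syllables(word: str) -> int:
    """Crude estimate: number of vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(words, n_sentences: int = 1) -> float:
    """Flesch-Kincaid Grade Level for a list of words."""
    n = len(words)
    return 0.39 * n / n_sentences + 11.8 * sum(map(syllables, words)) / n - 15.59


def decode(tokens, slots, genome):
    """Apply the synonym choice encoded by each gene to a copy of the tokens."""
    out = list(tokens)
    for (pos, options), gene in zip(slots, genome):
        out[pos] = options[gene]
    return out


def optimize(text: str, generations: int = 30, pop_size: int = 20, seed: int = 0) -> str:
    rng = random.Random(seed)
    tokens = text.split()
    # One slot per replaceable token: (position, [original word] + synonyms).
    slots = [(i, [t] + SYNONYMS[t.lower()])
             for i, t in enumerate(tokens) if t.lower() in SYNONYMS]
    if not slots:
        return text

    def fitness(genome):
        return fkgl(decode(tokens, slots, genome))

    def random_genome():
        return [rng.randrange(len(options)) for _, options in slots]

    # Seed the population with the unchanged text (all genes set to 0).
    population = [[0] * len(slots)] + [random_genome() for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # truncation selection keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            k = rng.randrange(len(slots))  # re-draw one gene as mutation
            child[k] = rng.randrange(len(slots[k][1]))
            children.append(child)
        population = parents + children
    return " ".join(decode(tokens, slots, min(population, key=fitness)))


original = "We utilize numerous tools to demonstrate the idea"
simplified = optimize(original)
```

Because each substitution is one word for one word, the word and sentence counts stay fixed and only the syllable term of the score changes. Preserving form and meaning, as the abstract describes, would require additional objectives such as the number of replaced words and a semantic distance.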

Paper Structure (36 sections, 7 equations, 5 figures, 3 tables, 2 algorithms)

Introduction
State-of-the-art
Text readability in the scientific literature
Why is text readability important?
Readability metrics
Dale-Chall readability
SMOG readability
ARI readability
Flesch Kincaid readability
Semantic Similarity
Contribution over the state-of-the-art
Part I: Design and Implementation of a Functional Solution
Technical Preliminaries
Implementation
Illustrative examples
...and 21 more sections

Figures (5)

Figure 1: Results for the minimization of the FKGL score using WordNet
Figure 2: Results for the minimization of the FKGL score using word2vec
Figure 3: Results for the minimization of the FKGL score using Web Scraping
Figure 4: Non-dominated solutions for ten use cases obtained using NSGA-II
Figure 5: Summary of the results obtained for the third (and final) version of ORUGA
