Optimizing Readability Using Genetic Algorithms

Jorge Martinez-Gil

TL;DR

This work tackles the problem of automatically adjusting text readability without distorting meaning or form. It introduces ORUGA, a genetic-algorithm-based framework that substitutes candidate words with synonyms to optimize readability metrics such as FKGL, SMOG, DCRF, and ARI, while preserving content via multi-objective optimization. The paper advances the state of the art by (i) offering automatic readability optimization, (ii) enabling control over the number of word substitutions, and (iii) adding semantic-distance constraints using Word Mover’s Distance with word embeddings, including extensions to NSGA-II for Pareto-front decision making. Empirical studies across diverse texts demonstrate consistent readability improvements and illustrate trade-offs between readability, form preservation, and semantic fidelity. The work provides open-source code and outlines future directions, notably incorporating contextual embeddings to further reduce drift while maintaining unsupervised operation.
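To make the Pareto-front idea concrete, the following is a minimal sketch of how non-dominated candidates could be filtered when three objectives are all minimized: readability score, number of substitutions, and semantic distance. It is not the NSGA-II implementation used by ORUGA; the function names `dominates` and `pareto_front` and the candidate values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical objective vector, all three values to be minimized:
# (readability score, number of substitutions, semantic distance)
Objectives = Tuple[float, int, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up values for illustration only; they are not results from the paper.
candidates = [
    (12.0, 0, 0.0),   # original text: no substitutions, no drift
    (10.5, 2, 0.3),
    (10.5, 3, 0.4),   # dominated by (10.5, 2, 0.3)
    (9.0, 5, 0.9),    # most readable, but largest drift
    (11.0, 4, 0.5),   # dominated by (10.5, 2, 0.3)
]

front = pareto_front(candidates)
```

Under these assumptions, the surviving candidates are the trade-offs a decision maker would choose among: no single one is best on readability, form preservation, and semantic fidelity at once.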

Abstract

This research presents ORUGA, a method that tries to automatically optimize the readability of any text in English. The core idea behind the method is that certain factors affect the readability of a text, some of which are quantifiable (number of words, syllables, presence or absence of adverbs, and so on). The nature of these factors allows us to implement a genetic learning strategy to replace some existing words with their most suitable synonyms to facilitate optimization. In addition, this research seeks to preserve both the original text's content and form through multi-objective optimization techniques. In this way, neither the text's syntactic structure nor the semantic content of the original message is significantly distorted. An exhaustive study on a substantial number and diversity of texts confirms that our method was able to optimize the degree of readability in all cases without significantly altering their form or meaning. The source code of this approach is available at https://github.com/jorge-martinez-gil/oruga.
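As an illustration of the genetic strategy described in the abstract, here is a minimal single-objective sketch that minimizes the standard Flesch-Kincaid Grade Level formula by swapping words for synonyms. It is not the ORUGA code from the repository. The `SYNONYMS` table is a hypothetical stand-in for a real synonym source, the syllable counter is a crude vowel-group heuristic, and all function names and parameter defaults are assumptions made for this example.

```python
import random
import re

# Hypothetical toy synonym table; a real system would query a synonym source.
SYNONYMS = {
    "utilize": ["use"],
    "purchase": ["buy"],
    "commence": ["begin", "start"],
    "assistance": ["help", "aid"],
}


def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(text: str) -> float:
    """Standard Flesch-Kincaid Grade Level formula; lower means easier to read."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def optimize(text, generations=30, pop_size=20, mutation_rate=0.2, seed=0):
    """Evolve synonym choices that minimize FKGL; gene 0 keeps the original word."""
    rng = random.Random(seed)
    tokens = text.split()
    slots = []  # (token position, [original token, synonym variants...])
    for pos, tok in enumerate(tokens):
        core = tok.rstrip(".,;:!?")
        tail = tok[len(core):]  # keep trailing punctuation on substitutes
        if core.lower() in SYNONYMS:
            slots.append((pos, [tok] + [s + tail for s in SYNONYMS[core.lower()]]))

    def render(genes):
        out = list(tokens)
        for (pos, options), gene in zip(slots, genes):
            out[pos] = options[gene]
        return " ".join(out)

    def fitness(genes):
        return fkgl(render(genes))

    # Seed the population with the unchanged text plus random individuals.
    population = [[0] * len(slots)]
    population += [[rng.randrange(len(opts)) for _, opts in slots]
                   for _ in range(pop_size - 1)]

    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            p1 = min(rng.sample(population, 3), key=fitness)  # tournament
            p2 = min(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(len(slots) + 1)               # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i, (_, opts) in enumerate(slots):             # mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(opts))
            next_gen.append(child)
        population = next_gen

    return render(min(population, key=fitness))


original = ("We will utilize the funds to purchase equipment and commence "
            "the work. Please request assistance if needed.")
simplified = optimize(original)
```

Because the unchanged text is part of the initial population and the best individuals are carried over each generation, the result in this sketch can never score worse than the input. Preserving form and meaning, as the abstract describes, would require additional objectives such as the number of substitutions and a semantic distance, which this single-objective version leaves out.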

Paper Structure

This paper contains 36 sections, 7 equations, 5 figures, 3 tables, and 2 algorithms.

Figures (5)

  • Figure 1: Results for the minimization of the FKGL score using WordNet
  • Figure 2: Results for the minimization of the FKGL score using word2vec
  • Figure 3: Results for the minimization of the FKGL score using Web Scraping
  • Figure 4: Non-dominated solutions for ten use cases obtained using NSGA-II
  • Figure 5: Summary of the results obtained for the third (and final) version of ORUGA