Inclusive Easy-to-Read Generation for Individuals with Cognitive Impairments

François Ledoyen, Gaël Dias, Alexis Lechervy, Jeremie Pantin, Fabrice Maurel, Youssef Chahir, Elisa Gouzonnat, Mélanie Berthelot, Stanislas Moravac, Armony Altinier, Amy Khairalla

TL;DR

This work introduces ETR-fr, the first expert-transcribed French dataset aligned with the European Easy-to-Read guidelines, to improve accessibility for individuals with cognitive impairments. It combines an expert-centric two-step pipeline (BARThez for summarization and MUSS for simplification) with parameter-efficient fine-tuning techniques (Prefix-tuning and LoRA) applied to PLMs and LLMs, evaluated with both automatic metrics and human assessment. Key findings show that small models with LoRA or Prefix-tuning can match or surpass full fine-tuning of larger models, achieving strong ROUGE, BERTScore, KMRE, and SARI scores while generalizing better, especially on out-of-domain political texts. The paper highlights the limitations of current automatic metrics for ETR, the need for more robust cross-lingual data, and potential improvements through RLHF and multilingual ETR resources to broaden accessibility and adherence to the guidelines.
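KMRE appears among the automatic metrics. Assuming KMRE denotes the Kandel-Moles Reading Ease (the French adaptation of the Flesch Reading Ease formula), the score can be sketched as follows. The function name is hypothetical, and the word, sentence, and syllable counts are supplied by the caller, since the paper's own tokenization and French syllabification are not described here.

```python
def kandel_moles_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Kandel-Moles (1958) French adaptation of Flesch Reading Ease (assumed meaning of KMRE).

    Higher scores indicate easier text. Counting words, sentences, and
    syllables is left to the caller; no French syllabifier is implemented.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = n_words / n_sentences    # average sentence length
    syllables_per_word = n_syllables / n_words    # average word length in syllables
    return 207 - 1.015 * words_per_sentence - 73.6 * syllables_per_word


# Hypothetical counts for a short passage: 100 words, 10 sentences, 150 syllables.
print(round(kandel_moles_reading_ease(100, 10, 150), 2))  # 86.45
```

Because both terms are subtracted, shorter sentences and shorter words raise the score, which matches the ETR preference for short sentences and simple vocabulary.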

Abstract

Ensuring accessibility for individuals with cognitive impairments is essential for autonomy, self-determination, and full citizenship. However, manual Easy-to-Read (ETR) text adaptations are slow, costly, and difficult to scale, limiting access to crucial information in healthcare, education, and civic life. AI-driven ETR generation offers a scalable solution but faces key challenges, including dataset scarcity, domain adaptation, and the trade-offs of lightweight learning for Large Language Models (LLMs). In this paper, we introduce ETR-fr, the first dataset for ETR text generation fully compliant with the European ETR guidelines. We apply parameter-efficient fine-tuning to pre-trained language models (PLMs) and LLMs to establish generative baselines. To ensure high-quality and accessible outputs, we introduce an evaluation framework based on automatic metrics supplemented by human assessments. The human assessment uses a 36-question evaluation form aligned with the guidelines. Overall, results show that PLMs perform comparably to LLMs and adapt effectively to out-of-domain texts.
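One of the parameter-efficient techniques used is LoRA. The sketch below shows the generic LoRA update, in which a frozen weight W0 (d x k) is augmented with a trainable low-rank product B A (B is d x r, A is r x k) scaled by alpha / r. It is a plain-Python illustration under that textbook formulation, not the paper's implementation; the helper names and all numeric values (d, k, r, alpha) are hypothetical, since the paper's actual rank, scaling, and target modules are not given here.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lora_forward(x: Vector, w0: Matrix, a: Matrix, b: Matrix, alpha: float) -> Vector:
    """h = W0 x + (alpha / r) * B (A x); W0 stays frozen, only A and B are trained.

    Shapes: W0 is d x k, A is r x k, B is d x r, x has length k.
    """
    r = len(a)                        # LoRA rank = number of rows of A
    base = matvec(w0, x)              # frozen pre-trained projection
    delta = matvec(b, matvec(a, x))   # low-rank path; the d x k product B A is never formed
    scale = alpha / r
    return [h + scale * dh for h, dh in zip(base, delta)]


def trainable_params(d: int, k: int, r: int) -> Tuple[int, int]:
    """Parameters updated by full fine-tuning vs. LoRA for one d x k weight."""
    return d * k, r * (d + k)


# Hypothetical toy values: d = k = 2, r = 1, alpha = 2.
w0 = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]
b_init = [[0.0], [0.0]]   # B starts at zero, so the adapted layer equals W0 at initialization
b = [[1.0], [0.0]]

print(lora_forward([1.0, 2.0], w0, a, b_init, alpha=2.0))  # [1.0, 2.0]
print(lora_forward([1.0, 2.0], w0, a, b, alpha=2.0))       # [7.0, 2.0]
print(trainable_params(1024, 1024, 8))                     # (1048576, 16384)
```

For a hypothetical 1024 x 1024 projection with rank 8, LoRA trains about 1.6% of the parameters that full fine-tuning would update, which is what makes adapting several PLMs and LLMs on a small dataset such as ETR-fr affordable.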

Paper Structure

This paper contains 26 sections, 2 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Manual evaluation comparisons. (a) Assessments from the 28 ETR guideline questions, grouped into three categories. (b) Assessments from the 8 text generation questions, grouped into five categories.