Simpler becomes Harder: Do LLMs Exhibit a Coherent Behavior on Simplified Corpora?

Miriam Anschütz, Edoardo Mosca, Georg Groh

TL;DR

This paper investigates whether pre-trained classifiers preserve content-related labels when the input text is simplified. The authors evaluate 11 models on six simplification datasets in English, German, and Italian, and analyze factors such as edit distance, simplification strength, and named-entity masking. They find widespread prediction inconsistency, with prediction change rates of up to 50%, which suggests that simplified inputs could serve as zero-iteration adversarial attacks. Extending the analysis to GPT-3.5 via one-shot prompts shows substantial but task-dependent sensitivity to simplification. The authors argue that plain language remains underrepresented in pretraining data, propose collecting more plain-language corpora and applying alignment techniques (e.g., RLHF, DPO) to improve coherence across models, and warn of misuse risks in real-world applications.

Abstract

Text simplification seeks to improve readability while retaining the original content and meaning. Our study investigates whether pre-trained classifiers also maintain such coherence by comparing their predictions on both original and simplified inputs. We conduct experiments using 11 pre-trained models, including BERT and OpenAI's GPT 3.5, across six datasets spanning three languages. Additionally, we conduct a detailed analysis of the correlation between prediction change rates and simplification types/strengths. Our findings reveal alarming inconsistencies across all languages and models. If not promptly addressed, simplified inputs can be easily exploited to craft zero-iteration model-agnostic adversarial attacks with success rates of up to 50%.
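
The central quantity in the abstract, the prediction change rate, can be illustrated with a short sketch. It assumes the rate is the fraction of original/simplified sentence pairs for which a classifier's predicted label differs; the function name and the example labels are hypothetical and not taken from the paper.

```python
from typing import Sequence


def prediction_change_rate(
    original_preds: Sequence[str], simplified_preds: Sequence[str]
) -> float:
    """Fraction of sample pairs whose predicted label changes.

    Assumed definition: a pair counts as "changed" when the label predicted
    for the simplified sentence differs from the label predicted for the
    original sentence.
    """
    if len(original_preds) != len(simplified_preds):
        raise ValueError("Both prediction lists must have the same length.")
    if not original_preds:
        raise ValueError("At least one sample pair is required.")
    # Count the pairs where the two predictions disagree.
    changed = sum(o != s for o, s in zip(original_preds, simplified_preds))
    return changed / len(original_preds)


# Hypothetical emotion-classifier outputs for four sentence pairs.
original = ["neutral", "joy", "anger", "neutral"]
simplified = ["neutral", "sadness", "anger", "joy"]
rate = prediction_change_rate(original, simplified)  # two of four pairs differ
```

Under this reading, a fully coherent classifier would have a rate of 0, because simplification is meant to leave content-related labels unchanged.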

Paper Structure (15 sections, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Manually created sentence pair. The simplified version simplifies the word "conference" but preserves the meaning and neutral sentiment of the original sentence. However, a pre-trained emotion classifier behaves incoherently and predicts a different label for the simplified sentence.
  • Figure 2: Prediction change rates across different languages and tasks, sorted by the simplification strength of the samples. For all classifiers, predictions deviate more as the simplification strength increases. Overall, the English models are the least coherent.
  • Figure 3: Number of classifiers with changing predictions per sample, plotted against the Levenshtein distance between the original and simplified sentences. The distances were normalized by the samples' lengths.
  • Figure 4: Prediction change rates for tasks with a reduced number of classes. Models perform better on the reduced tasks but are still susceptible to simplifications.
  • Figure 5: Prediction change rates for the different simplification operations annotated in the Italian Simpitiki corpus (Tonelli et al., 2016).
  • ...and 3 more figures
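
The length-normalized Levenshtein distance mentioned in the Figure 3 caption can be sketched as follows. The caption does not say which length is used for normalization, so this sketch assumes character-level edit distance divided by the length of the longer sentence; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def normalized_levenshtein(original: str, simplified: str) -> float:
    """Edit distance divided by the longer sentence's length (assumed normalization)."""
    longest = max(len(original), len(simplified))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(original, simplified) / longest
```

With this normalization the value lies between 0 (identical sentences) and 1 (every character edited), so sentence pairs of different lengths can be compared on one axis.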